2025-09-07T09:35:24.8127314Z Current runner version: '2.328.0'
2025-09-07T09:35:24.8132962Z Runner name: 'i-0d73070610f53945f-1005'
2025-09-07T09:35:24.8133727Z Runner group name: 'default'
2025-09-07T09:35:24.8134542Z Machine name: 'c9e10662379e'
2025-09-07T09:35:24.8137219Z ##[group]GITHUB_TOKEN Permissions
2025-09-07T09:35:24.8139258Z Contents: read
2025-09-07T09:35:24.8139816Z Metadata: read
2025-09-07T09:35:24.8140371Z ##[endgroup]
2025-09-07T09:35:24.8142737Z Secret source: Actions
2025-09-07T09:35:24.8143438Z Prepare workflow directory
2025-09-07T09:35:24.8630169Z Prepare all required actions
2025-09-07T09:35:24.8665491Z Getting action download info
2025-09-07T09:35:25.2051683Z Download action repository 'pytorch/test-infra@main' (SHA:548a4bc624d43a01cdf165a63b041f0ae014ddbd)
2025-09-07T09:35:29.4768219Z Download action repository 'pytorch/pytorch@main' (SHA:7a83cf430e97d83d6fb14880b9049e77ff725685)
2025-09-07T09:35:33.5009320Z Download action repository 'actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065' (SHA:a26af69be951a213d495a4c3e4e4022e16d87065)
2025-09-07T09:35:33.8558658Z Download action repository 'aws-actions/configure-aws-credentials@ececac1a45f3b08a01d2dd070d28d111c5fe6722' (SHA:ececac1a45f3b08a01d2dd070d28d111c5fe6722)
2025-09-07T09:35:34.1853389Z Download action repository 'aws-actions/amazon-ecr-login@062b18b96a7aff071d4dc91bc00c4c1a7945b076' (SHA:062b18b96a7aff071d4dc91bc00c4c1a7945b076)
2025-09-07T09:35:34.5367966Z Download action repository 'seemethere/upload-artifact-s3@baba72d0712b404f646cebe0730933554ebce96a' (SHA:baba72d0712b404f646cebe0730933554ebce96a)
2025-09-07T09:35:34.9314824Z Getting action download info
2025-09-07T09:35:35.0450325Z Download action repository 'actions/checkout@v4' (SHA:08eba0b27e820071cde6df949e0beb9ba4906955)
2025-09-07T09:35:35.3601835Z Getting action download info
2025-09-07T09:35:35.4686210Z Download action repository 'nick-fields/retry@v3.0.0' (SHA:7152eba30c6575329ac0576536151aca5a72780e)
2025-09-07T09:35:35.7813737Z Getting action download info
2025-09-07T09:35:35.9045520Z Download action repository 'nick-fields/retry@3e91a01664abd3c5cd539100d10d33b9c5b68482' (SHA:3e91a01664abd3c5cd539100d10d33b9c5b68482)
2025-09-07T09:35:36.2226363Z Getting action download info
2025-09-07T09:35:36.3436381Z Uses: pytorch/pytorch/.github/workflows/_linux-test.yml@refs/heads/main (93fb23d6fae7c4e82c4239a1033e522088742634)
2025-09-07T09:35:36.3440046Z ##[group] Inputs
2025-09-07T09:35:36.3440410Z build-environment: linux-jammy-cuda12.8-py3.10-gcc9-sm90
2025-09-07T09:35:36.3445925Z test-matrix: {"include": [{"config": "inductor_huggingface_perf_cuda_h100", "shard": 1, "num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_huggingface_perf_cuda_h100", "shard": 2, "num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_huggingface_perf_cuda_h100", "shard": 3, "num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_huggingface_perf_cuda_h100", "shard": 4, "num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_huggingface_perf_cuda_h100", "shard": 5, "num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 1, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 2, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 3, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 4, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 5, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 6, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 7, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 1, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 2, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 3, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 4, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 5, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 6, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 7, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 8, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 9, "num_shards": 9, "runner": "linux.aws.h100"}]}
2025-09-07T09:35:36.3451537Z docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77
2025-09-07T09:35:36.3452243Z sync-tag:
2025-09-07T09:35:36.3452981Z timeout-minutes: 1440
2025-09-07T09:35:36.3453198Z use-gha:
2025-09-07T09:35:36.3454065Z dashboard-tag: training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true
2025-09-07T09:35:36.3455165Z s3-bucket: gha-artifacts
2025-09-07T09:35:36.3455385Z aws-role-to-assume:
2025-09-07T09:35:36.3455923Z disable-monitor: false
2025-09-07T09:35:36.3456184Z monitor-log-interval: 15
2025-09-07T09:35:36.3456425Z monitor-data-collect-interval: 4
2025-09-07T09:35:36.3456711Z ##[endgroup]
2025-09-07T09:35:36.3457020Z Complete job name: test-weekly / test (inductor_timm_perf_cuda_h100, 4, 7, linux.aws.h100)
2025-09-07T09:35:36.3927749Z ##[group]Run pytorch/test-infra/.github/actions/setup-ssh@main
2025-09-07T09:35:36.3928411Z with:
2025-09-07T09:35:36.3928907Z github-secret: ***
2025-09-07T09:35:36.3929448Z instructions: All testing is done inside the container, to start an interactive session run: docker exec -it $(docker container ps --format '{{.ID}}') bash
2025-09-07T09:35:36.3930029Z activate-with-label: false
2025-09-07T09:35:36.3930244Z label: with-ssh
2025-09-07T09:35:36.3930432Z remove-existing-keys: true
2025-09-07T09:35:36.3930639Z fail-silently: true
2025-09-07T09:35:36.3931020Z env:
2025-09-07T09:35:36.3931205Z GIT_DEFAULT_BRANCH: main
2025-09-07T09:35:36.3931407Z ##[endgroup]
2025-09-07T09:35:36.4997450Z Please see https://github.com/pytorch/pytorch/wiki/Debugging-using-with-ssh-for-Github-Actions for more info.
2025-09-07T09:35:36.4998310Z Not on pull request and ciflow reference could not be extracted, skipping adding ssh keys
2025-09-07T09:35:36.5176382Z ##[group]Run pytorch/pytorch/.github/actions/checkout-pytorch@main
2025-09-07T09:35:36.5176727Z with:
2025-09-07T09:35:36.5176903Z no-sudo: true
2025-09-07T09:35:36.5177090Z submodules: recursive
2025-09-07T09:35:36.5177283Z fetch-depth: 0
2025-09-07T09:35:36.5177451Z env:
2025-09-07T09:35:36.5177605Z GIT_DEFAULT_BRANCH: main
2025-09-07T09:35:36.5177820Z ##[endgroup]
2025-09-07T09:35:36.5249996Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-09-07T09:35:36.5250742Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-09-07T09:35:36.5269374Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-09-07T09:35:36.5269697Z env:
2025-09-07T09:35:36.5269870Z GIT_DEFAULT_BRANCH: main
2025-09-07T09:35:36.5270065Z ##[endgroup]
2025-09-07T09:35:36.5420524Z ##[group]Run actions/checkout@v4
2025-09-07T09:35:36.5420774Z with:
2025-09-07T09:35:36.5420959Z ref: 93fb23d6fae7c4e82c4239a1033e522088742634
2025-09-07T09:35:36.5421201Z fetch-depth: 0
2025-09-07T09:35:36.5421381Z submodules: recursive
2025-09-07T09:35:36.5421659Z show-progress: false
2025-09-07T09:35:36.5421852Z repository: pytorch/pytorch
2025-09-07T09:35:36.5422189Z token: ***
2025-09-07T09:35:36.5422623Z ssh-strict: true
2025-09-07T09:35:36.5422801Z ssh-user: git
2025-09-07T09:35:36.5422980Z persist-credentials: true
2025-09-07T09:35:36.5423185Z clean: true
2025-09-07T09:35:36.5423368Z sparse-checkout-cone-mode: true
2025-09-07T09:35:36.5423601Z fetch-tags: false
2025-09-07T09:35:36.5423770Z lfs: false
2025-09-07T09:35:36.5423941Z set-safe-directory: true
2025-09-07T09:35:36.5424136Z env:
2025-09-07T09:35:36.5424379Z GIT_DEFAULT_BRANCH: main
2025-09-07T09:35:36.5424593Z ##[endgroup]
2025-09-07T09:35:36.6420225Z Syncing repository: pytorch/pytorch
2025-09-07T09:35:36.6421651Z ##[group]Getting Git version info
2025-09-07T09:35:36.6422186Z Working directory is '/home/eve/_work/pytorch/pytorch'
2025-09-07T09:35:36.6422691Z [command]/usr/bin/git version
2025-09-07T09:35:36.6422924Z git version 2.50.1
2025-09-07T09:35:36.6440549Z ##[endgroup]
2025-09-07T09:35:36.6451835Z Temporarily overriding HOME='/home/eve/_work/_temp/64698613-376d-4be7-ae0f-2a63af72545a' before making global git config changes
2025-09-07T09:35:36.6452928Z Adding repository directory to the temporary git global config as a safe directory
2025-09-07T09:35:36.6456737Z [command]/usr/bin/git config --global --add safe.directory /home/eve/_work/pytorch/pytorch
2025-09-07T09:35:36.6495872Z Deleting the contents of '/home/eve/_work/pytorch/pytorch'
2025-09-07T09:35:36.6499252Z ##[group]Initializing the repository
2025-09-07T09:35:36.6502146Z [command]/usr/bin/git init /home/eve/_work/pytorch/pytorch
2025-09-07T09:35:36.6552156Z hint: Using 'master' as the name for the initial branch. This default branch name
2025-09-07T09:35:36.6552791Z hint: is subject to change. To configure the initial branch name to use in all
2025-09-07T09:35:36.6553333Z hint: of your new repositories, which will suppress this warning, call:
2025-09-07T09:35:36.6553700Z hint:
2025-09-07T09:35:36.6554000Z hint: git config --global init.defaultBranch <name>
2025-09-07T09:35:36.6554314Z hint:
2025-09-07T09:35:36.6554604Z hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
2025-09-07T09:35:36.6555266Z hint: 'development'. The just-created branch can be renamed via this command:
2025-09-07T09:35:36.6555655Z hint:
2025-09-07T09:35:36.6555853Z hint: git branch -m <name>
2025-09-07T09:35:36.6556081Z hint:
2025-09-07T09:35:36.6556395Z hint: Disable this message with "git config set advice.defaultBranchName false"
2025-09-07T09:35:36.6556961Z Initialized empty Git repository in /home/eve/_work/pytorch/pytorch/.git/
2025-09-07T09:35:36.6562142Z [command]/usr/bin/git remote add origin https://github.com/pytorch/pytorch
2025-09-07T09:35:36.6681333Z ##[endgroup]
2025-09-07T09:35:36.6682158Z ##[group]Disabling automatic garbage collection
2025-09-07T09:35:36.6684530Z [command]/usr/bin/git config --local gc.auto 0
2025-09-07T09:35:36.6714486Z ##[endgroup]
2025-09-07T09:35:36.6714845Z ##[group]Setting up auth
2025-09-07T09:35:36.6720303Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand
2025-09-07T09:35:36.6750387Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :"
2025-09-07T09:35:36.7008084Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader
2025-09-07T09:35:36.7036743Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :"
2025-09-07T09:35:36.7280916Z [command]/usr/bin/git config 
--local http.https://github.com/.extraheader AUTHORIZATION: basic *** 2025-09-07T09:35:36.7321917Z ##[endgroup] 2025-09-07T09:35:36.7322312Z ##[group]Fetching the repository 2025-09-07T09:35:36.7328845Z [command]/usr/bin/git -c protocol.version=2 fetch --prune --no-recurse-submodules origin +refs/heads/*:refs/remotes/origin/* +refs/tags/*:refs/tags/* 2025-09-07T09:36:19.1938841Z From https://github.com/pytorch/pytorch 2025-09-07T09:36:19.1939396Z * [new branch] 160583 -> origin/160583 2025-09-07T09:36:19.1940169Z * [new branch] 2.6.0.dev20241004+ -> origin/2.6.0.dev20241004+ 2025-09-07T09:36:19.1940762Z * [new branch] 5addvllmbuild -> origin/5addvllmbuild 2025-09-07T09:36:19.1943124Z * [new branch] AaronWang04_addmmfusion_perftest -> origin/AaronWang04_addmmfusion_perftest 2025-09-07T09:36:19.1943776Z * [new branch] HDCharles-2.6.0-release-notes -> origin/HDCharles-2.6.0-release-notes 2025-09-07T09:36:19.1944359Z * [new branch] ISSUE-154849 -> origin/ISSUE-154849 2025-09-07T09:36:19.1947384Z * [new branch] JackCaoG/dynamo_make_fx_non_core_aten_ops -> origin/JackCaoG/dynamo_make_fx_non_core_aten_ops 2025-09-07T09:36:19.1948717Z * [new branch] NicoshevSVE128 -> origin/NicoshevSVE128 2025-09-07T09:36:19.1950368Z * [new branch] PR-AOTInductorNoneBug -> origin/PR-AOTInductorNoneBug 2025-09-07T09:36:19.1952074Z * [new branch] PR-AOTInductorNoneBugFix -> origin/PR-AOTInductorNoneBugFix 2025-09-07T09:36:19.1953609Z * [new branch] PR-FixConfigsIssue -> origin/PR-FixConfigsIssue 2025-09-07T09:36:19.1955174Z * [new branch] PR-NoneBugFix-viable -> origin/PR-NoneBugFix-viable 2025-09-07T09:36:19.1956889Z * [new branch] PR-ResetToZero -> origin/PR-ResetToZero 2025-09-07T09:36:19.1958506Z * [new branch] Update-Flash-Packaging -> origin/Update-Flash-Packaging 2025-09-07T09:36:19.1959975Z * [new branch] VLA_exp -> origin/VLA_exp 2025-09-07T09:36:19.1962224Z * [new branch] actually-run-mps-aot-inductor -> origin/actually-run-mps-aot-inductor 2025-09-07T09:36:19.1963781Z * [new 
branch] add-missing-args-normalization -> origin/add-missing-args-normalization 2025-09-07T09:36:19.1965678Z * [new branch] add-user-guide-structure -> origin/add-user-guide-structure 2025-09-07T09:36:19.1967240Z * [new branch] add-vllm-nightly-build -> origin/add-vllm-nightly-build 2025-09-07T09:36:19.1968725Z * [new branch] add_compile_benchmarking -> origin/add_compile_benchmarking 2025-09-07T09:36:19.1970316Z * [new branch] addmm-heuristic -> origin/addmm-heuristic 2025-09-07T09:36:19.1971858Z * [new branch] addsimde -> origin/addsimde 2025-09-07T09:36:19.1973907Z * [new branch] addvllmtest -> origin/addvllmtest 2025-09-07T09:36:19.1976133Z * [new branch] adi/acl_upgrade -> origin/adi/acl_upgrade 2025-09-07T09:36:19.1977687Z * [new branch] adi/test -> origin/adi/test 2025-09-07T09:36:19.1979304Z * [new branch] adi/test_bgemm -> origin/adi/test_bgemm 2025-09-07T09:36:19.1980827Z * [new branch] adi/test_fusions -> origin/adi/test_fusions 2025-09-07T09:36:19.1982505Z * [new branch] adi/test_onednn_v3.9 -> origin/adi/test_onednn_v3.9 2025-09-07T09:36:19.1984373Z * [new branch] adi/test_presve_change -> origin/adi/test_presve_change 2025-09-07T09:36:19.1985865Z * [new branch] adi/test_timm -> origin/adi/test_timm 2025-09-07T09:36:19.1987497Z * [new branch] adi/testpresve_change -> origin/adi/testpresve_change 2025-09-07T09:36:19.1990215Z * [new branch] aditew01/test/vec_bf16 -> origin/aditew01/test/vec_bf16 2025-09-07T09:36:19.1991844Z * [new branch] ah-globalfeedback-hook -> origin/ah-globalfeedback-hook 2025-09-07T09:36:19.1993522Z * [new branch] alt-disable -> origin/alt-disable 2025-09-07T09:36:19.1996064Z * [new branch] angelayi/aoti_additional_files -> origin/angelayi/aoti_additional_files 2025-09-07T09:36:19.1997644Z * [new branch] angelayi/aoti_inductor_fx -> origin/angelayi/aoti_inductor_fx 2025-09-07T09:36:19.1999154Z * [new branch] angelayi/benchmark -> origin/angelayi/benchmark 2025-09-07T09:36:19.2000847Z * [new branch] angelayi/benchmark2 -> 
origin/angelayi/benchmark2 2025-09-07T09:36:19.2002512Z * [new branch] angelayi/change_pytree_serialization -> origin/angelayi/change_pytree_serialization 2025-09-07T09:36:19.2003993Z * [new branch] angelayi/cpp_loader -> origin/angelayi/cpp_loader 2025-09-07T09:36:19.2005835Z * [new branch] angelayi/custom_op_subgraph -> origin/angelayi/custom_op_subgraph 2025-09-07T09:36:19.2007333Z * [new branch] angelayi/customop -> origin/angelayi/customop 2025-09-07T09:36:19.2008886Z * [new branch] angelayi/fake_cache_empty -> origin/angelayi/fake_cache_empty 2025-09-07T09:36:19.2010534Z * [new branch] angelayi/is_symbolic_tracing -> origin/angelayi/is_symbolic_tracing 2025-09-07T09:36:19.2012004Z * [new branch] angelayi/item -> origin/angelayi/item 2025-09-07T09:36:19.2013560Z * [new branch] angelayi/no_so_weight -> origin/angelayi/no_so_weight 2025-09-07T09:36:19.2015380Z * [new branch] angelayi/opoverload -> origin/angelayi/opoverload 2025-09-07T09:36:19.2017037Z * [new branch] angelayi/pattern -> origin/angelayi/pattern 2025-09-07T09:36:19.2018582Z * [new branch] angelayi/pytree -> origin/angelayi/pytree 2025-09-07T09:36:19.2020426Z * [new branch] angelayi/scan_layers -> origin/angelayi/scan_layers 2025-09-07T09:36:19.2022155Z * [new branch] angelayi/symint_input -> origin/angelayi/symint_input 2025-09-07T09:36:19.2023770Z * [new branch] angelayi/test_cpp -> origin/angelayi/test_cpp 2025-09-07T09:36:19.2025450Z * [new branch] angelayi/torch_size -> origin/angelayi/torch_size 2025-09-07T09:36:19.2027158Z * [new branch] aoti-cuda-alloc -> origin/aoti-cuda-alloc 2025-09-07T09:36:19.2028846Z * [new branch] aoti_target_windows -> origin/aoti_target_windows 2025-09-07T09:36:19.2030428Z * [new branch] aoti_weight_sharing -> origin/aoti_weight_sharing 2025-09-07T09:36:19.2032157Z * [new branch] atalman-inductor-perf-cu124 -> origin/atalman-inductor-perf-cu124 2025-09-07T09:36:19.2033926Z * [new branch] atalman-inductor-perf-cu124.1 -> origin/atalman-inductor-perf-cu124.1 
2025-09-07T09:36:19.2035499Z * [new branch] atalman-patch-1 -> origin/atalman-patch-1 2025-09-07T09:36:19.2037310Z * [new branch] atalman-patch-3 -> origin/atalman-patch-3 2025-09-07T09:36:19.2038826Z * [new branch] atalman-patch-4 -> origin/atalman-patch-4 2025-09-07T09:36:19.2040534Z * [new branch] atalman-patch-5 -> origin/atalman-patch-5 2025-09-07T09:36:19.2042213Z * [new branch] atalman-patch-6 -> origin/atalman-patch-6 2025-09-07T09:36:19.2043926Z * [new branch] atalman_inductor_2.3.0 -> origin/atalman_inductor_2.3.0 2025-09-07T09:36:19.2045756Z * [new branch] atalman_inductor_2.3.1 -> origin/atalman_inductor_2.3.1 2025-09-07T09:36:19.2047373Z * [new branch] atalman_inductor_2.4.0 -> origin/atalman_inductor_2.4.0 2025-09-07T09:36:19.2049059Z * [new branch] atalman_inductor_2.4.x -> origin/atalman_inductor_2.4.x 2025-09-07T09:36:19.2050820Z * [new branch] autoupdate-transformers-pin-via-pr -> origin/autoupdate-transformers-pin-via-pr 2025-09-07T09:36:19.2052843Z * [new branch] bahuang/dtensor_demo -> origin/bahuang/dtensor_demo 2025-09-07T09:36:19.2054400Z * [new branch] bahuang/test -> origin/bahuang/test 2025-09-07T09:36:19.2057014Z * [new branch] base/1.5 -> origin/base/1.5 2025-09-07T09:36:19.2058744Z * [new branch] batching_sdpa_efficient_attention -> origin/batching_sdpa_efficient_attention 2025-09-07T09:36:19.2060244Z * [new branch] bc-lint-config -> origin/bc-lint-config 2025-09-07T09:36:19.2061929Z * [new branch] bc-lint-test-new-config -> origin/bc-lint-test-new-config 2025-09-07T09:36:19.2063881Z * [new branch] benchmark-updates -> origin/benchmark-updates 2025-09-07T09:36:19.2065887Z * [new branch] benchmarker_compat_with_do_bench -> origin/benchmarker_compat_with_do_bench 2025-09-07T09:36:19.2067516Z * [new branch] benchmarking-script -> origin/benchmarking-script 2025-09-07T09:36:19.2069770Z * [new branch] bertmaher/pinbump26 -> origin/bertmaher/pinbump26 2025-09-07T09:36:19.2072008Z * [new branch] bertrand/cutlass -> origin/bertrand/cutlass 
2025-09-07T09:36:19.2074180Z * [new branch] bf/cg-custom-wrapper -> origin/bf/cg-custom-wrapper 2025-09-07T09:36:19.2075968Z * [new branch] bf/cg-or-error -> origin/bf/cg-or-error 2025-09-07T09:36:19.2077594Z * [new branch] bf/cg-remove-check -> origin/bf/cg-remove-check 2025-09-07T09:36:19.2079189Z * [new branch] bf/cg-skip-1-kernel -> origin/bf/cg-skip-1-kernel 2025-09-07T09:36:19.2080651Z * [new branch] bf/cudagraph -> origin/bf/cudagraph 2025-09-07T09:36:19.2082278Z * [new branch] bf/cudagraph-disable-input-mutation -> origin/bf/cudagraph-disable-input-mutation 2025-09-07T09:36:19.2084069Z * [new branch] bf/cudagraph-enable-input-mutation-support-benchmark -> origin/bf/cudagraph-enable-input-mutation-support-benchmark 2025-09-07T09:36:19.2085411Z * [new branch] bf/cudagraph-partition -> origin/bf/cudagraph-partition 2025-09-07T09:36:19.2087171Z * [new branch] bf/default-recompile-reason -> origin/bf/default-recompile-reason 2025-09-07T09:36:19.2088625Z * [new branch] bf/donated-buffer-bench -> origin/bf/donated-buffer-bench 2025-09-07T09:36:19.2090181Z * [new branch] bf/exp -> origin/bf/exp 2025-09-07T09:36:19.2091706Z * [new branch] bf/pa-non-divisible -> origin/bf/pa-non-divisible 2025-09-07T09:36:19.2093442Z * [new branch] bf/partition-move-cpu -> origin/bf/partition-move-cpu 2025-09-07T09:36:19.2094867Z * [new branch] bf/partition-turn-on -> origin/bf/partition-turn-on 2025-09-07T09:36:19.2096727Z * [new branch] bf/remove-check-55b0c39d -> origin/bf/remove-check-55b0c39d 2025-09-07T09:36:19.2098173Z * [new branch] bf/rope -> origin/bf/rope 2025-09-07T09:36:19.2099956Z * [new branch] bisect_perf_hf_T5_3acc6eac492 -> origin/bisect_perf_hf_T5_3acc6eac492 2025-09-07T09:36:19.2101631Z * [new branch] bisect_perf_hf_T5_3fcf66f61fb -> origin/bisect_perf_hf_T5_3fcf66f61fb 2025-09-07T09:36:19.2103284Z * [new branch] bisect_perf_hf_T5_4009d154129 -> origin/bisect_perf_hf_T5_4009d154129 2025-09-07T09:36:19.2104800Z * [new branch] bisect_perf_hf_T5_40d0740e73d -> 
origin/bisect_perf_hf_T5_40d0740e73d 2025-09-07T09:36:19.2106852Z * [new branch] bisect_perf_hf_T5_5268754e -> origin/bisect_perf_hf_T5_5268754e 2025-09-07T09:36:19.2108421Z * [new branch] bisect_perf_hf_T5_7d89a8d385c -> origin/bisect_perf_hf_T5_7d89a8d385c 2025-09-07T09:36:19.2110091Z * [new branch] bisect_perf_hf_T5_b7a25c1ee7c -> origin/bisect_perf_hf_T5_b7a25c1ee7c 2025-09-07T09:36:19.2111725Z * [new branch] bisect_perf_hf_T5_c25b201583f -> origin/bisect_perf_hf_T5_c25b201583f 2025-09-07T09:36:19.2113354Z * [new branch] bisect_perf_hf_T5_c93e57efac0 -> origin/bisect_perf_hf_T5_c93e57efac0 2025-09-07T09:36:19.2115119Z * [new branch] bisect_perf_hf_T5_ca9813ea149 -> origin/bisect_perf_hf_T5_ca9813ea149 2025-09-07T09:36:19.2116758Z * [new branch] bisect_perf_hf_T5_d65f194a -> origin/bisect_perf_hf_T5_d65f194a 2025-09-07T09:36:19.2118313Z * [new branch] bisect_perf_hf_T5_da94ab0b -> origin/bisect_perf_hf_T5_da94ab0b 2025-09-07T09:36:19.2119911Z * [new branch] bisect_perf_hf_T5_da94ab0b_new -> origin/bisect_perf_hf_T5_da94ab0b_new 2025-09-07T09:36:19.2121559Z * [new branch] bisect_perf_hf_T5_db4e8a1d8a8 -> origin/bisect_perf_hf_T5_db4e8a1d8a8 2025-09-07T09:36:19.2123142Z * [new branch] bisect_perf_hf_T5_e0d97e936a2 -> origin/bisect_perf_hf_T5_e0d97e936a2 2025-09-07T09:36:19.2124826Z * [new branch] bisect_perf_hf_T5_f23621ec563 -> origin/bisect_perf_hf_T5_f23621ec563 2025-09-07T09:36:19.2127294Z * [new branch] bowbao/bench_updates_stage -> origin/bowbao/bench_updates_stage 2025-09-07T09:36:19.2128795Z * [new branch] bowbao/dort_rewriter -> origin/bowbao/dort_rewriter 2025-09-07T09:36:19.2130278Z * [new branch] bowbao/wip_prs -> origin/bowbao/wip_prs 2025-09-07T09:36:19.2132462Z * [new branch] brister/break_tensorbox -> origin/brister/break_tensorbox 2025-09-07T09:36:19.2133951Z * [new branch] brister/custom_fx_backend -> origin/brister/custom_fx_backend 2025-09-07T09:36:19.2135830Z * [new branch] brister/fx_custom_triton -> origin/brister/fx_custom_triton 
2025-09-07T09:36:19.2137292Z * [new branch] brister/tensor_box_output -> origin/brister/tensor_box_output 2025-09-07T09:36:19.2138933Z * [new branch] brister/tiled_reduction_no_numel_check -> origin/brister/tiled_reduction_no_numel_check 2025-09-07T09:36:19.2140546Z * [new branch] c57382a49 -> origin/c57382a49 2025-09-07T09:36:19.2142353Z * [new branch] ca_0431d47eaa -> origin/ca_0431d47eaa 2025-09-07T09:36:19.2143954Z * [new branch] ca_fix_0431d47eaa -> origin/ca_fix_0431d47eaa 2025-09-07T09:36:19.2146966Z * [new branch] camyll/revert-94bc900da97ad7f3c35b3b819bb53b23c74b581a-for-release-2.8 -> origin/camyll/revert-94bc900da97ad7f3c35b3b819bb53b23c74b581a-for-release-2.8 2025-09-07T09:36:19.2148897Z * [new branch] camyllh/test_setup_hooks_push -> origin/camyllh/test_setup_hooks_push 2025-09-07T09:36:19.2150947Z * [new branch] cherry-pick-149654-by-pytorch_bot_bot_ -> origin/cherry-pick-149654-by-pytorch_bot_bot_ 2025-09-07T09:36:19.2152602Z * [new branch] cherry-pick-151939-by-pytorch_bot_bot_ -> origin/cherry-pick-151939-by-pytorch_bot_bot_ 2025-09-07T09:36:19.2154293Z * [new branch] cherry-pick-154174-by-pytorch_bot_bot_ -> origin/cherry-pick-154174-by-pytorch_bot_bot_ 2025-09-07T09:36:19.2156365Z * [new branch] cherry-pick-156260-by-pytorch_bot_bot_ -> origin/cherry-pick-156260-by-pytorch_bot_bot_ 2025-09-07T09:36:19.2158123Z * [new branch] cherry-pick-157453-by-pytorch_bot_bot_ -> origin/cherry-pick-157453-by-pytorch_bot_bot_ 2025-09-07T09:36:19.2159868Z * [new branch] cherry-pick-157513-by-pytorch_bot_bot_ -> origin/cherry-pick-157513-by-pytorch_bot_bot_ 2025-09-07T09:36:19.2161872Z * [new branch] cherry-pick-157695-by-pytorch_bot_bot_ -> origin/cherry-pick-157695-by-pytorch_bot_bot_ 2025-09-07T09:36:19.2163647Z * [new branch] cherry-pick-157732-by-pytorch_bot_bot_ -> origin/cherry-pick-157732-by-pytorch_bot_bot_ 2025-09-07T09:36:19.2165500Z * [new branch] cherry-pick-158537-by-pytorch_bot_bot_ -> origin/cherry-pick-158537-by-pytorch_bot_bot_ 
2025-09-07T09:36:19.2167286Z * [new branch] cherry-pick-159969-by-pytorch_bot_bot_ -> origin/cherry-pick-159969-by-pytorch_bot_bot_ 2025-09-07T09:36:19.2169068Z * [new branch] cherry-pick-160586-by-pytorch_bot_bot_ -> origin/cherry-pick-160586-by-pytorch_bot_bot_ 2025-09-07T09:36:19.2171214Z * [new branch] chilli/flex_vllm -> origin/chilli/flex_vllm 2025-09-07T09:36:19.2173518Z * [new branch] cleanup-inductor-benchmark-images -> origin/cleanup-inductor-benchmark-images 2025-09-07T09:36:19.2175848Z * [new branch] codex-testing -> origin/codex-testing 2025-09-07T09:36:19.2178454Z * [new branch] codex/add-helper-function-to-sizevars.py -> origin/codex/add-helper-function-to-sizevars.py 2025-09-07T09:36:19.2179987Z * [new branch] codex/add-helper-function-to-sizevars.py_2025-09-05 -> origin/codex/add-helper-function-to-sizevars.py_2025-09-05 2025-09-07T09:36:19.2181367Z * [new branch] codex/add-metadata-field-for-file-path -> origin/codex/add-metadata-field-for-file-path 2025-09-07T09:36:19.2183268Z * [new branch] codex/add-test-for-inductor-local-cache-behavior -> origin/codex/add-test-for-inductor-local-cache-behavior 2025-09-07T09:36:19.2184665Z * [new branch] codex/create-test-for-tensor-memory-leak-in-cudagraph -> origin/codex/create-test-for-tensor-memory-leak-in-cudagraph 2025-09-07T09:36:19.2186512Z * [new branch] codex/fix-issue-121219-in-pytorch -> origin/codex/fix-issue-121219-in-pytorch 2025-09-07T09:36:19.2187939Z * [new branch] codex/fix-issue-160415-in-pytorch -> origin/codex/fix-issue-160415-in-pytorch 2025-09-07T09:36:19.2189530Z * [new branch] codex/fix-noqengine-quantized-engine-support -> origin/codex/fix-noqengine-quantized-engine-support 2025-09-07T09:36:19.2190967Z * [new branch] codex/fix-pin_memory-error-handling -> origin/codex/fix-pin_memory-error-handling 2025-09-07T09:36:19.2192464Z * [new branch] codex/propose-fix-for-issue-160332 -> origin/codex/propose-fix-for-issue-160332 2025-09-07T09:36:19.2194131Z * [new branch] 
codex/refactor-lintrunner-config-to-use-uv-run -> origin/codex/refactor-lintrunner-config-to-use-uv-run 2025-09-07T09:36:19.2196105Z * [new branch] codex/remove-allow-untyped-defs-and-fix-type-errors -> origin/codex/remove-allow-untyped-defs-and-fix-type-errors 2025-09-07T09:36:19.2197709Z * [new branch] compile_fsdp2_disable_stream_and_event -> origin/compile_fsdp2_disable_stream_and_event 2025-09-07T09:36:19.2199253Z * [new branch] context_test -> origin/context_test 2025-09-07T09:36:19.2201743Z * [new branch] copilot/fix-157446 -> origin/copilot/fix-157446 2025-09-07T09:36:19.2203254Z * [new branch] copy_graph -> origin/copy_graph 2025-09-07T09:36:19.2205746Z * [new branch] cpio/fix_new_ami_tests -> origin/cpio/fix_new_ami_tests 2025-09-07T09:36:19.2208091Z * [new branch] csl/always_produce_xml -> origin/csl/always_produce_xml 2025-09-07T09:36:19.2209645Z * [new branch] csl/build_test_more_procs -> origin/csl/build_test_more_procs 2025-09-07T09:36:19.2211175Z * [new branch] csl/build_test_more_procs2 -> origin/csl/build_test_more_procs2 2025-09-07T09:36:19.2212729Z * [new branch] csl/disable_flaky_cpp_test -> origin/csl/disable_flaky_cpp_test 2025-09-07T09:36:19.2214235Z * [new branch] csl/disable_periodic_test -> origin/csl/disable_periodic_test 2025-09-07T09:36:19.2215997Z * [new branch] csl/exclude_rocm_viable_strict -> origin/csl/exclude_rocm_viable_strict 2025-09-07T09:36:19.2217459Z * [new branch] csl/katex -> origin/csl/katex 2025-09-07T09:36:19.2219025Z * [new branch] csl/larger_runner -> origin/csl/larger_runner 2025-09-07T09:36:19.2220533Z * [new branch] csl/lintrunner_stuff -> origin/csl/lintrunner_stuff 2025-09-07T09:36:19.2222209Z * [new branch] csl/mps_sharding -> origin/csl/mps_sharding 2025-09-07T09:36:19.2223740Z * [new branch] csl/multistage_docker -> origin/csl/multistage_docker 2025-09-07T09:36:19.2225351Z * [new branch] csl/name_link_check_job -> origin/csl/name_link_check_job 2025-09-07T09:36:19.2227041Z * [new branch] csl/no_keep_goin_rocm 
-> origin/csl/no_keep_goin_rocm 2025-09-07T09:36:19.2228591Z * [new branch] csl/not_600_timeout -> origin/csl/not_600_timeout 2025-09-07T09:36:19.2230095Z * [new branch] csl/revert_open -> origin/csl/revert_open 2025-09-07T09:36:19.2231648Z * [new branch] csl/skip_build -> origin/csl/skip_build 2025-09-07T09:36:19.2233249Z * [new branch] csl/test_cuda_build_large_runner -> origin/csl/test_cuda_build_large_runner 2025-09-07T09:36:19.2234751Z * [new branch] csl/win_sccache -> origin/csl/win_sccache 2025-09-07T09:36:19.2236812Z * [new branch] cublasltrelax2 -> origin/cublasltrelax2 2025-09-07T09:36:19.2238416Z * [new branch] cublasrelax2 -> origin/cublasrelax2 2025-09-07T09:36:19.2240127Z * [new branch] cudnnsdparefactor -> origin/cudnnsdparefactor 2025-09-07T09:36:19.2241904Z * [new branch] custom_lowering_dict -> origin/custom_lowering_dict 2025-09-07T09:36:19.2243662Z * [new branch] czhuge_muon_dev -> origin/czhuge_muon_dev 2025-09-07T09:36:19.2246312Z * [new branch] d4l3k/delete_hook -> origin/d4l3k/delete_hook 2025-09-07T09:36:19.2248113Z * [new branch] dcp_zoc -> origin/dcp_zoc 2025-09-07T09:36:19.2249916Z * [new branch] debug-guard -> origin/debug-guard 2025-09-07T09:36:19.2251686Z * [new branch] delete-quant-docs -> origin/delete-quant-docs 2025-09-07T09:36:19.2256798Z * [new branch] dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.55.2 -> origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.55.2 2025-09-07T09:36:19.2258547Z * [new branch] dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.55.3 -> origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.55.3 2025-09-07T09:36:19.2260308Z * [new branch] dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.55.4 -> origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.55.4 2025-09-07T09:36:19.2262266Z * [new branch] dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.56.0 -> 
origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.56.0 2025-09-07T09:36:19.2263662Z * [new branch] dependabot/pip/dot-ci/docker/protobuf-5.29.5 -> origin/dependabot/pip/dot-ci/docker/protobuf-5.29.5 2025-09-07T09:36:19.2266876Z * [new branch] dependabot/pip/dot-github/requirements/protobuf-5.29.5 -> origin/dependabot/pip/dot-github/requirements/protobuf-5.29.5 2025-09-07T09:36:19.2268860Z * [new branch] desertfire/test_cpp_wrapper -> origin/desertfire/test_cpp_wrapper 2025-09-07T09:36:19.2270419Z * [new branch] desertfire/triton-cpu-for-aarch64 -> origin/desertfire/triton-cpu-for-aarch64 2025-09-07T09:36:19.2273446Z * [new branch] dev/joona/MPSNDArrayAdd -> origin/dev/joona/MPSNDArrayAdd 2025-09-07T09:36:19.2275259Z * [new branch] dev/joona/Unranked -> origin/dev/joona/Unranked 2025-09-07T09:36:19.2277215Z * [new branch] dev/joona/cat -> origin/dev/joona/cat 2025-09-07T09:36:19.2278887Z * [new branch] dev/joona/cat_remove_graph -> origin/dev/joona/cat_remove_graph 2025-09-07T09:36:19.2280453Z * [new branch] dev/joona/embeddingbag -> origin/dev/joona/embeddingbag 2025-09-07T09:36:19.2282276Z * [new branch] dev/joona/getTensorsString -> origin/dev/joona/getTensorsString 2025-09-07T09:36:19.2284061Z * [new branch] dev/joona/maxpool2dwithindices_errmsg -> origin/dev/joona/maxpool2dwithindices_errmsg 2025-09-07T09:36:19.2286149Z * [new branch] dev/joona/mps_linear_macos14 -> origin/dev/joona/mps_linear_macos14 2025-09-07T09:36:19.2287792Z * [new branch] dev/joona/sdpa -> origin/dev/joona/sdpa 2025-09-07T09:36:19.2289805Z * [new branch] dev/joona/topk_newapi -> origin/dev/joona/topk_newapi 2025-09-07T09:36:19.2291395Z * [new branch] dev/joona/type_inf -> origin/dev/joona/type_inf 2025-09-07T09:36:19.2293091Z * [new branch] dev/joona/upsize3d -> origin/dev/joona/upsize3d 2025-09-07T09:36:19.2294886Z * [new branch] disable -> origin/disable 2025-09-07T09:36:19.2296955Z * [new branch] e2e-baseline -> origin/e2e-baseline 2025-09-07T09:36:19.2298727Z * 
[new branch] eigen_for_sparse_addmm_v2 -> origin/eigen_for_sparse_addmm_v2 2025-09-07T09:36:19.2300996Z * [new branch] embg/test_inductor_ci_128B -> origin/embg/test_inductor_ci_128B 2025-09-07T09:36:19.2302744Z * [new branch] embg/test_inductor_ci_base -> origin/embg/test_inductor_ci_base 2025-09-07T09:36:19.2304322Z * [new branch] embg/test_inductor_ci_control -> origin/embg/test_inductor_ci_control 2025-09-07T09:36:19.2306216Z * [new branch] embg/triton_l2_prefetch_128B -> origin/embg/triton_l2_prefetch_128B 2025-09-07T09:36:19.2307692Z * [new branch] embg/triton_l2_prefetch_256B -> origin/embg/triton_l2_prefetch_256B 2025-09-07T09:36:19.2309558Z * [new branch] eqy-patch-1 -> origin/eqy-patch-1 2025-09-07T09:36:19.2311305Z * [new branch] eqy-patch-2 -> origin/eqy-patch-2 2025-09-07T09:36:19.2313019Z * [new branch] eqy-patch-3 -> origin/eqy-patch-3 2025-09-07T09:36:19.2314750Z * [new branch] eqy-patch-4 -> origin/eqy-patch-4 2025-09-07T09:36:19.2316982Z * [new branch] example-convert-torch.nn -> origin/example-convert-torch.nn 2025-09-07T09:36:19.2319326Z * [new branch] exclamaforte/add-contiguous-threshold -> origin/exclamaforte/add-contiguous-threshold 2025-09-07T09:36:19.2320749Z * [new branch] exclamaforte/amd-ma -> origin/exclamaforte/amd-ma 2025-09-07T09:36:19.2322284Z * [new branch] exclamaforte/bump-transformer-version -> origin/exclamaforte/bump-transformer-version 2025-09-07T09:36:19.2324038Z * [new branch] exclamaforte/clear-feedback-savers -> origin/exclamaforte/clear-feedback-savers 2025-09-07T09:36:19.2325604Z * [new branch] exclamaforte/combo-kernels-perf-run -> origin/exclamaforte/combo-kernels-perf-run 2025-09-07T09:36:19.2327236Z * [new branch] exclamaforte/do_bench_refactor -> origin/exclamaforte/do_bench_refactor 2025-09-07T09:36:19.2328819Z * [new branch] exclamaforte/enable-mem-dep-fusion -> origin/exclamaforte/enable-mem-dep-fusion 2025-09-07T09:36:19.2330377Z * [new branch] exclamaforte/fix-exhaustive-autotuning -> 
origin/exclamaforte/fix-exhaustive-autotuning 2025-09-07T09:36:19.2331955Z * [new branch] exclamaforte/fix-exhuastive-autotuning-reland -> origin/exclamaforte/fix-exhuastive-autotuning-reland 2025-09-07T09:36:19.2333412Z * [new branch] exclamaforte/fix-trace-parsing-fx-svg -> origin/exclamaforte/fix-trace-parsing-fx-svg 2025-09-07T09:36:19.2335083Z * [new branch] exclamaforte/force-pointwise-cat-perf-run -> origin/exclamaforte/force-pointwise-cat-perf-run 2025-09-07T09:36:19.2336668Z * [new branch] exclamaforte/fusion-data -> origin/exclamaforte/fusion-data 2025-09-07T09:36:19.2338236Z * [new branch] exclamaforte/gemm-benchmark-run -> origin/exclamaforte/gemm-benchmark-run 2025-09-07T09:36:19.2339721Z * [new branch] exclamaforte/gemm-export-model -> origin/exclamaforte/gemm-export-model 2025-09-07T09:36:19.2341123Z * [new branch] exclamaforte/gemm-model -> origin/exclamaforte/gemm-model 2025-09-07T09:36:19.2342964Z * [new branch] exclamaforte/gemm-model-all-data-collection -> origin/exclamaforte/gemm-model-all-data-collection 2025-09-07T09:36:19.2344337Z * [new branch] exclamaforte/gemm-to-amd -> origin/exclamaforte/gemm-to-amd 2025-09-07T09:36:19.2346189Z * [new branch] exclamaforte/just-gemm-model -> origin/exclamaforte/just-gemm-model 2025-09-07T09:36:19.2347759Z * [new branch] exclamaforte/just-gemm-model-no-refactor -> origin/exclamaforte/just-gemm-model-no-refactor 2025-09-07T09:36:19.2349279Z * [new branch] exclamaforte/max-autotune-ieee -> origin/exclamaforte/max-autotune-ieee 2025-09-07T09:36:19.2350788Z * [new branch] exclamaforte/memory-counter -> origin/exclamaforte/memory-counter 2025-09-07T09:36:19.2352311Z * [new branch] exclamaforte/profile-diff-algo -> origin/exclamaforte/profile-diff-algo 2025-09-07T09:36:19.2353836Z * [new branch] exclamaforte/profiler-combo -> origin/exclamaforte/profiler-combo 2025-09-07T09:36:19.2355555Z * [new branch] exclamaforte/test_cpp_wrapper_mode -> origin/exclamaforte/test_cpp_wrapper_mode 2025-09-07T09:36:19.2357244Z 
* [new branch] exclamaforte/update-autotune-configs -> origin/exclamaforte/update-autotune-configs 2025-09-07T09:36:19.2358733Z * [new branch] exclamaforte/update-autotune-configs-2 -> origin/exclamaforte/update-autotune-configs-2 2025-09-07T09:36:19.2361040Z * [new branch] exclamforte/gemm-model-final -> origin/exclamforte/gemm-model-final 2025-09-07T09:36:19.2362844Z * [new branch] exec -> origin/exec 2025-09-07T09:36:19.2364636Z * [new branch] executorch-module-shim -> origin/executorch-module-shim 2025-09-07T09:36:19.2366935Z * [new branch] experimental-mosaic -> origin/experimental-mosaic 2025-09-07T09:36:19.2368780Z * [new branch] export-D58091437 -> origin/export-D58091437 2025-09-07T09:36:19.2370752Z * [new branch] export-D61047529 -> origin/export-D61047529 2025-09-07T09:36:19.2372502Z * [new branch] export-D70112642 -> origin/export-D70112642 2025-09-07T09:36:19.2374357Z * [new branch] export-D71412006 -> origin/export-D71412006 2025-09-07T09:36:19.2377035Z * [new branch] export-D73042989 -> origin/export-D73042989 2025-09-07T09:36:19.2378724Z * [new branch] export-D75183591 -> origin/export-D75183591 2025-09-07T09:36:19.2380559Z * [new branch] export-D75617432 -> origin/export-D75617432 2025-09-07T09:36:19.2382466Z * [new branch] export-D75659965 -> origin/export-D75659965 2025-09-07T09:36:19.2384278Z * [new branch] export-D76080931 -> origin/export-D76080931 2025-09-07T09:36:19.2386324Z * [new branch] export-D76797250 -> origin/export-D76797250 2025-09-07T09:36:19.2388059Z * [new branch] export-D76885271 -> origin/export-D76885271 2025-09-07T09:36:19.2389891Z * [new branch] export-D76885620 -> origin/export-D76885620 2025-09-07T09:36:19.2391749Z * [new branch] export-D76936623 -> origin/export-D76936623 2025-09-07T09:36:19.2393648Z * [new branch] export-D76958268 -> origin/export-D76958268 2025-09-07T09:36:19.2395466Z * [new branch] export-D78375400 -> origin/export-D78375400 2025-09-07T09:36:19.2397594Z * [new branch] export-D78431305 -> 
origin/export-D78431305 2025-09-07T09:36:19.2399400Z * [new branch] export-D78580107 -> origin/export-D78580107 2025-09-07T09:36:19.2401235Z * [new branch] export-D78822171 -> origin/export-D78822171 2025-09-07T09:36:19.2403537Z * [new branch] export-D78822351 -> origin/export-D78822351 2025-09-07T09:36:19.2405605Z * [new branch] export-D78822507 -> origin/export-D78822507 2025-09-07T09:36:19.2407556Z * [new branch] export-D78826994 -> origin/export-D78826994 2025-09-07T09:36:19.2409608Z * [new branch] export-D78894324 -> origin/export-D78894324 2025-09-07T09:36:19.2411353Z * [new branch] export-D78929245 -> origin/export-D78929245 2025-09-07T09:36:19.2413173Z * [new branch] export-D78934925 -> origin/export-D78934925 2025-09-07T09:36:19.2414740Z * [new branch] export-D78953203 -> origin/export-D78953203 2025-09-07T09:36:19.2416715Z * [new branch] export-D78953229 -> origin/export-D78953229 2025-09-07T09:36:19.2418336Z * [new branch] export-D78957093 -> origin/export-D78957093 2025-09-07T09:36:19.2420035Z * [new branch] export-D78957389 -> origin/export-D78957389 2025-09-07T09:36:19.2422090Z * [new branch] export-D78996107 -> origin/export-D78996107 2025-09-07T09:36:19.2423636Z * [new branch] export-D79026433 -> origin/export-D79026433 2025-09-07T09:36:19.2425544Z * [new branch] export-D79230339 -> origin/export-D79230339 2025-09-07T09:36:19.2427332Z * [new branch] export-D79319835 -> origin/export-D79319835 2025-09-07T09:36:19.2428936Z * [new branch] export-D79328456 -> origin/export-D79328456 2025-09-07T09:36:19.2451237Z * [new branch] export-D79534608 -> origin/export-D79534608 2025-09-07T09:36:19.2451861Z * [new branch] export-D79785974 -> origin/export-D79785974 2025-09-07T09:36:19.2452286Z * [new branch] export-D80025417 -> origin/export-D80025417 2025-09-07T09:36:19.2452681Z * [new branch] export-D80120333 -> origin/export-D80120333 2025-09-07T09:36:19.2453045Z * [new branch] export-D80214882 -> origin/export-D80214882 2025-09-07T09:36:19.2453419Z * [new 
branch] export-D80319069 -> origin/export-D80319069 2025-09-07T09:36:19.2453778Z * [new branch] export-D80321215 -> origin/export-D80321215 2025-09-07T09:36:19.2454377Z * [new branch] export-D80503451 -> origin/export-D80503451 2025-09-07T09:36:19.2454768Z * [new branch] export-D80771648 -> origin/export-D80771648 2025-09-07T09:36:19.2455372Z * [new branch] export-D80823877 -> origin/export-D80823877 2025-09-07T09:36:19.2455737Z * [new branch] export-D80948073 -> origin/export-D80948073 2025-09-07T09:36:19.2456101Z * [new branch] export-D80958642 -> origin/export-D80958642 2025-09-07T09:36:19.2456472Z * [new branch] export-D80970483 -> origin/export-D80970483 2025-09-07T09:36:19.2456827Z * [new branch] export-D81054193 -> origin/export-D81054193 2025-09-07T09:36:19.2457182Z * [new branch] export-D81060182 -> origin/export-D81060182 2025-09-07T09:36:19.2458626Z * [new branch] export-D81078973 -> origin/export-D81078973 2025-09-07T09:36:19.2460306Z * [new branch] export-D81204584 -> origin/export-D81204584 2025-09-07T09:36:19.2462247Z * [new branch] export-D81284190 -> origin/export-D81284190 2025-09-07T09:36:19.2464037Z * [new branch] export-D81299840 -> origin/export-D81299840 2025-09-07T09:36:19.2466165Z * [new branch] export-D81429090 -> origin/export-D81429090 2025-09-07T09:36:19.2467903Z * [new branch] export-D81698719 -> origin/export-D81698719 2025-09-07T09:36:19.2469807Z * [new branch] export-D81747409 -> origin/export-D81747409 2025-09-07T09:36:19.2471798Z * [new branch] exported-model-train-idempotent -> origin/exported-model-train-idempotent 2025-09-07T09:36:19.2474045Z * [new branch] ezyang/wip-aot-descriptors -> origin/ezyang/wip-aot-descriptors 2025-09-07T09:36:19.2476058Z * [new branch] fa_u8_brgemm -> origin/fa_u8_brgemm 2025-09-07T09:36:19.2477825Z * [new branch] fastmath_baseline -> origin/fastmath_baseline 2025-09-07T09:36:19.2480174Z * [new branch] fbcode/warm -> origin/fbcode/warm 2025-09-07T09:36:19.2482053Z * [new branch] fca -> origin/fca 
2025-09-07T09:36:19.2483741Z * [new branch] fca2_ca5984c -> origin/fca2_ca5984c 2025-09-07T09:36:19.2485875Z * [new branch] fca5 -> origin/fca5 2025-09-07T09:36:19.2488214Z * [new branch] feature/function-numa-binding -> origin/feature/function-numa-binding 2025-09-07T09:36:19.2489754Z * [new branch] feature/function-numa-binding-take2 -> origin/feature/function-numa-binding-take2 2025-09-07T09:36:19.2491189Z * [new branch] feature/numa-nproc-fix -> origin/feature/numa-nproc-fix 2025-09-07T09:36:19.2492755Z * [new branch] feature/numa-signpost-serialize -> origin/feature/numa-signpost-serialize 2025-09-07T09:36:19.2494248Z * [new branch] feature/parallel-numa-binding -> origin/feature/parallel-numa-binding 2025-09-07T09:36:19.2496822Z * [new branch] fengyuan/external-proj -> origin/fengyuan/external-proj 2025-09-07T09:36:19.2498407Z * [new branch] fengyuan/out-of-tree-xpu-ops-improve-test -> origin/fengyuan/out-of-tree-xpu-ops-improve-test 2025-09-07T09:36:19.2499916Z * [new branch] fengyuan/out-of-tree-xpu-ops-remove-dtype -> origin/fengyuan/out-of-tree-xpu-ops-remove-dtype 2025-09-07T09:36:19.2501258Z * [new branch] fengyuan/test-xpu -> origin/fengyuan/test-xpu 2025-09-07T09:36:19.2503610Z * [new branch] ffast_math_baseline -> origin/ffast_math_baseline 2025-09-07T09:36:19.2505453Z * [new branch] ffast_math_target -> origin/ffast_math_target 2025-09-07T09:36:19.2507899Z * [new branch] findhao/base_commit -> origin/findhao/base_commit 2025-09-07T09:36:19.2509635Z * [new branch] findhao/base_commit1 -> origin/findhao/base_commit1 2025-09-07T09:36:19.2510933Z * [new branch] findhao/multistream2 -> origin/findhao/multistream2 2025-09-07T09:36:19.2512441Z * [new branch] findhao/multistream5 -> origin/findhao/multistream5 2025-09-07T09:36:19.2513849Z * [new branch] findhao/multistream6 -> origin/findhao/multistream6 2025-09-07T09:36:19.2515685Z * [new branch] findhao/operatorbench3 -> origin/findhao/operatorbench3 2025-09-07T09:36:19.2517229Z * [new branch] 
findhao/operatorbench5 -> origin/findhao/operatorbench5 2025-09-07T09:36:19.2518737Z * [new branch] findhao/tritonparse -> origin/findhao/tritonparse 2025-09-07T09:36:19.2520574Z * [new branch] fix -> origin/fix 2025-09-07T09:36:19.2522395Z * [new branch] fix-ck-gemm-template-format -> origin/fix-ck-gemm-template-format 2025-09-07T09:36:19.2524094Z * [new branch] fix-config-ignore -> origin/fix-config-ignore 2025-09-07T09:36:19.2526123Z * [new branch] fix-dict-guard -> origin/fix-dict-guard 2025-09-07T09:36:19.2528007Z * [new branch] fix-inductor-periodic-0528 -> origin/fix-inductor-periodic-0528 2025-09-07T09:36:19.2529699Z * [new branch] fix-mps-benchmark -> origin/fix-mps-benchmark 2025-09-07T09:36:19.2531571Z * [new branch] fix-rlease-feature-template -> origin/fix-rlease-feature-template 2025-09-07T09:36:19.2533328Z * [new branch] fix-run-condition-upload-results -> origin/fix-run-condition-upload-results 2025-09-07T09:36:19.2535119Z * [new branch] fix-torchbench -> origin/fix-torchbench 2025-09-07T09:36:19.2537065Z * [new branch] fix_153389 -> origin/fix_153389 2025-09-07T09:36:19.2538862Z * [new branch] fix_fsdp_rs_bucket2 -> origin/fix_fsdp_rs_bucket2 2025-09-07T09:36:19.2540671Z * [new branch] fix_inductor_peridic_tests -> origin/fix_inductor_peridic_tests 2025-09-07T09:36:19.2542433Z * [new branch] fix_ubn_159469 -> origin/fix_ubn_159469 2025-09-07T09:36:19.2544296Z * [new branch] fixes-triage -> origin/fixes-triage 2025-09-07T09:36:19.2546284Z * [new branch] fixflashinfer -> origin/fixflashinfer 2025-09-07T09:36:19.2548061Z * [new branch] flash_decoding_cpu -> origin/flash_decoding_cpu 2025-09-07T09:36:19.2549871Z * [new branch] flex-flash -> origin/flex-flash 2025-09-07T09:36:19.2551580Z * [new branch] flex-lowering -> origin/flex-lowering 2025-09-07T09:36:19.2553285Z * [new branch] flex-warning -> origin/flex-warning 2025-09-07T09:36:19.2555283Z * [new branch] flex_attention_functorch_grad -> origin/flex_attention_functorch_grad 
2025-09-07T09:36:19.2557284Z * [new branch] flex_flash -> origin/flex_flash 2025-09-07T09:36:19.2559086Z * [new branch] flexdecode-gqa-groups -> origin/flexdecode-gqa-groups 2025-09-07T09:36:19.2561657Z * [new branch] fmassa/fix_memeff_sharding_rule -> origin/fmassa/fix_memeff_sharding_rule 2025-09-07T09:36:19.2563503Z * [new branch] fsdp2_trace_rules -> origin/fsdp2_trace_rules 2025-09-07T09:36:19.2565474Z * [new branch] fsdpv2_3d -> origin/fsdpv2_3d 2025-09-07T09:36:19.2567505Z * [new branch] fsdpv2_3d_m1 -> origin/fsdpv2_3d_m1 2025-09-07T09:36:19.2569349Z * [new branch] fx_cpp -> origin/fx_cpp 2025-09-07T09:36:19.2571706Z * [new branch] fy/fix-win -> origin/fy/fix-win 2025-09-07T09:36:19.2575313Z * [new branch] gh/AlnisM/1/base -> origin/gh/AlnisM/1/base 2025-09-07T09:36:19.2577093Z * [new branch] gh/AlnisM/1/head -> origin/gh/AlnisM/1/head 2025-09-07T09:36:19.2579581Z * [new branch] gh/CaoE/2/base -> origin/gh/CaoE/2/base 2025-09-07T09:36:19.2581079Z * [new branch] gh/CaoE/2/head -> origin/gh/CaoE/2/head 2025-09-07T09:36:19.2582760Z * [new branch] gh/CaoE/2/orig -> origin/gh/CaoE/2/orig 2025-09-07T09:36:19.2585783Z * [new branch] gh/ColinPeppler/79/base -> origin/gh/ColinPeppler/79/base 2025-09-07T09:36:19.2587437Z * [new branch] gh/ColinPeppler/79/head -> origin/gh/ColinPeppler/79/head 2025-09-07T09:36:19.2589026Z * [new branch] gh/ColinPeppler/79/orig -> origin/gh/ColinPeppler/79/orig 2025-09-07T09:36:19.2591394Z * [new branch] gh/ColinPeppler/80/base -> origin/gh/ColinPeppler/80/base 2025-09-07T09:36:19.2593113Z * [new branch] gh/ColinPeppler/80/head -> origin/gh/ColinPeppler/80/head 2025-09-07T09:36:19.2594616Z * [new branch] gh/ColinPeppler/80/orig -> origin/gh/ColinPeppler/80/orig 2025-09-07T09:36:19.2597684Z * [new branch] gh/EikanWang/67/base -> origin/gh/EikanWang/67/base 2025-09-07T09:36:19.2599291Z * [new branch] gh/EikanWang/67/head -> origin/gh/EikanWang/67/head 2025-09-07T09:36:19.2601580Z * [new branch] gh/EikanWang/80/base -> 
origin/gh/EikanWang/80/base 2025-09-07T09:36:19.2603161Z * [new branch] gh/EikanWang/80/head -> origin/gh/EikanWang/80/head 2025-09-07T09:36:19.2604664Z * [new branch] gh/EikanWang/80/orig -> origin/gh/EikanWang/80/orig 2025-09-07T09:36:19.2607204Z * [new branch] gh/EikanWang/81/base -> origin/gh/EikanWang/81/base 2025-09-07T09:36:19.2608749Z * [new branch] gh/EikanWang/81/head -> origin/gh/EikanWang/81/head 2025-09-07T09:36:19.2610266Z * [new branch] gh/EikanWang/81/orig -> origin/gh/EikanWang/81/orig 2025-09-07T09:36:19.2612405Z * [new branch] gh/EikanWang/82/base -> origin/gh/EikanWang/82/base 2025-09-07T09:36:19.2613951Z * [new branch] gh/EikanWang/82/head -> origin/gh/EikanWang/82/head 2025-09-07T09:36:19.2615857Z * [new branch] gh/EikanWang/82/orig -> origin/gh/EikanWang/82/orig 2025-09-07T09:36:19.2618850Z * [new branch] gh/Gasoonjia/1/base -> origin/gh/Gasoonjia/1/base 2025-09-07T09:36:19.2620387Z * [new branch] gh/Gasoonjia/1/head -> origin/gh/Gasoonjia/1/head 2025-09-07T09:36:19.2623343Z * [new branch] gh/H-Huang/131/base -> origin/gh/H-Huang/131/base 2025-09-07T09:36:19.2624881Z * [new branch] gh/H-Huang/131/head -> origin/gh/H-Huang/131/head 2025-09-07T09:36:19.2626720Z * [new branch] gh/H-Huang/131/orig -> origin/gh/H-Huang/131/orig 2025-09-07T09:36:19.2628940Z * [new branch] gh/H-Huang/132/base -> origin/gh/H-Huang/132/base 2025-09-07T09:36:19.2630557Z * [new branch] gh/H-Huang/132/head -> origin/gh/H-Huang/132/head 2025-09-07T09:36:19.2632092Z * [new branch] gh/H-Huang/132/orig -> origin/gh/H-Huang/132/orig 2025-09-07T09:36:19.2635859Z * [new branch] gh/H-Huang/180/base -> origin/gh/H-Huang/180/base 2025-09-07T09:36:19.2637497Z * [new branch] gh/H-Huang/180/head -> origin/gh/H-Huang/180/head 2025-09-07T09:36:19.2639042Z * [new branch] gh/H-Huang/180/orig -> origin/gh/H-Huang/180/orig 2025-09-07T09:36:19.2641216Z * [new branch] gh/H-Huang/182/base -> origin/gh/H-Huang/182/base 2025-09-07T09:36:19.2642770Z * [new branch] gh/H-Huang/182/head -> 
origin/gh/H-Huang/182/head 2025-09-07T09:36:19.2644312Z * [new branch] gh/H-Huang/182/orig -> origin/gh/H-Huang/182/orig 2025-09-07T09:36:19.2647133Z * [new branch] gh/H-Huang/187/base -> origin/gh/H-Huang/187/base 2025-09-07T09:36:19.2648528Z * [new branch] gh/H-Huang/187/head -> origin/gh/H-Huang/187/head 2025-09-07T09:36:19.2649969Z * [new branch] gh/H-Huang/187/orig -> origin/gh/H-Huang/187/orig 2025-09-07T09:36:19.2652197Z * [new branch] gh/H-Huang/202/base -> origin/gh/H-Huang/202/base 2025-09-07T09:36:19.2653911Z * [new branch] gh/H-Huang/202/head -> origin/gh/H-Huang/202/head 2025-09-07T09:36:19.2655465Z * [new branch] gh/H-Huang/202/orig -> origin/gh/H-Huang/202/orig 2025-09-07T09:36:19.2657739Z * [new branch] gh/H-Huang/203/base -> origin/gh/H-Huang/203/base 2025-09-07T09:36:19.2659396Z * [new branch] gh/H-Huang/203/head -> origin/gh/H-Huang/203/head 2025-09-07T09:36:19.2661122Z * [new branch] gh/H-Huang/203/orig -> origin/gh/H-Huang/203/orig 2025-09-07T09:36:19.2663529Z * [new branch] gh/H-Huang/204/base -> origin/gh/H-Huang/204/base 2025-09-07T09:36:19.2665180Z * [new branch] gh/H-Huang/204/head -> origin/gh/H-Huang/204/head 2025-09-07T09:36:19.2666876Z * [new branch] gh/H-Huang/204/orig -> origin/gh/H-Huang/204/orig 2025-09-07T09:36:19.2669176Z * [new branch] gh/H-Huang/205/base -> origin/gh/H-Huang/205/base 2025-09-07T09:36:19.2670706Z * [new branch] gh/H-Huang/205/head -> origin/gh/H-Huang/205/head 2025-09-07T09:36:19.2672263Z * [new branch] gh/H-Huang/205/orig -> origin/gh/H-Huang/205/orig 2025-09-07T09:36:19.2674422Z * [new branch] gh/H-Huang/206/base -> origin/gh/H-Huang/206/base 2025-09-07T09:36:19.2676335Z * [new branch] gh/H-Huang/206/head -> origin/gh/H-Huang/206/head 2025-09-07T09:36:19.2677844Z * [new branch] gh/H-Huang/206/orig -> origin/gh/H-Huang/206/orig 2025-09-07T09:36:19.2680037Z * [new branch] gh/H-Huang/207/base -> origin/gh/H-Huang/207/base 2025-09-07T09:36:19.2681564Z * [new branch] gh/H-Huang/207/head -> 
origin/gh/H-Huang/207/head 2025-09-07T09:36:19.2683131Z * [new branch] gh/H-Huang/207/orig -> origin/gh/H-Huang/207/orig 2025-09-07T09:36:19.2685503Z * [new branch] gh/H-Huang/208/base -> origin/gh/H-Huang/208/base 2025-09-07T09:36:19.2687102Z * [new branch] gh/H-Huang/208/head -> origin/gh/H-Huang/208/head 2025-09-07T09:36:19.2688731Z * [new branch] gh/H-Huang/208/orig -> origin/gh/H-Huang/208/orig 2025-09-07T09:36:19.2690797Z * [new branch] gh/H-Huang/209/base -> origin/gh/H-Huang/209/base 2025-09-07T09:36:19.2692449Z * [new branch] gh/H-Huang/209/head -> origin/gh/H-Huang/209/head 2025-09-07T09:36:19.2693972Z * [new branch] gh/H-Huang/209/orig -> origin/gh/H-Huang/209/orig 2025-09-07T09:36:19.2696460Z * [new branch] gh/H-Huang/210/base -> origin/gh/H-Huang/210/base 2025-09-07T09:36:19.2698013Z * [new branch] gh/H-Huang/210/head -> origin/gh/H-Huang/210/head 2025-09-07T09:36:19.2699511Z * [new branch] gh/H-Huang/210/orig -> origin/gh/H-Huang/210/orig 2025-09-07T09:36:19.2701776Z * [new branch] gh/H-Huang/211/base -> origin/gh/H-Huang/211/base 2025-09-07T09:36:19.2703393Z * [new branch] gh/H-Huang/211/head -> origin/gh/H-Huang/211/head 2025-09-07T09:36:19.2704856Z * [new branch] gh/H-Huang/211/orig -> origin/gh/H-Huang/211/orig 2025-09-07T09:36:19.2707453Z * [new branch] gh/H-Huang/212/base -> origin/gh/H-Huang/212/base 2025-09-07T09:36:19.2708981Z * [new branch] gh/H-Huang/212/head -> origin/gh/H-Huang/212/head 2025-09-07T09:36:19.2710724Z * [new branch] gh/H-Huang/212/orig -> origin/gh/H-Huang/212/orig 2025-09-07T09:36:19.2712908Z * [new branch] gh/H-Huang/213/base -> origin/gh/H-Huang/213/base 2025-09-07T09:36:19.2714528Z * [new branch] gh/H-Huang/213/head -> origin/gh/H-Huang/213/head 2025-09-07T09:36:19.2716279Z * [new branch] gh/H-Huang/213/orig -> origin/gh/H-Huang/213/orig 2025-09-07T09:36:19.2718398Z * [new branch] gh/H-Huang/214/base -> origin/gh/H-Huang/214/base 2025-09-07T09:36:19.2719959Z * [new branch] gh/H-Huang/214/head -> 
origin/gh/H-Huang/214/head 2025-09-07T09:36:19.2721484Z * [new branch] gh/H-Huang/214/orig -> origin/gh/H-Huang/214/orig 2025-09-07T09:36:19.2724179Z * [new branch] gh/IvanKobzarev/112/base -> origin/gh/IvanKobzarev/112/base 2025-09-07T09:36:19.2726262Z * [new branch] gh/IvanKobzarev/112/head -> origin/gh/IvanKobzarev/112/head 2025-09-07T09:36:19.2727836Z * [new branch] gh/IvanKobzarev/112/orig -> origin/gh/IvanKobzarev/112/orig 2025-09-07T09:36:19.2730081Z * [new branch] gh/IvanKobzarev/115/base -> origin/gh/IvanKobzarev/115/base 2025-09-07T09:36:19.2731711Z * [new branch] gh/IvanKobzarev/115/head -> origin/gh/IvanKobzarev/115/head 2025-09-07T09:36:19.2733337Z * [new branch] gh/IvanKobzarev/115/orig -> origin/gh/IvanKobzarev/115/orig 2025-09-07T09:36:19.2736174Z * [new branch] gh/IvanKobzarev/116/base -> origin/gh/IvanKobzarev/116/base 2025-09-07T09:36:19.2737770Z * [new branch] gh/IvanKobzarev/116/head -> origin/gh/IvanKobzarev/116/head 2025-09-07T09:36:19.2739232Z * [new branch] gh/IvanKobzarev/116/orig -> origin/gh/IvanKobzarev/116/orig 2025-09-07T09:36:19.2741622Z * [new branch] gh/IvanKobzarev/118/base -> origin/gh/IvanKobzarev/118/base 2025-09-07T09:36:19.2743373Z * [new branch] gh/IvanKobzarev/118/head -> origin/gh/IvanKobzarev/118/head 2025-09-07T09:36:19.2744834Z * [new branch] gh/IvanKobzarev/118/orig -> origin/gh/IvanKobzarev/118/orig 2025-09-07T09:36:19.2747501Z * [new branch] gh/IvanKobzarev/126/base -> origin/gh/IvanKobzarev/126/base 2025-09-07T09:36:19.2749140Z * [new branch] gh/IvanKobzarev/126/head -> origin/gh/IvanKobzarev/126/head 2025-09-07T09:36:19.2750660Z * [new branch] gh/IvanKobzarev/126/orig -> origin/gh/IvanKobzarev/126/orig 2025-09-07T09:36:19.2752896Z * [new branch] gh/IvanKobzarev/127/base -> origin/gh/IvanKobzarev/127/base 2025-09-07T09:36:19.2754716Z * [new branch] gh/IvanKobzarev/127/head -> origin/gh/IvanKobzarev/127/head 2025-09-07T09:36:19.2756563Z * [new branch] gh/IvanKobzarev/127/orig -> origin/gh/IvanKobzarev/127/orig 
2025-09-07T09:36:19.2758716Z * [new branch] gh/IvanKobzarev/128/base -> origin/gh/IvanKobzarev/128/base 2025-09-07T09:36:19.2760329Z * [new branch] gh/IvanKobzarev/128/head -> origin/gh/IvanKobzarev/128/head 2025-09-07T09:36:19.2761832Z * [new branch] gh/IvanKobzarev/128/orig -> origin/gh/IvanKobzarev/128/orig 2025-09-07T09:36:19.2764128Z * [new branch] gh/IvanKobzarev/132/base -> origin/gh/IvanKobzarev/132/base 2025-09-07T09:36:19.2766038Z * [new branch] gh/IvanKobzarev/132/head -> origin/gh/IvanKobzarev/132/head 2025-09-07T09:36:19.2767561Z * [new branch] gh/IvanKobzarev/132/orig -> origin/gh/IvanKobzarev/132/orig 2025-09-07T09:36:19.2770196Z * [new branch] gh/IvanKobzarev/133/base -> origin/gh/IvanKobzarev/133/base 2025-09-07T09:36:19.2771951Z * [new branch] gh/IvanKobzarev/133/head -> origin/gh/IvanKobzarev/133/head 2025-09-07T09:36:19.2773490Z * [new branch] gh/IvanKobzarev/133/orig -> origin/gh/IvanKobzarev/133/orig 2025-09-07T09:36:19.2776214Z * [new branch] gh/IvanKobzarev/134/base -> origin/gh/IvanKobzarev/134/base 2025-09-07T09:36:19.2777656Z * [new branch] gh/IvanKobzarev/134/head -> origin/gh/IvanKobzarev/134/head 2025-09-07T09:36:19.2779099Z * [new branch] gh/IvanKobzarev/134/orig -> origin/gh/IvanKobzarev/134/orig 2025-09-07T09:36:19.2781666Z * [new branch] gh/IvanKobzarev/135/base -> origin/gh/IvanKobzarev/135/base 2025-09-07T09:36:19.2783352Z * [new branch] gh/IvanKobzarev/135/head -> origin/gh/IvanKobzarev/135/head 2025-09-07T09:36:19.2784836Z * [new branch] gh/IvanKobzarev/135/orig -> origin/gh/IvanKobzarev/135/orig 2025-09-07T09:36:19.2787432Z * [new branch] gh/IvanKobzarev/136/base -> origin/gh/IvanKobzarev/136/base 2025-09-07T09:36:19.2789068Z * [new branch] gh/IvanKobzarev/136/head -> origin/gh/IvanKobzarev/136/head 2025-09-07T09:36:19.2790627Z * [new branch] gh/IvanKobzarev/136/orig -> origin/gh/IvanKobzarev/136/orig 2025-09-07T09:36:19.2792619Z * [new branch] gh/IvanKobzarev/137/base -> origin/gh/IvanKobzarev/137/base 
2025-09-07T09:36:19.2794176Z * [new branch] gh/IvanKobzarev/137/head -> origin/gh/IvanKobzarev/137/head 2025-09-07T09:36:19.2796006Z * [new branch] gh/IvanKobzarev/137/orig -> origin/gh/IvanKobzarev/137/orig 2025-09-07T09:36:19.2798340Z * [new branch] gh/IvanKobzarev/138/base -> origin/gh/IvanKobzarev/138/base 2025-09-07T09:36:19.2800108Z * [new branch] gh/IvanKobzarev/138/head -> origin/gh/IvanKobzarev/138/head 2025-09-07T09:36:19.2801829Z * [new branch] gh/IvanKobzarev/138/orig -> origin/gh/IvanKobzarev/138/orig 2025-09-07T09:36:19.2804047Z * [new branch] gh/IvanKobzarev/139/base -> origin/gh/IvanKobzarev/139/base 2025-09-07T09:36:19.2805919Z * [new branch] gh/IvanKobzarev/139/head -> origin/gh/IvanKobzarev/139/head 2025-09-07T09:36:19.2807447Z * [new branch] gh/IvanKobzarev/139/orig -> origin/gh/IvanKobzarev/139/orig 2025-09-07T09:36:19.2809833Z * [new branch] gh/IvanKobzarev/140/base -> origin/gh/IvanKobzarev/140/base 2025-09-07T09:36:19.2811361Z * [new branch] gh/IvanKobzarev/140/head -> origin/gh/IvanKobzarev/140/head 2025-09-07T09:36:19.2812927Z * [new branch] gh/IvanKobzarev/140/orig -> origin/gh/IvanKobzarev/140/orig 2025-09-07T09:36:19.2815435Z * [new branch] gh/IvanKobzarev/141/base -> origin/gh/IvanKobzarev/141/base 2025-09-07T09:36:19.2817319Z * [new branch] gh/IvanKobzarev/141/head -> origin/gh/IvanKobzarev/141/head 2025-09-07T09:36:19.2819430Z * [new branch] gh/IvanKobzarev/141/orig -> origin/gh/IvanKobzarev/141/orig 2025-09-07T09:36:19.2822467Z * [new branch] gh/IvanKobzarev/142/base -> origin/gh/IvanKobzarev/142/base 2025-09-07T09:36:19.2823291Z * [new branch] gh/IvanKobzarev/142/head -> origin/gh/IvanKobzarev/142/head 2025-09-07T09:36:19.2824815Z * [new branch] gh/IvanKobzarev/142/orig -> origin/gh/IvanKobzarev/142/orig 2025-09-07T09:36:19.2827271Z * [new branch] gh/IvanKobzarev/143/base -> origin/gh/IvanKobzarev/143/base 2025-09-07T09:36:19.2828889Z * [new branch] gh/IvanKobzarev/143/head -> origin/gh/IvanKobzarev/143/head 
2025-09-07T09:36:19.2830389Z * [new branch] gh/IvanKobzarev/143/orig -> origin/gh/IvanKobzarev/143/orig 2025-09-07T09:36:19.2832671Z * [new branch] gh/IvanKobzarev/144/base -> origin/gh/IvanKobzarev/144/base 2025-09-07T09:36:19.2834217Z * [new branch] gh/IvanKobzarev/144/head -> origin/gh/IvanKobzarev/144/head 2025-09-07T09:36:19.2836138Z * [new branch] gh/IvanKobzarev/144/orig -> origin/gh/IvanKobzarev/144/orig 2025-09-07T09:36:19.2838348Z * [new branch] gh/IvanKobzarev/145/base -> origin/gh/IvanKobzarev/145/base 2025-09-07T09:36:19.2840250Z * [new branch] gh/IvanKobzarev/145/head -> origin/gh/IvanKobzarev/145/head 2025-09-07T09:36:19.2841602Z * [new branch] gh/IvanKobzarev/145/orig -> origin/gh/IvanKobzarev/145/orig 2025-09-07T09:36:19.2843806Z * [new branch] gh/IvanKobzarev/146/base -> origin/gh/IvanKobzarev/146/base 2025-09-07T09:36:19.2845611Z * [new branch] gh/IvanKobzarev/146/head -> origin/gh/IvanKobzarev/146/head 2025-09-07T09:36:19.2847232Z * [new branch] gh/IvanKobzarev/146/orig -> origin/gh/IvanKobzarev/146/orig 2025-09-07T09:36:19.2850138Z * [new branch] gh/NikhilAPatel/1/base -> origin/gh/NikhilAPatel/1/base 2025-09-07T09:36:19.2851885Z * [new branch] gh/NikhilAPatel/1/head -> origin/gh/NikhilAPatel/1/head 2025-09-07T09:36:19.2853933Z * [new branch] gh/NikhilAPatel/2/base -> origin/gh/NikhilAPatel/2/base 2025-09-07T09:36:19.2855591Z * [new branch] gh/NikhilAPatel/2/head -> origin/gh/NikhilAPatel/2/head 2025-09-07T09:36:19.2858142Z * [new branch] gh/NikhilAPatel/4/base -> origin/gh/NikhilAPatel/4/base 2025-09-07T09:36:19.2859727Z * [new branch] gh/NikhilAPatel/4/head -> origin/gh/NikhilAPatel/4/head 2025-09-07T09:36:19.2862403Z * [new branch] gh/PaliC/1/base -> origin/gh/PaliC/1/base 2025-09-07T09:36:19.2863890Z * [new branch] gh/PaliC/1/head -> origin/gh/PaliC/1/head 2025-09-07T09:36:19.2865687Z * [new branch] gh/PaliC/1/orig -> origin/gh/PaliC/1/orig 2025-09-07T09:36:19.2867927Z * [new branch] gh/PaliC/17/base -> origin/gh/PaliC/17/base 
2025-09-07T09:36:19.2869481Z * [new branch] gh/PaliC/17/head -> origin/gh/PaliC/17/head 2025-09-07T09:36:19.2871148Z * [new branch] gh/PaliC/17/orig -> origin/gh/PaliC/17/orig 2025-09-07T09:36:19.2873259Z * [new branch] gh/PaliC/18/base -> origin/gh/PaliC/18/base 2025-09-07T09:36:19.2874854Z * [new branch] gh/PaliC/18/head -> origin/gh/PaliC/18/head 2025-09-07T09:36:19.2876717Z * [new branch] gh/PaliC/18/orig -> origin/gh/PaliC/18/orig 2025-09-07T09:36:19.2878851Z * [new branch] gh/PaliC/2/base -> origin/gh/PaliC/2/base 2025-09-07T09:36:19.2880420Z * [new branch] gh/PaliC/2/head -> origin/gh/PaliC/2/head 2025-09-07T09:36:19.2881928Z * [new branch] gh/PaliC/2/orig -> origin/gh/PaliC/2/orig 2025-09-07T09:36:19.2884171Z * [new branch] gh/PaliC/20/base -> origin/gh/PaliC/20/base 2025-09-07T09:36:19.2886088Z * [new branch] gh/PaliC/20/head -> origin/gh/PaliC/20/head 2025-09-07T09:36:19.2887613Z * [new branch] gh/PaliC/20/orig -> origin/gh/PaliC/20/orig 2025-09-07T09:36:19.2889769Z * [new branch] gh/PaliC/21/base -> origin/gh/PaliC/21/base 2025-09-07T09:36:19.2891322Z * [new branch] gh/PaliC/21/head -> origin/gh/PaliC/21/head 2025-09-07T09:36:19.2892962Z * [new branch] gh/PaliC/21/orig -> origin/gh/PaliC/21/orig 2025-09-07T09:36:19.2895189Z * [new branch] gh/PaliC/22/base -> origin/gh/PaliC/22/base 2025-09-07T09:36:19.2896855Z * [new branch] gh/PaliC/22/head -> origin/gh/PaliC/22/head 2025-09-07T09:36:19.2898451Z * [new branch] gh/PaliC/22/orig -> origin/gh/PaliC/22/orig 2025-09-07T09:36:19.2900557Z * [new branch] gh/PaliC/23/base -> origin/gh/PaliC/23/base 2025-09-07T09:36:19.2902260Z * [new branch] gh/PaliC/23/head -> origin/gh/PaliC/23/head 2025-09-07T09:36:19.2903805Z * [new branch] gh/PaliC/23/orig -> origin/gh/PaliC/23/orig 2025-09-07T09:36:19.2906250Z * [new branch] gh/PaliC/24/base -> origin/gh/PaliC/24/base 2025-09-07T09:36:19.2907900Z * [new branch] gh/PaliC/24/head -> origin/gh/PaliC/24/head 2025-09-07T09:36:19.2909306Z * [new branch] gh/PaliC/24/orig -> 
origin/gh/PaliC/24/orig 2025-09-07T09:36:19.2912126Z * [new branch] gh/PaulZhang12/17/base -> origin/gh/PaulZhang12/17/base 2025-09-07T09:36:19.2913651Z * [new branch] gh/PaulZhang12/17/head -> origin/gh/PaulZhang12/17/head 2025-09-07T09:36:19.2916309Z * [new branch] gh/PaulZhang12/20/base -> origin/gh/PaulZhang12/20/base 2025-09-07T09:36:19.2917814Z * [new branch] gh/PaulZhang12/20/head -> origin/gh/PaulZhang12/20/head 2025-09-07T09:36:19.2919411Z * [new branch] gh/PaulZhang12/20/orig -> origin/gh/PaulZhang12/20/orig 2025-09-07T09:36:19.2921615Z * [new branch] gh/PaulZhang12/21/base -> origin/gh/PaulZhang12/21/base 2025-09-07T09:36:19.2923199Z * [new branch] gh/PaulZhang12/21/head -> origin/gh/PaulZhang12/21/head 2025-09-07T09:36:19.2924738Z * [new branch] gh/PaulZhang12/21/orig -> origin/gh/PaulZhang12/21/orig 2025-09-07T09:36:19.2927344Z * [new branch] gh/PaulZhang12/22/base -> origin/gh/PaulZhang12/22/base 2025-09-07T09:36:19.2928980Z * [new branch] gh/PaulZhang12/22/head -> origin/gh/PaulZhang12/22/head 2025-09-07T09:36:19.2930514Z * [new branch] gh/PaulZhang12/22/orig -> origin/gh/PaulZhang12/22/orig 2025-09-07T09:36:19.2932671Z * [new branch] gh/PaulZhang12/23/base -> origin/gh/PaulZhang12/23/base 2025-09-07T09:36:19.2934248Z * [new branch] gh/PaulZhang12/23/head -> origin/gh/PaulZhang12/23/head 2025-09-07T09:36:19.2936133Z * [new branch] gh/PaulZhang12/23/orig -> origin/gh/PaulZhang12/23/orig 2025-09-07T09:36:19.2938222Z * [new branch] gh/PaulZhang12/24/base -> origin/gh/PaulZhang12/24/base 2025-09-07T09:36:19.2939766Z * [new branch] gh/PaulZhang12/24/head -> origin/gh/PaulZhang12/24/head 2025-09-07T09:36:19.2941411Z * [new branch] gh/PaulZhang12/24/orig -> origin/gh/PaulZhang12/24/orig 2025-09-07T09:36:19.2943892Z * [new branch] gh/PaulZhang12/25/base -> origin/gh/PaulZhang12/25/base 2025-09-07T09:36:19.2945494Z * [new branch] gh/PaulZhang12/25/head -> origin/gh/PaulZhang12/25/head 2025-09-07T09:36:19.2947307Z * [new branch] gh/PaulZhang12/25/orig -> 
origin/gh/PaulZhang12/25/orig 2025-09-07T09:36:19.2950087Z * [new branch] gh/SamGinzburg/11/base -> origin/gh/SamGinzburg/11/base 2025-09-07T09:36:19.2951724Z * [new branch] gh/SamGinzburg/11/head -> origin/gh/SamGinzburg/11/head 2025-09-07T09:36:19.2954404Z * [new branch] gh/Sidharth123-cpu/24/base -> origin/gh/Sidharth123-cpu/24/base 2025-09-07T09:36:19.2956724Z * [new branch] gh/Sidharth123-cpu/25/base -> origin/gh/Sidharth123-cpu/25/base 2025-09-07T09:36:19.2958733Z * [new branch] gh/Sidharth123-cpu/26/base -> origin/gh/Sidharth123-cpu/26/base 2025-09-07T09:36:19.2960932Z * [new branch] gh/Sidharth123-cpu/27/base -> origin/gh/Sidharth123-cpu/27/base 2025-09-07T09:36:19.2963641Z * [new branch] gh/StrongerXi/1/base -> origin/gh/StrongerXi/1/base 2025-09-07T09:36:19.2965277Z * [new branch] gh/StrongerXi/1/head -> origin/gh/StrongerXi/1/head 2025-09-07T09:36:19.2967761Z * [new branch] gh/StrongerXi/133/base -> origin/gh/StrongerXi/133/base 2025-09-07T09:36:19.2969227Z * [new branch] gh/StrongerXi/133/head -> origin/gh/StrongerXi/133/head 2025-09-07T09:36:19.2970861Z * [new branch] gh/StrongerXi/133/orig -> origin/gh/StrongerXi/133/orig 2025-09-07T09:36:19.2972994Z * [new branch] gh/StrongerXi/134/base -> origin/gh/StrongerXi/134/base 2025-09-07T09:36:19.2974680Z * [new branch] gh/StrongerXi/134/head -> origin/gh/StrongerXi/134/head 2025-09-07T09:36:19.2976416Z * [new branch] gh/StrongerXi/134/orig -> origin/gh/StrongerXi/134/orig 2025-09-07T09:36:19.2978526Z * [new branch] gh/StrongerXi/136/base -> origin/gh/StrongerXi/136/base 2025-09-07T09:36:19.2980154Z * [new branch] gh/StrongerXi/136/head -> origin/gh/StrongerXi/136/head 2025-09-07T09:36:19.2981774Z * [new branch] gh/StrongerXi/136/orig -> origin/gh/StrongerXi/136/orig 2025-09-07T09:36:19.2984004Z * [new branch] gh/StrongerXi/137/base -> origin/gh/StrongerXi/137/base 2025-09-07T09:36:19.2985773Z * [new branch] gh/StrongerXi/137/head -> origin/gh/StrongerXi/137/head 2025-09-07T09:36:19.2987390Z * [new branch] 
gh/StrongerXi/137/orig -> origin/gh/StrongerXi/137/orig 2025-09-07T09:36:19.2989540Z * [new branch] gh/StrongerXi/138/base -> origin/gh/StrongerXi/138/base 2025-09-07T09:36:19.2991059Z * [new branch] gh/StrongerXi/138/head -> origin/gh/StrongerXi/138/head 2025-09-07T09:36:19.2992565Z * [new branch] gh/StrongerXi/138/orig -> origin/gh/StrongerXi/138/orig 2025-09-07T09:36:19.2994696Z * [new branch] gh/StrongerXi/139/base -> origin/gh/StrongerXi/139/base 2025-09-07T09:36:19.2996555Z * [new branch] gh/StrongerXi/139/head -> origin/gh/StrongerXi/139/head 2025-09-07T09:36:19.2998231Z * [new branch] gh/StrongerXi/139/orig -> origin/gh/StrongerXi/139/orig 2025-09-07T09:36:19.3000281Z * [new branch] gh/StrongerXi/140/base -> origin/gh/StrongerXi/140/base 2025-09-07T09:36:19.3001831Z * [new branch] gh/StrongerXi/140/head -> origin/gh/StrongerXi/140/head 2025-09-07T09:36:19.3003485Z * [new branch] gh/StrongerXi/140/orig -> origin/gh/StrongerXi/140/orig 2025-09-07T09:36:19.3005962Z * [new branch] gh/StrongerXi/71/base -> origin/gh/StrongerXi/71/base 2025-09-07T09:36:19.3007459Z * [new branch] gh/StrongerXi/71/head -> origin/gh/StrongerXi/71/head 2025-09-07T09:36:19.3009506Z * [new branch] gh/StrongerXi/72/base -> origin/gh/StrongerXi/72/base 2025-09-07T09:36:19.3011163Z * [new branch] gh/StrongerXi/72/head -> origin/gh/StrongerXi/72/head 2025-09-07T09:36:19.3013783Z * [new branch] gh/XilunWu/133/base -> origin/gh/XilunWu/133/base 2025-09-07T09:36:19.3015575Z * [new branch] gh/XilunWu/133/head -> origin/gh/XilunWu/133/head 2025-09-07T09:36:19.3017275Z * [new branch] gh/XilunWu/133/orig -> origin/gh/XilunWu/133/orig 2025-09-07T09:36:19.3019501Z * [new branch] gh/XilunWu/139/base -> origin/gh/XilunWu/139/base 2025-09-07T09:36:19.3021020Z * [new branch] gh/XilunWu/139/head -> origin/gh/XilunWu/139/head 2025-09-07T09:36:19.3022661Z * [new branch] gh/XilunWu/139/orig -> origin/gh/XilunWu/139/orig 2025-09-07T09:36:19.3024870Z * [new branch] gh/XilunWu/143/base -> 
origin/gh/XilunWu/143/base 2025-09-07T09:36:19.3026740Z * [new branch] gh/XilunWu/143/head -> origin/gh/XilunWu/143/head 2025-09-07T09:36:19.3028263Z * [new branch] gh/XilunWu/143/orig -> origin/gh/XilunWu/143/orig 2025-09-07T09:36:19.3030540Z * [new branch] gh/XilunWu/144/base -> origin/gh/XilunWu/144/base 2025-09-07T09:36:19.3032130Z * [new branch] gh/XilunWu/144/head -> origin/gh/XilunWu/144/head 2025-09-07T09:36:19.3033767Z * [new branch] gh/XilunWu/144/orig -> origin/gh/XilunWu/144/orig 2025-09-07T09:36:19.3036323Z * [new branch] gh/XilunWu/145/base -> origin/gh/XilunWu/145/base 2025-09-07T09:36:19.3037817Z * [new branch] gh/XilunWu/145/head -> origin/gh/XilunWu/145/head 2025-09-07T09:36:19.3039688Z * [new branch] gh/XilunWu/145/orig -> origin/gh/XilunWu/145/orig 2025-09-07T09:36:19.3041585Z * [new branch] gh/XilunWu/146/base -> origin/gh/XilunWu/146/base 2025-09-07T09:36:19.3043064Z * [new branch] gh/XilunWu/146/head -> origin/gh/XilunWu/146/head 2025-09-07T09:36:19.3044588Z * [new branch] gh/XilunWu/146/orig -> origin/gh/XilunWu/146/orig 2025-09-07T09:36:19.3047088Z * [new branch] gh/XilunWu/147/base -> origin/gh/XilunWu/147/base 2025-09-07T09:36:19.3048602Z * [new branch] gh/XilunWu/147/head -> origin/gh/XilunWu/147/head 2025-09-07T09:36:19.3050147Z * [new branch] gh/XilunWu/147/orig -> origin/gh/XilunWu/147/orig 2025-09-07T09:36:19.3052250Z * [new branch] gh/XilunWu/148/base -> origin/gh/XilunWu/148/base 2025-09-07T09:36:19.3053846Z * [new branch] gh/XilunWu/148/head -> origin/gh/XilunWu/148/head 2025-09-07T09:36:19.3055572Z * [new branch] gh/XilunWu/148/orig -> origin/gh/XilunWu/148/orig 2025-09-07T09:36:19.3057756Z * [new branch] gh/XilunWu/149/base -> origin/gh/XilunWu/149/base 2025-09-07T09:36:19.3059304Z * [new branch] gh/XilunWu/149/head -> origin/gh/XilunWu/149/head 2025-09-07T09:36:19.3060834Z * [new branch] gh/XilunWu/149/orig -> origin/gh/XilunWu/149/orig 2025-09-07T09:36:19.3063098Z * [new branch] gh/XilunWu/150/base -> 
origin/gh/XilunWu/150/base 2025-09-07T09:36:19.3064566Z * [new branch] gh/XilunWu/150/head -> origin/gh/XilunWu/150/head 2025-09-07T09:36:19.3066517Z * [new branch] gh/XilunWu/150/orig -> origin/gh/XilunWu/150/orig 2025-09-07T09:36:19.3068810Z * [new branch] gh/XilunWu/151/base -> origin/gh/XilunWu/151/base 2025-09-07T09:36:19.3070462Z * [new branch] gh/XilunWu/151/head -> origin/gh/XilunWu/151/head 2025-09-07T09:36:19.3072062Z * [new branch] gh/XilunWu/151/orig -> origin/gh/XilunWu/151/orig 2025-09-07T09:36:19.3074166Z * [new branch] gh/XilunWu/152/base -> origin/gh/XilunWu/152/base 2025-09-07T09:36:19.3075830Z * [new branch] gh/XilunWu/152/head -> origin/gh/XilunWu/152/head 2025-09-07T09:36:19.3077352Z * [new branch] gh/XilunWu/152/orig -> origin/gh/XilunWu/152/orig 2025-09-07T09:36:19.3079708Z * [new branch] gh/XilunWu/153/base -> origin/gh/XilunWu/153/base 2025-09-07T09:36:19.3081343Z * [new branch] gh/XilunWu/153/head -> origin/gh/XilunWu/153/head 2025-09-07T09:36:19.3082785Z * [new branch] gh/XilunWu/153/orig -> origin/gh/XilunWu/153/orig 2025-09-07T09:36:19.3085248Z * [new branch] gh/XilunWu/160/base -> origin/gh/XilunWu/160/base 2025-09-07T09:36:19.3086866Z * [new branch] gh/XilunWu/160/head -> origin/gh/XilunWu/160/head 2025-09-07T09:36:19.3088535Z * [new branch] gh/XilunWu/160/orig -> origin/gh/XilunWu/160/orig 2025-09-07T09:36:19.3090807Z * [new branch] gh/XilunWu/161/base -> origin/gh/XilunWu/161/base 2025-09-07T09:36:19.3092331Z * [new branch] gh/XilunWu/161/head -> origin/gh/XilunWu/161/head 2025-09-07T09:36:19.3093808Z * [new branch] gh/XilunWu/161/orig -> origin/gh/XilunWu/161/orig 2025-09-07T09:36:19.3096437Z * [new branch] gh/XilunWu/163/base -> origin/gh/XilunWu/163/base 2025-09-07T09:36:19.3097876Z * [new branch] gh/XilunWu/163/head -> origin/gh/XilunWu/163/head 2025-09-07T09:36:19.3099427Z * [new branch] gh/XilunWu/163/orig -> origin/gh/XilunWu/163/orig 2025-09-07T09:36:19.3101804Z * [new branch] gh/XilunWu/164/base -> 
origin/gh/XilunWu/164/base 2025-09-07T09:36:19.3103639Z * [new branch] gh/XilunWu/164/head -> origin/gh/XilunWu/164/head 2025-09-07T09:36:19.3105163Z * [new branch] gh/XilunWu/164/orig -> origin/gh/XilunWu/164/orig 2025-09-07T09:36:19.3107613Z * [new branch] gh/XilunWu/165/base -> origin/gh/XilunWu/165/base 2025-09-07T09:36:19.3109265Z * [new branch] gh/XilunWu/165/head -> origin/gh/XilunWu/165/head 2025-09-07T09:36:19.3110801Z * [new branch] gh/XilunWu/165/orig -> origin/gh/XilunWu/165/orig 2025-09-07T09:36:19.3113088Z * [new branch] gh/XilunWu/166/base -> origin/gh/XilunWu/166/base 2025-09-07T09:36:19.3114719Z * [new branch] gh/XilunWu/166/head -> origin/gh/XilunWu/166/head 2025-09-07T09:36:19.3116550Z * [new branch] gh/XilunWu/166/orig -> origin/gh/XilunWu/166/orig 2025-09-07T09:36:19.3118731Z * [new branch] gh/XilunWu/167/base -> origin/gh/XilunWu/167/base 2025-09-07T09:36:19.3120291Z * [new branch] gh/XilunWu/167/head -> origin/gh/XilunWu/167/head 2025-09-07T09:36:19.3121830Z * [new branch] gh/XilunWu/167/orig -> origin/gh/XilunWu/167/orig 2025-09-07T09:36:19.3124084Z * [new branch] gh/XilunWu/168/base -> origin/gh/XilunWu/168/base 2025-09-07T09:36:19.3125830Z * [new branch] gh/XilunWu/168/head -> origin/gh/XilunWu/168/head 2025-09-07T09:36:19.3127536Z * [new branch] gh/XilunWu/168/orig -> origin/gh/XilunWu/168/orig 2025-09-07T09:36:19.3129636Z * [new branch] gh/XilunWu/169/base -> origin/gh/XilunWu/169/base 2025-09-07T09:36:19.3131384Z * [new branch] gh/XilunWu/169/head -> origin/gh/XilunWu/169/head 2025-09-07T09:36:19.3132904Z * [new branch] gh/XilunWu/169/orig -> origin/gh/XilunWu/169/orig 2025-09-07T09:36:19.3135116Z * [new branch] gh/XilunWu/170/base -> origin/gh/XilunWu/170/base 2025-09-07T09:36:19.3136808Z * [new branch] gh/XilunWu/170/head -> origin/gh/XilunWu/170/head 2025-09-07T09:36:19.3138285Z * [new branch] gh/XilunWu/170/orig -> origin/gh/XilunWu/170/orig 2025-09-07T09:36:19.3140985Z * [new branch] gh/XuehaiPan/14/base -> 
origin/gh/XuehaiPan/14/base 2025-09-07T09:36:19.3142839Z * [new branch] gh/XuehaiPan/14/head -> origin/gh/XuehaiPan/14/head 2025-09-07T09:36:19.3144362Z * [new branch] gh/XuehaiPan/14/orig -> origin/gh/XuehaiPan/14/orig 2025-09-07T09:36:19.3147091Z * [new branch] gh/XuehaiPan/179/base -> origin/gh/XuehaiPan/179/base 2025-09-07T09:36:19.3148646Z * [new branch] gh/XuehaiPan/179/head -> origin/gh/XuehaiPan/179/head 2025-09-07T09:36:19.3150266Z * [new branch] gh/XuehaiPan/179/orig -> origin/gh/XuehaiPan/179/orig 2025-09-07T09:36:19.3152531Z * [new branch] gh/XuehaiPan/189/base -> origin/gh/XuehaiPan/189/base 2025-09-07T09:36:19.3154097Z * [new branch] gh/XuehaiPan/189/head -> origin/gh/XuehaiPan/189/head 2025-09-07T09:36:19.3155866Z * [new branch] gh/XuehaiPan/189/orig -> origin/gh/XuehaiPan/189/orig 2025-09-07T09:36:19.3157975Z * [new branch] gh/XuehaiPan/232/base -> origin/gh/XuehaiPan/232/base 2025-09-07T09:36:19.3159582Z * [new branch] gh/XuehaiPan/232/head -> origin/gh/XuehaiPan/232/head 2025-09-07T09:36:19.3161072Z * [new branch] gh/XuehaiPan/232/orig -> origin/gh/XuehaiPan/232/orig 2025-09-07T09:36:19.3163301Z * [new branch] gh/XuehaiPan/249/base -> origin/gh/XuehaiPan/249/base 2025-09-07T09:36:19.3165067Z * [new branch] gh/XuehaiPan/249/head -> origin/gh/XuehaiPan/249/head 2025-09-07T09:36:19.3166710Z * [new branch] gh/XuehaiPan/249/orig -> origin/gh/XuehaiPan/249/orig 2025-09-07T09:36:19.3168736Z * [new branch] gh/XuehaiPan/253/base -> origin/gh/XuehaiPan/253/base 2025-09-07T09:36:19.3170461Z * [new branch] gh/XuehaiPan/253/head -> origin/gh/XuehaiPan/253/head 2025-09-07T09:36:19.3171831Z * [new branch] gh/XuehaiPan/253/orig -> origin/gh/XuehaiPan/253/orig 2025-09-07T09:36:19.3173883Z * [new branch] gh/XuehaiPan/254/base -> origin/gh/XuehaiPan/254/base 2025-09-07T09:36:19.3175759Z * [new branch] gh/XuehaiPan/254/head -> origin/gh/XuehaiPan/254/head 2025-09-07T09:36:19.3177378Z * [new branch] gh/XuehaiPan/254/orig -> origin/gh/XuehaiPan/254/orig 
2025-09-07T09:36:19.3179543Z * [new branch] gh/XuehaiPan/255/base -> origin/gh/XuehaiPan/255/base 2025-09-07T09:36:19.3181146Z * [new branch] gh/XuehaiPan/255/head -> origin/gh/XuehaiPan/255/head 2025-09-07T09:36:19.3182786Z * [new branch] gh/XuehaiPan/255/orig -> origin/gh/XuehaiPan/255/orig 2025-09-07T09:36:19.3185049Z * [new branch] gh/XuehaiPan/257/base -> origin/gh/XuehaiPan/257/base 2025-09-07T09:36:19.3186747Z * [new branch] gh/XuehaiPan/257/head -> origin/gh/XuehaiPan/257/head 2025-09-07T09:36:19.3188271Z * [new branch] gh/XuehaiPan/257/orig -> origin/gh/XuehaiPan/257/orig 2025-09-07T09:36:19.3190434Z * [new branch] gh/XuehaiPan/271/base -> origin/gh/XuehaiPan/271/base 2025-09-07T09:36:19.3192035Z * [new branch] gh/XuehaiPan/271/head -> origin/gh/XuehaiPan/271/head 2025-09-07T09:36:19.3193539Z * [new branch] gh/XuehaiPan/271/orig -> origin/gh/XuehaiPan/271/orig 2025-09-07T09:36:19.3195986Z * [new branch] gh/XuehaiPan/290/base -> origin/gh/XuehaiPan/290/base 2025-09-07T09:36:19.3197728Z * [new branch] gh/XuehaiPan/290/head -> origin/gh/XuehaiPan/290/head 2025-09-07T09:36:19.3199130Z * [new branch] gh/XuehaiPan/290/orig -> origin/gh/XuehaiPan/290/orig 2025-09-07T09:36:19.3201378Z * [new branch] gh/XuehaiPan/343/base -> origin/gh/XuehaiPan/343/base 2025-09-07T09:36:19.3202972Z * [new branch] gh/XuehaiPan/343/head -> origin/gh/XuehaiPan/343/head 2025-09-07T09:36:19.3204422Z * [new branch] gh/XuehaiPan/343/orig -> origin/gh/XuehaiPan/343/orig 2025-09-07T09:36:19.3206945Z * [new branch] gh/XuehaiPan/347/base -> origin/gh/XuehaiPan/347/base 2025-09-07T09:36:19.3208520Z * [new branch] gh/XuehaiPan/347/head -> origin/gh/XuehaiPan/347/head 2025-09-07T09:36:19.3210344Z * [new branch] gh/XuehaiPan/347/orig -> origin/gh/XuehaiPan/347/orig 2025-09-07T09:36:19.3212256Z * [new branch] gh/XuehaiPan/348/base -> origin/gh/XuehaiPan/348/base 2025-09-07T09:36:19.3213816Z * [new branch] gh/XuehaiPan/348/head -> origin/gh/XuehaiPan/348/head 2025-09-07T09:36:19.3215493Z * [new 
branch] gh/XuehaiPan/348/orig -> origin/gh/XuehaiPan/348/orig 2025-09-07T09:36:19.3217878Z * [new branch] gh/XuehaiPan/350/base -> origin/gh/XuehaiPan/350/base 2025-09-07T09:36:19.3219471Z * [new branch] gh/XuehaiPan/350/head -> origin/gh/XuehaiPan/350/head 2025-09-07T09:36:19.3220987Z * [new branch] gh/XuehaiPan/350/orig -> origin/gh/XuehaiPan/350/orig 2025-09-07T09:36:19.3223589Z * [new branch] gh/XuehaiPan/356/base -> origin/gh/XuehaiPan/356/base 2025-09-07T09:36:19.3225197Z * [new branch] gh/XuehaiPan/356/head -> origin/gh/XuehaiPan/356/head 2025-09-07T09:36:19.3227000Z * [new branch] gh/XuehaiPan/356/orig -> origin/gh/XuehaiPan/356/orig 2025-09-07T09:36:19.3229180Z * [new branch] gh/XuehaiPan/357/base -> origin/gh/XuehaiPan/357/base 2025-09-07T09:36:19.3230701Z * [new branch] gh/XuehaiPan/357/head -> origin/gh/XuehaiPan/357/head 2025-09-07T09:36:19.3232434Z * [new branch] gh/XuehaiPan/357/orig -> origin/gh/XuehaiPan/357/orig 2025-09-07T09:36:19.3234403Z * [new branch] gh/XuehaiPan/358/base -> origin/gh/XuehaiPan/358/base 2025-09-07T09:36:19.3236251Z * [new branch] gh/XuehaiPan/358/head -> origin/gh/XuehaiPan/358/head 2025-09-07T09:36:19.3237772Z * [new branch] gh/XuehaiPan/358/orig -> origin/gh/XuehaiPan/358/orig 2025-09-07T09:36:19.3240006Z * [new branch] gh/XuehaiPan/359/base -> origin/gh/XuehaiPan/359/base 2025-09-07T09:36:19.3241553Z * [new branch] gh/XuehaiPan/359/head -> origin/gh/XuehaiPan/359/head 2025-09-07T09:36:19.3243075Z * [new branch] gh/XuehaiPan/359/orig -> origin/gh/XuehaiPan/359/orig 2025-09-07T09:36:19.3245360Z * [new branch] gh/XuehaiPan/360/base -> origin/gh/XuehaiPan/360/base 2025-09-07T09:36:19.3247044Z * [new branch] gh/XuehaiPan/360/head -> origin/gh/XuehaiPan/360/head 2025-09-07T09:36:19.3248554Z * [new branch] gh/XuehaiPan/360/orig -> origin/gh/XuehaiPan/360/orig 2025-09-07T09:36:19.3250775Z * [new branch] gh/XuehaiPan/365/base -> origin/gh/XuehaiPan/365/base 2025-09-07T09:36:19.3252335Z * [new branch] gh/XuehaiPan/365/head -> 
origin/gh/XuehaiPan/365/head 2025-09-07T09:36:19.3253901Z * [new branch] gh/XuehaiPan/365/orig -> origin/gh/XuehaiPan/365/orig 2025-09-07T09:36:19.3256502Z * [new branch] gh/XuehaiPan/366/base -> origin/gh/XuehaiPan/366/base 2025-09-07T09:36:19.3258047Z * [new branch] gh/XuehaiPan/366/head -> origin/gh/XuehaiPan/366/head 2025-09-07T09:36:19.3260227Z * [new branch] gh/XuehaiPan/369/base -> origin/gh/XuehaiPan/369/base 2025-09-07T09:36:19.3261939Z * [new branch] gh/XuehaiPan/369/head -> origin/gh/XuehaiPan/369/head 2025-09-07T09:36:19.3263523Z * [new branch] gh/XuehaiPan/369/orig -> origin/gh/XuehaiPan/369/orig 2025-09-07T09:36:19.3265912Z * [new branch] gh/XuehaiPan/370/base -> origin/gh/XuehaiPan/370/base 2025-09-07T09:36:19.3267476Z * [new branch] gh/XuehaiPan/370/head -> origin/gh/XuehaiPan/370/head 2025-09-07T09:36:19.3268954Z * [new branch] gh/XuehaiPan/370/orig -> origin/gh/XuehaiPan/370/orig 2025-09-07T09:36:19.3271185Z * [new branch] gh/XuehaiPan/380/base -> origin/gh/XuehaiPan/380/base 2025-09-07T09:36:19.3272690Z * [new branch] gh/XuehaiPan/380/head -> origin/gh/XuehaiPan/380/head 2025-09-07T09:36:19.3274298Z * [new branch] gh/XuehaiPan/380/orig -> origin/gh/XuehaiPan/380/orig 2025-09-07T09:36:19.3276798Z * [new branch] gh/XuehaiPan/381/base -> origin/gh/XuehaiPan/381/base 2025-09-07T09:36:19.3278315Z * [new branch] gh/XuehaiPan/381/head -> origin/gh/XuehaiPan/381/head 2025-09-07T09:36:19.3280573Z * [new branch] gh/XuehaiPan/382/base -> origin/gh/XuehaiPan/382/base 2025-09-07T09:36:19.3282112Z * [new branch] gh/XuehaiPan/382/head -> origin/gh/XuehaiPan/382/head 2025-09-07T09:36:19.3283666Z * [new branch] gh/XuehaiPan/382/orig -> origin/gh/XuehaiPan/382/orig 2025-09-07T09:36:19.3286189Z * [new branch] gh/XuehaiPan/383/base -> origin/gh/XuehaiPan/383/base 2025-09-07T09:36:19.3287712Z * [new branch] gh/XuehaiPan/383/head -> origin/gh/XuehaiPan/383/head 2025-09-07T09:36:19.3289265Z * [new branch] gh/XuehaiPan/383/orig -> origin/gh/XuehaiPan/383/orig 
2025-09-07T09:36:19.3291508Z * [new branch] gh/XuehaiPan/384/base -> origin/gh/XuehaiPan/384/base 2025-09-07T09:36:19.3293016Z * [new branch] gh/XuehaiPan/384/head -> origin/gh/XuehaiPan/384/head 2025-09-07T09:36:19.3294546Z * [new branch] gh/XuehaiPan/384/orig -> origin/gh/XuehaiPan/384/orig 2025-09-07T09:36:19.3297381Z * [new branch] gh/XuehaiPan/385/base -> origin/gh/XuehaiPan/385/base 2025-09-07T09:36:19.3298685Z * [new branch] gh/XuehaiPan/385/head -> origin/gh/XuehaiPan/385/head 2025-09-07T09:36:19.3300102Z * [new branch] gh/XuehaiPan/385/orig -> origin/gh/XuehaiPan/385/orig 2025-09-07T09:36:19.3302379Z * [new branch] gh/XuehaiPan/386/base -> origin/gh/XuehaiPan/386/base 2025-09-07T09:36:19.3303954Z * [new branch] gh/XuehaiPan/386/head -> origin/gh/XuehaiPan/386/head 2025-09-07T09:36:19.3305822Z * [new branch] gh/XuehaiPan/386/orig -> origin/gh/XuehaiPan/386/orig 2025-09-07T09:36:19.3308076Z * [new branch] gh/XuehaiPan/387/base -> origin/gh/XuehaiPan/387/base 2025-09-07T09:36:19.3309590Z * [new branch] gh/XuehaiPan/387/head -> origin/gh/XuehaiPan/387/head 2025-09-07T09:36:19.3311212Z * [new branch] gh/XuehaiPan/387/orig -> origin/gh/XuehaiPan/387/orig 2025-09-07T09:36:19.3313870Z * [new branch] gh/ZainRizvi/1/base -> origin/gh/ZainRizvi/1/base 2025-09-07T09:36:19.3315817Z * [new branch] gh/ZainRizvi/1/head -> origin/gh/ZainRizvi/1/head 2025-09-07T09:36:19.3317943Z * [new branch] gh/ZainRizvi/2/base -> origin/gh/ZainRizvi/2/base 2025-09-07T09:36:19.3319443Z * [new branch] gh/ZainRizvi/2/head -> origin/gh/ZainRizvi/2/head 2025-09-07T09:36:19.3321566Z * [new branch] gh/ZainRizvi/3/base -> origin/gh/ZainRizvi/3/base 2025-09-07T09:36:19.3323094Z * [new branch] gh/ZainRizvi/3/head -> origin/gh/ZainRizvi/3/head 2025-09-07T09:36:19.3325346Z * [new branch] gh/ZainRizvi/4/base -> origin/gh/ZainRizvi/4/base 2025-09-07T09:36:19.3327096Z * [new branch] gh/ZainRizvi/4/head -> origin/gh/ZainRizvi/4/head 2025-09-07T09:36:19.3329168Z * [new branch] gh/ZainRizvi/5/base -> 
origin/gh/ZainRizvi/5/base 2025-09-07T09:36:19.3330608Z * [new branch] gh/ZainRizvi/5/head -> origin/gh/ZainRizvi/5/head 2025-09-07T09:36:19.3332765Z * [new branch] gh/ZainRizvi/6/base -> origin/gh/ZainRizvi/6/base 2025-09-07T09:36:19.3334252Z * [new branch] gh/ZainRizvi/6/head -> origin/gh/ZainRizvi/6/head 2025-09-07T09:36:19.3336128Z * [new branch] gh/ZainRizvi/6/orig -> origin/gh/ZainRizvi/6/orig 2025-09-07T09:36:19.3338215Z * [new branch] gh/ZainRizvi/7/base -> origin/gh/ZainRizvi/7/base 2025-09-07T09:36:19.3339746Z * [new branch] gh/ZainRizvi/7/head -> origin/gh/ZainRizvi/7/head 2025-09-07T09:36:19.3341276Z * [new branch] gh/ZainRizvi/7/orig -> origin/gh/ZainRizvi/7/orig 2025-09-07T09:36:19.3343625Z * [new branch] gh/ZainRizvi/8/base -> origin/gh/ZainRizvi/8/base 2025-09-07T09:36:19.3345502Z * [new branch] gh/ZainRizvi/8/head -> origin/gh/ZainRizvi/8/head 2025-09-07T09:36:19.3347786Z * [new branch] gh/ZainRizvi/9/base -> origin/gh/ZainRizvi/9/base 2025-09-07T09:36:19.3349298Z * [new branch] gh/ZainRizvi/9/head -> origin/gh/ZainRizvi/9/head 2025-09-07T09:36:19.3350834Z * [new branch] gh/ZainRizvi/9/orig -> origin/gh/ZainRizvi/9/orig 2025-09-07T09:36:19.3353502Z * [new branch] gh/ZhiweiYan-96/39/base -> origin/gh/ZhiweiYan-96/39/base 2025-09-07T09:36:19.3355186Z * [new branch] gh/ZhiweiYan-96/39/head -> origin/gh/ZhiweiYan-96/39/head 2025-09-07T09:36:19.3357014Z * [new branch] gh/ZhiweiYan-96/39/orig -> origin/gh/ZhiweiYan-96/39/orig 2025-09-07T09:36:19.3359257Z * [new branch] gh/ZhiweiYan-96/44/base -> origin/gh/ZhiweiYan-96/44/base 2025-09-07T09:36:19.3360805Z * [new branch] gh/ZhiweiYan-96/44/head -> origin/gh/ZhiweiYan-96/44/head 2025-09-07T09:36:19.3363117Z * [new branch] gh/ZhiweiYan-96/45/base -> origin/gh/ZhiweiYan-96/45/base 2025-09-07T09:36:19.3364502Z * [new branch] gh/ZhiweiYan-96/45/head -> origin/gh/ZhiweiYan-96/45/head 2025-09-07T09:36:19.3367062Z * [new branch] gh/ZhiweiYan-96/49/base -> origin/gh/ZhiweiYan-96/49/base 2025-09-07T09:36:19.3368601Z 
* [new branch] gh/ZhiweiYan-96/49/head -> origin/gh/ZhiweiYan-96/49/head 2025-09-07T09:36:19.3370664Z * [new branch] gh/ZhiweiYan-96/62/base -> origin/gh/ZhiweiYan-96/62/base 2025-09-07T09:36:19.3372216Z * [new branch] gh/ZhiweiYan-96/62/head -> origin/gh/ZhiweiYan-96/62/head 2025-09-07T09:36:19.3374422Z * [new branch] gh/ZhiweiYan-96/64/base -> origin/gh/ZhiweiYan-96/64/base 2025-09-07T09:36:19.3376227Z * [new branch] gh/ZhiweiYan-96/64/head -> origin/gh/ZhiweiYan-96/64/head 2025-09-07T09:36:19.3377799Z * [new branch] gh/ZhiweiYan-96/64/orig -> origin/gh/ZhiweiYan-96/64/orig 2025-09-07T09:36:19.3380013Z * [new branch] gh/ZhiweiYan-96/65/base -> origin/gh/ZhiweiYan-96/65/base 2025-09-07T09:36:19.3381635Z * [new branch] gh/ZhiweiYan-96/65/head -> origin/gh/ZhiweiYan-96/65/head 2025-09-07T09:36:19.3383302Z * [new branch] gh/ZhiweiYan-96/65/orig -> origin/gh/ZhiweiYan-96/65/orig 2025-09-07T09:36:19.3385795Z * [new branch] gh/ZhiweiYan-96/66/base -> origin/gh/ZhiweiYan-96/66/base 2025-09-07T09:36:19.3387417Z * [new branch] gh/ZhiweiYan-96/66/head -> origin/gh/ZhiweiYan-96/66/head 2025-09-07T09:36:19.3389576Z * [new branch] gh/ZhiweiYan-96/67/base -> origin/gh/ZhiweiYan-96/67/base 2025-09-07T09:36:19.3391106Z * [new branch] gh/ZhiweiYan-96/67/head -> origin/gh/ZhiweiYan-96/67/head 2025-09-07T09:36:19.3393230Z * [new branch] gh/ZhiweiYan-96/68/base -> origin/gh/ZhiweiYan-96/68/base 2025-09-07T09:36:19.3394711Z * [new branch] gh/ZhiweiYan-96/68/head -> origin/gh/ZhiweiYan-96/68/head 2025-09-07T09:36:19.3396604Z * [new branch] gh/ZhiweiYan-96/68/orig -> origin/gh/ZhiweiYan-96/68/orig 2025-09-07T09:36:19.3399375Z * [new branch] gh/aakhundov/1/base -> origin/gh/aakhundov/1/base 2025-09-07T09:36:19.3400958Z * [new branch] gh/aakhundov/1/head -> origin/gh/aakhundov/1/head 2025-09-07T09:36:19.3403002Z * [new branch] gh/aakhundov/2/base -> origin/gh/aakhundov/2/base 2025-09-07T09:36:19.3404570Z * [new branch] gh/aakhundov/2/head -> origin/gh/aakhundov/2/head 
2025-09-07T09:36:19.3407093Z * [new branch] gh/aditew01/openblas -> origin/gh/aditew01/openblas 2025-09-07T09:36:19.3408632Z * [new branch] gh/aditew01/sbgemm -> origin/gh/aditew01/sbgemm 2025-09-07T09:36:19.3410330Z * [new branch] gh/aditew01/vecbf16 -> origin/gh/aditew01/vecbf16 2025-09-07T09:36:19.3412648Z * [new branch] gh/alexbrauckmann/paddedtensor_faketensor_init -> origin/gh/alexbrauckmann/paddedtensor_faketensor_init 2025-09-07T09:36:19.3415449Z * [new branch] gh/alexsamardzic/9/base -> origin/gh/alexsamardzic/9/base 2025-09-07T09:36:19.3417075Z * [new branch] gh/alexsamardzic/9/head -> origin/gh/alexsamardzic/9/head 2025-09-07T09:36:19.3418690Z * [new branch] gh/alexsamardzic/9/orig -> origin/gh/alexsamardzic/9/orig 2025-09-07T09:36:19.3421320Z * [new branch] gh/amjames/18/base -> origin/gh/amjames/18/base 2025-09-07T09:36:19.3423014Z * [new branch] gh/amjames/18/head -> origin/gh/amjames/18/head 2025-09-07T09:36:19.3424567Z * [new branch] gh/amjames/18/orig -> origin/gh/amjames/18/orig 2025-09-07T09:36:19.3427861Z * [new branch] gh/andrewor14/35/base -> origin/gh/andrewor14/35/base 2025-09-07T09:36:19.3429692Z * [new branch] gh/andrewor14/35/head -> origin/gh/andrewor14/35/head 2025-09-07T09:36:19.3431189Z * [new branch] gh/andrewor14/35/orig -> origin/gh/andrewor14/35/orig 2025-09-07T09:36:19.3433447Z * [new branch] gh/andrewor14/50/base -> origin/gh/andrewor14/50/base 2025-09-07T09:36:19.3435292Z * [new branch] gh/andrewor14/50/head -> origin/gh/andrewor14/50/head 2025-09-07T09:36:19.3437003Z * [new branch] gh/andrewor14/50/orig -> origin/gh/andrewor14/50/orig 2025-09-07T09:36:19.3439176Z * [new branch] gh/andrewor14/51/base -> origin/gh/andrewor14/51/base 2025-09-07T09:36:19.3440746Z * [new branch] gh/andrewor14/51/orig -> origin/gh/andrewor14/51/orig 2025-09-07T09:36:19.3443497Z * [new branch] gh/andyanwang/1/base -> origin/gh/andyanwang/1/base 2025-09-07T09:36:19.3445113Z * [new branch] gh/andyanwang/1/head -> origin/gh/andyanwang/1/head 
2025-09-07T09:36:19.3446921Z * [new branch] gh/andyanwang/1/orig -> origin/gh/andyanwang/1/orig 2025-09-07T09:36:19.3449217Z * [new branch] gh/andyanwang/13/base -> origin/gh/andyanwang/13/base 2025-09-07T09:36:19.3450859Z * [new branch] gh/andyanwang/13/head -> origin/gh/andyanwang/13/head 2025-09-07T09:36:19.3452910Z * [new branch] gh/andyanwang/13/orig -> origin/gh/andyanwang/13/orig 2025-09-07T09:36:19.3455268Z * [new branch] gh/andyanwang/2/base -> origin/gh/andyanwang/2/base 2025-09-07T09:36:19.3456932Z * [new branch] gh/andyanwang/2/head -> origin/gh/andyanwang/2/head 2025-09-07T09:36:19.3458423Z * [new branch] gh/andyanwang/2/orig -> origin/gh/andyanwang/2/orig 2025-09-07T09:36:19.3460781Z * [new branch] gh/andyanwang/28/base -> origin/gh/andyanwang/28/base 2025-09-07T09:36:19.3462448Z * [new branch] gh/andyanwang/28/head -> origin/gh/andyanwang/28/head 2025-09-07T09:36:19.3463973Z * [new branch] gh/andyanwang/28/orig -> origin/gh/andyanwang/28/orig 2025-09-07T09:36:19.3466374Z * [new branch] gh/andyanwang/3/base -> origin/gh/andyanwang/3/base 2025-09-07T09:36:19.3467985Z * [new branch] gh/andyanwang/3/head -> origin/gh/andyanwang/3/head 2025-09-07T09:36:19.3469574Z * [new branch] gh/andyanwang/3/orig -> origin/gh/andyanwang/3/orig 2025-09-07T09:36:19.3471798Z * [new branch] gh/andyanwang/30/base -> origin/gh/andyanwang/30/base 2025-09-07T09:36:19.3473566Z * [new branch] gh/andyanwang/30/orig -> origin/gh/andyanwang/30/orig 2025-09-07T09:36:19.3475917Z * [new branch] gh/andyanwang/31/base -> origin/gh/andyanwang/31/base 2025-09-07T09:36:19.3477602Z * [new branch] gh/andyanwang/31/orig -> origin/gh/andyanwang/31/orig 2025-09-07T09:36:19.3480141Z * [new branch] gh/andyanwang/32/base -> origin/gh/andyanwang/32/base 2025-09-07T09:36:19.3481700Z * [new branch] gh/andyanwang/32/head -> origin/gh/andyanwang/32/head 2025-09-07T09:36:19.3483367Z * [new branch] gh/andyanwang/32/orig -> origin/gh/andyanwang/32/orig 2025-09-07T09:36:19.3485943Z * [new branch] 
gh/andyanwang/39/base -> origin/gh/andyanwang/39/base 2025-09-07T09:36:19.3487610Z * [new branch] gh/andyanwang/39/head -> origin/gh/andyanwang/39/head 2025-09-07T09:36:19.3489210Z * [new branch] gh/andyanwang/39/orig -> origin/gh/andyanwang/39/orig 2025-09-07T09:36:19.3491458Z * [new branch] gh/andyanwang/4/base -> origin/gh/andyanwang/4/base 2025-09-07T09:36:19.3492918Z * [new branch] gh/andyanwang/4/head -> origin/gh/andyanwang/4/head 2025-09-07T09:36:19.3494616Z * [new branch] gh/andyanwang/4/orig -> origin/gh/andyanwang/4/orig 2025-09-07T09:36:19.3497766Z * [new branch] gh/angelayi/107/base -> origin/gh/angelayi/107/base 2025-09-07T09:36:19.3499183Z * [new branch] gh/angelayi/107/head -> origin/gh/angelayi/107/head 2025-09-07T09:36:19.3501354Z * [new branch] gh/angelayi/111/base -> origin/gh/angelayi/111/base 2025-09-07T09:36:19.3503036Z * [new branch] gh/angelayi/111/head -> origin/gh/angelayi/111/head 2025-09-07T09:36:19.3504635Z * [new branch] gh/angelayi/111/orig -> origin/gh/angelayi/111/orig 2025-09-07T09:36:19.3507141Z * [new branch] gh/angelayi/112/base -> origin/gh/angelayi/112/base 2025-09-07T09:36:19.3509001Z * [new branch] gh/angelayi/112/head -> origin/gh/angelayi/112/head 2025-09-07T09:36:19.3510625Z * [new branch] gh/angelayi/112/orig -> origin/gh/angelayi/112/orig 2025-09-07T09:36:19.3512847Z * [new branch] gh/angelayi/113/base -> origin/gh/angelayi/113/base 2025-09-07T09:36:19.3514327Z * [new branch] gh/angelayi/113/head -> origin/gh/angelayi/113/head 2025-09-07T09:36:19.3516360Z * [new branch] gh/angelayi/113/orig -> origin/gh/angelayi/113/orig 2025-09-07T09:36:19.3518443Z * [new branch] gh/angelayi/114/base -> origin/gh/angelayi/114/base 2025-09-07T09:36:19.3519926Z * [new branch] gh/angelayi/114/head -> origin/gh/angelayi/114/head 2025-09-07T09:36:19.3521606Z * [new branch] gh/angelayi/114/orig -> origin/gh/angelayi/114/orig 2025-09-07T09:36:19.3523837Z * [new branch] gh/angelayi/115/base -> origin/gh/angelayi/115/base 
2025-09-07T09:36:19.3525546Z * [new branch] gh/angelayi/115/head -> origin/gh/angelayi/115/head 2025-09-07T09:36:19.3527232Z * [new branch] gh/angelayi/115/orig -> origin/gh/angelayi/115/orig 2025-09-07T09:36:19.3530015Z * [new branch] gh/anijain2305/753/base -> origin/gh/anijain2305/753/base 2025-09-07T09:36:19.3531532Z * [new branch] gh/anijain2305/753/head -> origin/gh/anijain2305/753/head 2025-09-07T09:36:19.3533158Z * [new branch] gh/anijain2305/753/orig -> origin/gh/anijain2305/753/orig 2025-09-07T09:36:19.3535552Z * [new branch] gh/anijain2305/766/base -> origin/gh/anijain2305/766/base 2025-09-07T09:36:19.3537397Z * [new branch] gh/anijain2305/766/head -> origin/gh/anijain2305/766/head 2025-09-07T09:36:19.3538945Z * [new branch] gh/anijain2305/766/orig -> origin/gh/anijain2305/766/orig 2025-09-07T09:36:19.3541192Z * [new branch] gh/anijain2305/790/base -> origin/gh/anijain2305/790/base 2025-09-07T09:36:19.3542983Z * [new branch] gh/anijain2305/790/head -> origin/gh/anijain2305/790/head 2025-09-07T09:36:19.3544506Z * [new branch] gh/anijain2305/790/orig -> origin/gh/anijain2305/790/orig 2025-09-07T09:36:19.3547000Z * [new branch] gh/anijain2305/792/base -> origin/gh/anijain2305/792/base 2025-09-07T09:36:19.3548555Z * [new branch] gh/anijain2305/792/head -> origin/gh/anijain2305/792/head 2025-09-07T09:36:19.3550164Z * [new branch] gh/anijain2305/792/orig -> origin/gh/anijain2305/792/orig 2025-09-07T09:36:19.3552543Z * [new branch] gh/anijain2305/803/base -> origin/gh/anijain2305/803/base 2025-09-07T09:36:19.3553863Z * [new branch] gh/anijain2305/803/head -> origin/gh/anijain2305/803/head 2025-09-07T09:36:19.3555635Z * [new branch] gh/anijain2305/803/orig -> origin/gh/anijain2305/803/orig 2025-09-07T09:36:19.3557847Z * [new branch] gh/anijain2305/804/base -> origin/gh/anijain2305/804/base 2025-09-07T09:36:19.3559414Z * [new branch] gh/anijain2305/804/head -> origin/gh/anijain2305/804/head 2025-09-07T09:36:19.3561243Z * [new branch] gh/anijain2305/804/orig -> 
origin/gh/anijain2305/804/orig 2025-09-07T09:36:19.3563338Z * [new branch] gh/anijain2305/805/base -> origin/gh/anijain2305/805/base 2025-09-07T09:36:19.3564867Z * [new branch] gh/anijain2305/805/head -> origin/gh/anijain2305/805/head 2025-09-07T09:36:19.3566703Z * [new branch] gh/anijain2305/805/orig -> origin/gh/anijain2305/805/orig 2025-09-07T09:36:19.3568967Z * [new branch] gh/anijain2305/810/base -> origin/gh/anijain2305/810/base 2025-09-07T09:36:19.3573140Z * [new branch] gh/anijain2305/810/head -> origin/gh/anijain2305/810/head 2025-09-07T09:36:19.3574790Z * [new branch] gh/anijain2305/810/orig -> origin/gh/anijain2305/810/orig 2025-09-07T09:36:19.3577292Z * [new branch] gh/anijain2305/812/base -> origin/gh/anijain2305/812/base 2025-09-07T09:36:19.3578891Z * [new branch] gh/anijain2305/812/head -> origin/gh/anijain2305/812/head 2025-09-07T09:36:19.3580499Z * [new branch] gh/anijain2305/812/orig -> origin/gh/anijain2305/812/orig 2025-09-07T09:36:19.3582864Z * [new branch] gh/anijain2305/838/base -> origin/gh/anijain2305/838/base 2025-09-07T09:36:19.3584395Z * [new branch] gh/anijain2305/838/head -> origin/gh/anijain2305/838/head 2025-09-07T09:36:19.3586233Z * [new branch] gh/anijain2305/838/orig -> origin/gh/anijain2305/838/orig 2025-09-07T09:36:19.3588453Z * [new branch] gh/anijain2305/839/base -> origin/gh/anijain2305/839/base 2025-09-07T09:36:19.3589994Z * [new branch] gh/anijain2305/839/head -> origin/gh/anijain2305/839/head 2025-09-07T09:36:19.3591484Z * [new branch] gh/anijain2305/839/orig -> origin/gh/anijain2305/839/orig 2025-09-07T09:36:19.3593696Z * [new branch] gh/anijain2305/843/base -> origin/gh/anijain2305/843/base 2025-09-07T09:36:19.3595495Z * [new branch] gh/anijain2305/843/head -> origin/gh/anijain2305/843/head 2025-09-07T09:36:19.3597178Z * [new branch] gh/anijain2305/843/orig -> origin/gh/anijain2305/843/orig 2025-09-07T09:36:19.3599389Z * [new branch] gh/anijain2305/844/base -> origin/gh/anijain2305/844/base 2025-09-07T09:36:19.3601015Z * 
[new branch] gh/anijain2305/844/head -> origin/gh/anijain2305/844/head 2025-09-07T09:36:19.3602537Z * [new branch] gh/anijain2305/844/orig -> origin/gh/anijain2305/844/orig 2025-09-07T09:36:19.3605050Z * [new branch] gh/anijain2305/846/base -> origin/gh/anijain2305/846/base 2025-09-07T09:36:19.3606840Z * [new branch] gh/anijain2305/846/head -> origin/gh/anijain2305/846/head 2025-09-07T09:36:19.3608387Z * [new branch] gh/anijain2305/846/orig -> origin/gh/anijain2305/846/orig 2025-09-07T09:36:19.3610633Z * [new branch] gh/anijain2305/848/base -> origin/gh/anijain2305/848/base 2025-09-07T09:36:19.3612294Z * [new branch] gh/anijain2305/848/head -> origin/gh/anijain2305/848/head 2025-09-07T09:36:19.3614238Z * [new branch] gh/anijain2305/848/orig -> origin/gh/anijain2305/848/orig 2025-09-07T09:36:19.3616570Z * [new branch] gh/anijain2305/849/base -> origin/gh/anijain2305/849/base 2025-09-07T09:36:19.3618085Z * [new branch] gh/anijain2305/849/head -> origin/gh/anijain2305/849/head 2025-09-07T09:36:19.3619668Z * [new branch] gh/anijain2305/849/orig -> origin/gh/anijain2305/849/orig 2025-09-07T09:36:19.3622133Z * [new branch] gh/anijain2305/850/base -> origin/gh/anijain2305/850/base 2025-09-07T09:36:19.3623682Z * [new branch] gh/anijain2305/850/head -> origin/gh/anijain2305/850/head 2025-09-07T09:36:19.3625437Z * [new branch] gh/anijain2305/850/orig -> origin/gh/anijain2305/850/orig 2025-09-07T09:36:19.3627699Z * [new branch] gh/anijain2305/851/base -> origin/gh/anijain2305/851/base 2025-09-07T09:36:19.3629458Z * [new branch] gh/anijain2305/851/head -> origin/gh/anijain2305/851/head 2025-09-07T09:36:19.3630877Z * [new branch] gh/anijain2305/851/orig -> origin/gh/anijain2305/851/orig 2025-09-07T09:36:19.3633183Z * [new branch] gh/anijain2305/852/base -> origin/gh/anijain2305/852/base 2025-09-07T09:36:19.3634752Z * [new branch] gh/anijain2305/852/head -> origin/gh/anijain2305/852/head 2025-09-07T09:36:19.3636588Z * [new branch] gh/anijain2305/852/orig -> 
origin/gh/anijain2305/852/orig 2025-09-07T09:36:19.3638836Z * [new branch] gh/anijain2305/853/base -> origin/gh/anijain2305/853/base 2025-09-07T09:36:19.3640271Z * [new branch] gh/anijain2305/853/head -> origin/gh/anijain2305/853/head 2025-09-07T09:36:19.3641769Z * [new branch] gh/anijain2305/853/orig -> origin/gh/anijain2305/853/orig 2025-09-07T09:36:19.3644056Z * [new branch] gh/anijain2305/854/base -> origin/gh/anijain2305/854/base 2025-09-07T09:36:19.3645963Z * [new branch] gh/anijain2305/854/head -> origin/gh/anijain2305/854/head 2025-09-07T09:36:19.3647525Z * [new branch] gh/anijain2305/854/orig -> origin/gh/anijain2305/854/orig 2025-09-07T09:36:19.3649815Z * [new branch] gh/anijain2305/855/base -> origin/gh/anijain2305/855/base 2025-09-07T09:36:19.3651382Z * [new branch] gh/anijain2305/855/head -> origin/gh/anijain2305/855/head 2025-09-07T09:36:19.3652920Z * [new branch] gh/anijain2305/855/orig -> origin/gh/anijain2305/855/orig 2025-09-07T09:36:19.3655291Z * [new branch] gh/anijain2305/856/base -> origin/gh/anijain2305/856/base 2025-09-07T09:36:19.3657005Z * [new branch] gh/anijain2305/856/head -> origin/gh/anijain2305/856/head 2025-09-07T09:36:19.3658545Z * [new branch] gh/anijain2305/856/orig -> origin/gh/anijain2305/856/orig 2025-09-07T09:36:19.3660747Z * [new branch] gh/anijain2305/857/base -> origin/gh/anijain2305/857/base 2025-09-07T09:36:19.3662462Z * [new branch] gh/anijain2305/857/head -> origin/gh/anijain2305/857/head 2025-09-07T09:36:19.3663990Z * [new branch] gh/anijain2305/857/orig -> origin/gh/anijain2305/857/orig 2025-09-07T09:36:19.3666526Z * [new branch] gh/anijain2305/858/base -> origin/gh/anijain2305/858/base 2025-09-07T09:36:19.3668040Z * [new branch] gh/anijain2305/858/head -> origin/gh/anijain2305/858/head 2025-09-07T09:36:19.3669553Z * [new branch] gh/anijain2305/858/orig -> origin/gh/anijain2305/858/orig 2025-09-07T09:36:19.3671779Z * [new branch] gh/anijain2305/859/base -> origin/gh/anijain2305/859/base 2025-09-07T09:36:19.3673293Z * 
[new branch] gh/anijain2305/859/head -> origin/gh/anijain2305/859/head 2025-09-07T09:36:19.3674846Z * [new branch] gh/anijain2305/859/orig -> origin/gh/anijain2305/859/orig 2025-09-07T09:36:19.3677506Z * [new branch] gh/anijain2305/860/base -> origin/gh/anijain2305/860/base 2025-09-07T09:36:19.3679028Z * [new branch] gh/anijain2305/860/head -> origin/gh/anijain2305/860/head 2025-09-07T09:36:19.3680693Z * [new branch] gh/anijain2305/860/orig -> origin/gh/anijain2305/860/orig 2025-09-07T09:36:19.3682825Z * [new branch] gh/anijain2305/861/base -> origin/gh/anijain2305/861/base 2025-09-07T09:36:19.3684296Z * [new branch] gh/anijain2305/861/head -> origin/gh/anijain2305/861/head 2025-09-07T09:36:19.3686125Z * [new branch] gh/anijain2305/861/orig -> origin/gh/anijain2305/861/orig 2025-09-07T09:36:19.3688358Z * [new branch] gh/anijain2305/862/base -> origin/gh/anijain2305/862/base 2025-09-07T09:36:19.3689995Z * [new branch] gh/anijain2305/862/head -> origin/gh/anijain2305/862/head 2025-09-07T09:36:19.3691707Z * [new branch] gh/anijain2305/862/orig -> origin/gh/anijain2305/862/orig 2025-09-07T09:36:19.3693917Z * [new branch] gh/anijain2305/863/base -> origin/gh/anijain2305/863/base 2025-09-07T09:36:19.3695765Z * [new branch] gh/anijain2305/863/head -> origin/gh/anijain2305/863/head 2025-09-07T09:36:19.3697561Z * [new branch] gh/anijain2305/863/orig -> origin/gh/anijain2305/863/orig 2025-09-07T09:36:19.3699831Z * [new branch] gh/anijain2305/864/base -> origin/gh/anijain2305/864/base 2025-09-07T09:36:19.3701348Z * [new branch] gh/anijain2305/864/head -> origin/gh/anijain2305/864/head 2025-09-07T09:36:19.3703057Z * [new branch] gh/anijain2305/864/orig -> origin/gh/anijain2305/864/orig 2025-09-07T09:36:19.3705519Z * [new branch] gh/anijain2305/865/base -> origin/gh/anijain2305/865/base 2025-09-07T09:36:19.3707181Z * [new branch] gh/anijain2305/865/head -> origin/gh/anijain2305/865/head 2025-09-07T09:36:19.3708742Z * [new branch] gh/anijain2305/865/orig -> 
origin/gh/anijain2305/865/orig 2025-09-07T09:36:19.3710962Z * [new branch] gh/anijain2305/866/base -> origin/gh/anijain2305/866/base 2025-09-07T09:36:19.3712525Z * [new branch] gh/anijain2305/866/head -> origin/gh/anijain2305/866/head 2025-09-07T09:36:19.3714161Z * [new branch] gh/anijain2305/866/orig -> origin/gh/anijain2305/866/orig 2025-09-07T09:36:19.3717252Z * [new branch] gh/anjali411/216/base -> origin/gh/anjali411/216/base 2025-09-07T09:36:19.3718871Z * [new branch] gh/anjali411/216/head -> origin/gh/anjali411/216/head 2025-09-07T09:36:19.3720293Z * [new branch] gh/anjali411/216/orig -> origin/gh/anjali411/216/orig 2025-09-07T09:36:19.3722964Z * [new branch] gh/ankitageorge/13/base -> origin/gh/ankitageorge/13/base 2025-09-07T09:36:19.3724611Z * [new branch] gh/ankitageorge/13/head -> origin/gh/ankitageorge/13/head 2025-09-07T09:36:19.3726477Z * [new branch] gh/ankitageorge/13/orig -> origin/gh/ankitageorge/13/orig 2025-09-07T09:36:19.3728800Z * [new branch] gh/ankitageorge/14/base -> origin/gh/ankitageorge/14/base 2025-09-07T09:36:19.3730391Z * [new branch] gh/ankitageorge/14/head -> origin/gh/ankitageorge/14/head 2025-09-07T09:36:19.3732160Z * [new branch] gh/ankitageorge/14/orig -> origin/gh/ankitageorge/14/orig 2025-09-07T09:36:19.3734426Z * [new branch] gh/ankitageorge/15/base -> origin/gh/ankitageorge/15/base 2025-09-07T09:36:19.3736439Z * [new branch] gh/ankitageorge/15/head -> origin/gh/ankitageorge/15/head 2025-09-07T09:36:19.3737873Z * [new branch] gh/ankitageorge/15/orig -> origin/gh/ankitageorge/15/orig 2025-09-07T09:36:19.3740253Z * [new branch] gh/ankitageorge/16/base -> origin/gh/ankitageorge/16/base 2025-09-07T09:36:19.3741889Z * [new branch] gh/ankitageorge/16/head -> origin/gh/ankitageorge/16/head 2025-09-07T09:36:19.3743487Z * [new branch] gh/ankitageorge/16/orig -> origin/gh/ankitageorge/16/orig 2025-09-07T09:36:19.3746103Z * [new branch] gh/ankitageorge/17/base -> origin/gh/ankitageorge/17/base 2025-09-07T09:36:19.3747653Z * [new 
branch] gh/ankitageorge/17/head -> origin/gh/ankitageorge/17/head 2025-09-07T09:36:19.3749235Z * [new branch] gh/ankitageorge/17/orig -> origin/gh/ankitageorge/17/orig 2025-09-07T09:36:19.3751572Z * [new branch] gh/ankitageorge/21/base -> origin/gh/ankitageorge/21/base 2025-09-07T09:36:19.3753113Z * [new branch] gh/ankitageorge/21/head -> origin/gh/ankitageorge/21/head 2025-09-07T09:36:19.3754680Z * [new branch] gh/ankitageorge/21/orig -> origin/gh/ankitageorge/21/orig 2025-09-07T09:36:19.3758064Z * [new branch] gh/anshul-si/1/base -> origin/gh/anshul-si/1/base 2025-09-07T09:36:19.3759422Z * [new branch] gh/anshul-si/1/head -> origin/gh/anshul-si/1/head 2025-09-07T09:36:19.3761704Z * [new branch] gh/anshul-si/15/base -> origin/gh/anshul-si/15/base 2025-09-07T09:36:19.3763235Z * [new branch] gh/anshul-si/15/head -> origin/gh/anshul-si/15/head 2025-09-07T09:36:19.3764743Z * [new branch] gh/anshul-si/15/orig -> origin/gh/anshul-si/15/orig 2025-09-07T09:36:19.3767312Z * [new branch] gh/anshul-si/16/base -> origin/gh/anshul-si/16/base 2025-09-07T09:36:19.3768877Z * [new branch] gh/anshul-si/16/head -> origin/gh/anshul-si/16/head 2025-09-07T09:36:19.3770606Z * [new branch] gh/anshul-si/16/orig -> origin/gh/anshul-si/16/orig 2025-09-07T09:36:19.3790243Z * [new branch] gh/anshul-si/17/base -> origin/gh/anshul-si/17/base 2025-09-07T09:36:19.3790933Z * [new branch] gh/anshul-si/17/head -> origin/gh/anshul-si/17/head 2025-09-07T09:36:19.3791492Z * [new branch] gh/anshul-si/17/orig -> origin/gh/anshul-si/17/orig 2025-09-07T09:36:19.3791933Z * [new branch] gh/anshul-si/18/base -> origin/gh/anshul-si/18/base 2025-09-07T09:36:19.3792320Z * [new branch] gh/anshul-si/18/head -> origin/gh/anshul-si/18/head 2025-09-07T09:36:19.3792946Z * [new branch] gh/anshul-si/18/orig -> origin/gh/anshul-si/18/orig 2025-09-07T09:36:19.3793380Z * [new branch] gh/anshul-si/19/base -> origin/gh/anshul-si/19/base 2025-09-07T09:36:19.3793780Z * [new branch] gh/anshul-si/19/head -> 
origin/gh/anshul-si/19/head 2025-09-07T09:36:19.3794162Z * [new branch] gh/anshul-si/19/orig -> origin/gh/anshul-si/19/orig 2025-09-07T09:36:19.3794551Z * [new branch] gh/anshul-si/2/base -> origin/gh/anshul-si/2/base 2025-09-07T09:36:19.3795096Z * [new branch] gh/anshul-si/2/head -> origin/gh/anshul-si/2/head 2025-09-07T09:36:19.3795496Z * [new branch] gh/anshul-si/20/base -> origin/gh/anshul-si/20/base 2025-09-07T09:36:19.3796086Z * [new branch] gh/anshul-si/20/head -> origin/gh/anshul-si/20/head 2025-09-07T09:36:19.3797573Z * [new branch] gh/anshul-si/20/orig -> origin/gh/anshul-si/20/orig 2025-09-07T09:36:19.3799712Z * [new branch] gh/anshul-si/21/base -> origin/gh/anshul-si/21/base 2025-09-07T09:36:19.3801207Z * [new branch] gh/anshul-si/21/head -> origin/gh/anshul-si/21/head 2025-09-07T09:36:19.3802845Z * [new branch] gh/anshul-si/21/orig -> origin/gh/anshul-si/21/orig 2025-09-07T09:36:19.3805265Z * [new branch] gh/anshul-si/22/base -> origin/gh/anshul-si/22/base 2025-09-07T09:36:19.3807047Z * [new branch] gh/anshul-si/22/head -> origin/gh/anshul-si/22/head 2025-09-07T09:36:19.3808546Z * [new branch] gh/anshul-si/22/orig -> origin/gh/anshul-si/22/orig 2025-09-07T09:36:19.3810637Z * [new branch] gh/anshul-si/23/base -> origin/gh/anshul-si/23/base 2025-09-07T09:36:19.3812298Z * [new branch] gh/anshul-si/23/head -> origin/gh/anshul-si/23/head 2025-09-07T09:36:19.3813792Z * [new branch] gh/anshul-si/23/orig -> origin/gh/anshul-si/23/orig 2025-09-07T09:36:19.3816269Z * [new branch] gh/anshul-si/24/base -> origin/gh/anshul-si/24/base 2025-09-07T09:36:19.3817998Z * [new branch] gh/anshul-si/24/head -> origin/gh/anshul-si/24/head 2025-09-07T09:36:19.3819504Z * [new branch] gh/anshul-si/24/orig -> origin/gh/anshul-si/24/orig 2025-09-07T09:36:19.3821830Z * [new branch] gh/anshul-si/25/base -> origin/gh/anshul-si/25/base 2025-09-07T09:36:19.3823700Z * [new branch] gh/anshul-si/25/head -> origin/gh/anshul-si/25/head 2025-09-07T09:36:19.3825233Z * [new branch] 
gh/anshul-si/25/orig -> origin/gh/anshul-si/25/orig 2025-09-07T09:36:19.3827636Z * [new branch] gh/anshul-si/26/base -> origin/gh/anshul-si/26/base 2025-09-07T09:36:19.3829200Z * [new branch] gh/anshul-si/26/head -> origin/gh/anshul-si/26/head 2025-09-07T09:36:19.3830740Z * [new branch] gh/anshul-si/26/orig -> origin/gh/anshul-si/26/orig 2025-09-07T09:36:19.3833043Z * [new branch] gh/anshul-si/27/base -> origin/gh/anshul-si/27/base 2025-09-07T09:36:19.3834618Z * [new branch] gh/anshul-si/27/head -> origin/gh/anshul-si/27/head 2025-09-07T09:36:19.3836519Z * [new branch] gh/anshul-si/27/orig -> origin/gh/anshul-si/27/orig 2025-09-07T09:36:19.3838573Z * [new branch] gh/anshul-si/28/base -> origin/gh/anshul-si/28/base 2025-09-07T09:36:19.3840157Z * [new branch] gh/anshul-si/28/head -> origin/gh/anshul-si/28/head 2025-09-07T09:36:19.3841678Z * [new branch] gh/anshul-si/28/orig -> origin/gh/anshul-si/28/orig 2025-09-07T09:36:19.3843851Z * [new branch] gh/anshul-si/29/base -> origin/gh/anshul-si/29/base 2025-09-07T09:36:19.3845842Z * [new branch] gh/anshul-si/29/head -> origin/gh/anshul-si/29/head 2025-09-07T09:36:19.3847405Z * [new branch] gh/anshul-si/29/orig -> origin/gh/anshul-si/29/orig 2025-09-07T09:36:19.3849486Z * [new branch] gh/anshul-si/3/base -> origin/gh/anshul-si/3/base 2025-09-07T09:36:19.3851058Z * [new branch] gh/anshul-si/3/head -> origin/gh/anshul-si/3/head 2025-09-07T09:36:19.3853102Z * [new branch] gh/anshul-si/4/base -> origin/gh/anshul-si/4/base 2025-09-07T09:36:19.3854537Z * [new branch] gh/anshul-si/4/head -> origin/gh/anshul-si/4/head 2025-09-07T09:36:19.3856947Z * [new branch] gh/anshul-si/5/base -> origin/gh/anshul-si/5/base 2025-09-07T09:36:19.3858479Z * [new branch] gh/anshul-si/5/head -> origin/gh/anshul-si/5/head 2025-09-07T09:36:19.3861275Z * [new branch] gh/aorenste/132/base -> origin/gh/aorenste/132/base 2025-09-07T09:36:19.3863006Z * [new branch] gh/aorenste/132/head -> origin/gh/aorenste/132/head 2025-09-07T09:36:19.3865999Z * [new 
branch] gh/bdhirsh/650/base -> origin/gh/bdhirsh/650/base 2025-09-07T09:36:19.3867763Z * [new branch] gh/bdhirsh/650/head -> origin/gh/bdhirsh/650/head 2025-09-07T09:36:19.3869448Z * [new branch] gh/bdhirsh/650/orig -> origin/gh/bdhirsh/650/orig 2025-09-07T09:36:19.3871629Z * [new branch] gh/bdhirsh/663/base -> origin/gh/bdhirsh/663/base 2025-09-07T09:36:19.3873179Z * [new branch] gh/bdhirsh/663/head -> origin/gh/bdhirsh/663/head 2025-09-07T09:36:19.3874733Z * [new branch] gh/bdhirsh/663/orig -> origin/gh/bdhirsh/663/orig 2025-09-07T09:36:19.3877441Z * [new branch] gh/bdhirsh/665/base -> origin/gh/bdhirsh/665/base 2025-09-07T09:36:19.3878971Z * [new branch] gh/bdhirsh/665/head -> origin/gh/bdhirsh/665/head 2025-09-07T09:36:19.3880472Z * [new branch] gh/bdhirsh/665/orig -> origin/gh/bdhirsh/665/orig 2025-09-07T09:36:19.3882941Z * [new branch] gh/bdhirsh/666/base -> origin/gh/bdhirsh/666/base 2025-09-07T09:36:19.3884616Z * [new branch] gh/bdhirsh/666/head -> origin/gh/bdhirsh/666/head 2025-09-07T09:36:19.3886439Z * [new branch] gh/bdhirsh/666/orig -> origin/gh/bdhirsh/666/orig 2025-09-07T09:36:19.3889063Z * [new branch] gh/bdhirsh/667/base -> origin/gh/bdhirsh/667/base 2025-09-07T09:36:19.3890816Z * [new branch] gh/bdhirsh/667/head -> origin/gh/bdhirsh/667/head 2025-09-07T09:36:19.3892170Z * [new branch] gh/bdhirsh/667/orig -> origin/gh/bdhirsh/667/orig 2025-09-07T09:36:19.3894361Z * [new branch] gh/bdhirsh/668/base -> origin/gh/bdhirsh/668/base 2025-09-07T09:36:19.3896172Z * [new branch] gh/bdhirsh/668/head -> origin/gh/bdhirsh/668/head 2025-09-07T09:36:19.3897759Z * [new branch] gh/bdhirsh/668/orig -> origin/gh/bdhirsh/668/orig 2025-09-07T09:36:19.3900104Z * [new branch] gh/bdhirsh/669/base -> origin/gh/bdhirsh/669/base 2025-09-07T09:36:19.3901680Z * [new branch] gh/bdhirsh/669/head -> origin/gh/bdhirsh/669/head 2025-09-07T09:36:19.3903282Z * [new branch] gh/bdhirsh/669/orig -> origin/gh/bdhirsh/669/orig 2025-09-07T09:36:19.3905906Z * [new branch] 
gh/bdhirsh/670/base -> origin/gh/bdhirsh/670/base 2025-09-07T09:36:19.3907609Z * [new branch] gh/bdhirsh/670/head -> origin/gh/bdhirsh/670/head 2025-09-07T09:36:19.3909168Z * [new branch] gh/bdhirsh/670/orig -> origin/gh/bdhirsh/670/orig 2025-09-07T09:36:19.3911964Z * [new branch] gh/benjaminglass1/100/base -> origin/gh/benjaminglass1/100/base 2025-09-07T09:36:19.3913446Z * [new branch] gh/benjaminglass1/100/head -> origin/gh/benjaminglass1/100/head 2025-09-07T09:36:19.3915369Z * [new branch] gh/benjaminglass1/100/orig -> origin/gh/benjaminglass1/100/orig 2025-09-07T09:36:19.3917696Z * [new branch] gh/benjaminglass1/101/base -> origin/gh/benjaminglass1/101/base 2025-09-07T09:36:19.3919255Z * [new branch] gh/benjaminglass1/101/head -> origin/gh/benjaminglass1/101/head 2025-09-07T09:36:19.3920823Z * [new branch] gh/benjaminglass1/101/orig -> origin/gh/benjaminglass1/101/orig 2025-09-07T09:36:19.3922995Z * [new branch] gh/benjaminglass1/102/base -> origin/gh/benjaminglass1/102/base 2025-09-07T09:36:19.3924530Z * [new branch] gh/benjaminglass1/102/head -> origin/gh/benjaminglass1/102/head 2025-09-07T09:36:19.3926537Z * [new branch] gh/benjaminglass1/102/orig -> origin/gh/benjaminglass1/102/orig 2025-09-07T09:36:19.3928672Z * [new branch] gh/benjaminglass1/103/base -> origin/gh/benjaminglass1/103/base 2025-09-07T09:36:19.3930210Z * [new branch] gh/benjaminglass1/103/head -> origin/gh/benjaminglass1/103/head 2025-09-07T09:36:19.3931795Z * [new branch] gh/benjaminglass1/103/orig -> origin/gh/benjaminglass1/103/orig 2025-09-07T09:36:19.3934102Z * [new branch] gh/benjaminglass1/104/base -> origin/gh/benjaminglass1/104/base 2025-09-07T09:36:19.3935965Z * [new branch] gh/benjaminglass1/104/head -> origin/gh/benjaminglass1/104/head 2025-09-07T09:36:19.3937544Z * [new branch] gh/benjaminglass1/104/orig -> origin/gh/benjaminglass1/104/orig 2025-09-07T09:36:19.3939711Z * [new branch] gh/benjaminglass1/105/base -> origin/gh/benjaminglass1/105/base 2025-09-07T09:36:19.3941287Z * 
[new branch] gh/benjaminglass1/105/head -> origin/gh/benjaminglass1/105/head 2025-09-07T09:36:19.3943084Z * [new branch] gh/benjaminglass1/105/orig -> origin/gh/benjaminglass1/105/orig 2025-09-07T09:36:19.3945738Z * [new branch] gh/benjaminglass1/106/base -> origin/gh/benjaminglass1/106/base 2025-09-07T09:36:19.3947205Z * [new branch] gh/benjaminglass1/106/head -> origin/gh/benjaminglass1/106/head 2025-09-07T09:36:19.3948686Z * [new branch] gh/benjaminglass1/106/orig -> origin/gh/benjaminglass1/106/orig 2025-09-07T09:36:19.3950975Z * [new branch] gh/benjaminglass1/79/base -> origin/gh/benjaminglass1/79/base 2025-09-07T09:36:19.3952587Z * [new branch] gh/benjaminglass1/79/head -> origin/gh/benjaminglass1/79/head 2025-09-07T09:36:19.3954297Z * [new branch] gh/benjaminglass1/79/orig -> origin/gh/benjaminglass1/79/orig 2025-09-07T09:36:19.3956722Z * [new branch] gh/benjaminglass1/86/base -> origin/gh/benjaminglass1/86/base 2025-09-07T09:36:19.3958247Z * [new branch] gh/benjaminglass1/86/head -> origin/gh/benjaminglass1/86/head 2025-09-07T09:36:19.3959945Z * [new branch] gh/benjaminglass1/86/orig -> origin/gh/benjaminglass1/86/orig 2025-09-07T09:36:19.3961977Z * [new branch] gh/benjaminglass1/89/base -> origin/gh/benjaminglass1/89/base 2025-09-07T09:36:19.3963519Z * [new branch] gh/benjaminglass1/89/head -> origin/gh/benjaminglass1/89/head 2025-09-07T09:36:19.3965233Z * [new branch] gh/benjaminglass1/89/orig -> origin/gh/benjaminglass1/89/orig 2025-09-07T09:36:19.3967648Z * [new branch] gh/benjaminglass1/91/base -> origin/gh/benjaminglass1/91/base 2025-09-07T09:36:19.3969097Z * [new branch] gh/benjaminglass1/91/head -> origin/gh/benjaminglass1/91/head 2025-09-07T09:36:19.3970709Z * [new branch] gh/benjaminglass1/91/orig -> origin/gh/benjaminglass1/91/orig 2025-09-07T09:36:19.3973054Z * [new branch] gh/benjaminglass1/93/base -> origin/gh/benjaminglass1/93/base 2025-09-07T09:36:19.3974592Z * [new branch] gh/benjaminglass1/93/head -> origin/gh/benjaminglass1/93/head 
2025-09-07T09:36:19.3976491Z * [new branch] gh/benjaminglass1/93/orig -> origin/gh/benjaminglass1/93/orig 2025-09-07T09:36:19.3978684Z * [new branch] gh/benjaminglass1/95/base -> origin/gh/benjaminglass1/95/base 2025-09-07T09:36:19.3980238Z * [new branch] gh/benjaminglass1/95/head -> origin/gh/benjaminglass1/95/head 2025-09-07T09:36:19.3982551Z * [new branch] gh/benjaminglass1/95/orig -> origin/gh/benjaminglass1/95/orig 2025-09-07T09:36:19.3984501Z * [new branch] gh/benjaminglass1/97/base -> origin/gh/benjaminglass1/97/base 2025-09-07T09:36:19.3986471Z * [new branch] gh/benjaminglass1/97/head -> origin/gh/benjaminglass1/97/head 2025-09-07T09:36:19.3989282Z * [new branch] gh/benjaminglass1/97/orig -> origin/gh/benjaminglass1/97/orig 2025-09-07T09:36:19.3990728Z * [new branch] gh/benjaminglass1/99/base -> origin/gh/benjaminglass1/99/base 2025-09-07T09:36:19.3991962Z * [new branch] gh/benjaminglass1/99/head -> origin/gh/benjaminglass1/99/head 2025-09-07T09:36:19.3993603Z * [new branch] gh/benjaminglass1/99/orig -> origin/gh/benjaminglass1/99/orig 2025-09-07T09:36:19.3996614Z * [new branch] gh/bobrenjc93/514/base -> origin/gh/bobrenjc93/514/base 2025-09-07T09:36:19.3998175Z * [new branch] gh/bobrenjc93/514/head -> origin/gh/bobrenjc93/514/head 2025-09-07T09:36:19.3999668Z * [new branch] gh/bobrenjc93/514/orig -> origin/gh/bobrenjc93/514/orig 2025-09-07T09:36:19.4001840Z * [new branch] gh/bobrenjc93/521/base -> origin/gh/bobrenjc93/521/base 2025-09-07T09:36:19.4003389Z * [new branch] gh/bobrenjc93/521/head -> origin/gh/bobrenjc93/521/head 2025-09-07T09:36:19.4005139Z * [new branch] gh/bobrenjc93/521/orig -> origin/gh/bobrenjc93/521/orig 2025-09-07T09:36:19.4007472Z * [new branch] gh/bobrenjc93/522/base -> origin/gh/bobrenjc93/522/base 2025-09-07T09:36:19.4009055Z * [new branch] gh/bobrenjc93/522/head -> origin/gh/bobrenjc93/522/head 2025-09-07T09:36:19.4010537Z * [new branch] gh/bobrenjc93/522/orig -> origin/gh/bobrenjc93/522/orig 2025-09-07T09:36:19.4012703Z * [new 
branch] gh/bobrenjc93/525/base -> origin/gh/bobrenjc93/525/base 2025-09-07T09:36:19.4014308Z * [new branch] gh/bobrenjc93/525/head -> origin/gh/bobrenjc93/525/head 2025-09-07T09:36:19.4016161Z * [new branch] gh/bobrenjc93/525/orig -> origin/gh/bobrenjc93/525/orig 2025-09-07T09:36:19.4018521Z * [new branch] gh/bobrenjc93/526/base -> origin/gh/bobrenjc93/526/base 2025-09-07T09:36:19.4019866Z * [new branch] gh/bobrenjc93/526/head -> origin/gh/bobrenjc93/526/head 2025-09-07T09:36:19.4021373Z * [new branch] gh/bobrenjc93/526/orig -> origin/gh/bobrenjc93/526/orig 2025-09-07T09:36:19.4023719Z * [new branch] gh/bobrenjc93/527/base -> origin/gh/bobrenjc93/527/base 2025-09-07T09:36:19.4025392Z * [new branch] gh/bobrenjc93/527/head -> origin/gh/bobrenjc93/527/head 2025-09-07T09:36:19.4027138Z * [new branch] gh/bobrenjc93/527/orig -> origin/gh/bobrenjc93/527/orig 2025-09-07T09:36:19.4029361Z * [new branch] gh/bobrenjc93/528/base -> origin/gh/bobrenjc93/528/base 2025-09-07T09:36:19.4030904Z * [new branch] gh/bobrenjc93/528/head -> origin/gh/bobrenjc93/528/head 2025-09-07T09:36:19.4032471Z * [new branch] gh/bobrenjc93/528/orig -> origin/gh/bobrenjc93/528/orig 2025-09-07T09:36:19.4034616Z * [new branch] gh/bobrenjc93/529/base -> origin/gh/bobrenjc93/529/base 2025-09-07T09:36:19.4036475Z * [new branch] gh/bobrenjc93/529/head -> origin/gh/bobrenjc93/529/head 2025-09-07T09:36:19.4037943Z * [new branch] gh/bobrenjc93/529/orig -> origin/gh/bobrenjc93/529/orig 2025-09-07T09:36:19.4040132Z * [new branch] gh/bobrenjc93/535/base -> origin/gh/bobrenjc93/535/base 2025-09-07T09:36:19.4041690Z * [new branch] gh/bobrenjc93/535/head -> origin/gh/bobrenjc93/535/head 2025-09-07T09:36:19.4043199Z * [new branch] gh/bobrenjc93/535/orig -> origin/gh/bobrenjc93/535/orig 2025-09-07T09:36:19.4045953Z * [new branch] gh/bobrenjc93/537/base -> origin/gh/bobrenjc93/537/base 2025-09-07T09:36:19.4047591Z * [new branch] gh/bobrenjc93/537/head -> origin/gh/bobrenjc93/537/head 2025-09-07T09:36:19.4049106Z * [new 
branch] gh/bobrenjc93/537/orig -> origin/gh/bobrenjc93/537/orig 2025-09-07T09:36:19.4051569Z * [new branch] gh/bobrenjc93/539/base -> origin/gh/bobrenjc93/539/base 2025-09-07T09:36:19.4053194Z * [new branch] gh/bobrenjc93/539/head -> origin/gh/bobrenjc93/539/head 2025-09-07T09:36:19.4054847Z * [new branch] gh/bobrenjc93/539/orig -> origin/gh/bobrenjc93/539/orig 2025-09-07T09:36:19.4057377Z * [new branch] gh/bobrenjc93/540/base -> origin/gh/bobrenjc93/540/base 2025-09-07T09:36:19.4058911Z * [new branch] gh/bobrenjc93/540/head -> origin/gh/bobrenjc93/540/head 2025-09-07T09:36:19.4060455Z * [new branch] gh/bobrenjc93/540/orig -> origin/gh/bobrenjc93/540/orig 2025-09-07T09:36:19.4062841Z * [new branch] gh/bobrenjc93/541/base -> origin/gh/bobrenjc93/541/base 2025-09-07T09:36:19.4064469Z * [new branch] gh/bobrenjc93/541/head -> origin/gh/bobrenjc93/541/head 2025-09-07T09:36:19.4066364Z * [new branch] gh/bobrenjc93/541/orig -> origin/gh/bobrenjc93/541/orig 2025-09-07T09:36:19.4068553Z * [new branch] gh/bobrenjc93/542/base -> origin/gh/bobrenjc93/542/base 2025-09-07T09:36:19.4070330Z * [new branch] gh/bobrenjc93/542/head -> origin/gh/bobrenjc93/542/head 2025-09-07T09:36:19.4072381Z * [new branch] gh/bobrenjc93/542/orig -> origin/gh/bobrenjc93/542/orig 2025-09-07T09:36:19.4075399Z * [new branch] gh/bobrenjc93/543/base -> origin/gh/bobrenjc93/543/base 2025-09-07T09:36:19.4077578Z * [new branch] gh/bobrenjc93/543/head -> origin/gh/bobrenjc93/543/head 2025-09-07T09:36:19.4079531Z * [new branch] gh/bobrenjc93/543/orig -> origin/gh/bobrenjc93/543/orig 2025-09-07T09:36:19.4082203Z * [new branch] gh/bobrenjc93/544/base -> origin/gh/bobrenjc93/544/base 2025-09-07T09:36:19.4084402Z * [new branch] gh/bobrenjc93/544/head -> origin/gh/bobrenjc93/544/head 2025-09-07T09:36:19.4086669Z * [new branch] gh/bobrenjc93/544/orig -> origin/gh/bobrenjc93/544/orig 2025-09-07T09:36:19.4089464Z * [new branch] gh/bobrenjc93/545/base -> origin/gh/bobrenjc93/545/base 2025-09-07T09:36:19.4091108Z * [new 
branch] gh/bobrenjc93/545/head -> origin/gh/bobrenjc93/545/head 2025-09-07T09:36:19.4092674Z * [new branch] gh/bobrenjc93/545/orig -> origin/gh/bobrenjc93/545/orig 2025-09-07T09:36:19.4095278Z * [new branch] gh/bobrenjc93/546/base -> origin/gh/bobrenjc93/546/base 2025-09-07T09:36:19.4097367Z * [new branch] gh/bobrenjc93/546/head -> origin/gh/bobrenjc93/546/head 2025-09-07T09:36:19.4099369Z * [new branch] gh/bobrenjc93/546/orig -> origin/gh/bobrenjc93/546/orig 2025-09-07T09:36:19.4103022Z * [new branch] gh/bobrenjc93/547/base -> origin/gh/bobrenjc93/547/base 2025-09-07T09:36:19.4104845Z * [new branch] gh/bobrenjc93/547/head -> origin/gh/bobrenjc93/547/head 2025-09-07T09:36:19.4107138Z * [new branch] gh/bobrenjc93/547/orig -> origin/gh/bobrenjc93/547/orig 2025-09-07T09:36:19.4109725Z * [new branch] gh/bobrenjc93/548/base -> origin/gh/bobrenjc93/548/base 2025-09-07T09:36:19.4111616Z * [new branch] gh/bobrenjc93/548/head -> origin/gh/bobrenjc93/548/head 2025-09-07T09:36:19.4113205Z * [new branch] gh/bobrenjc93/548/orig -> origin/gh/bobrenjc93/548/orig 2025-09-07T09:36:19.4115955Z * [new branch] gh/bobrenjc93/549/base -> origin/gh/bobrenjc93/549/base 2025-09-07T09:36:19.4117715Z * [new branch] gh/bobrenjc93/549/head -> origin/gh/bobrenjc93/549/head 2025-09-07T09:36:19.4119930Z * [new branch] gh/bobrenjc93/549/orig -> origin/gh/bobrenjc93/549/orig 2025-09-07T09:36:19.4123176Z * [new branch] gh/bobrenjc93/550/base -> origin/gh/bobrenjc93/550/base 2025-09-07T09:36:19.4126005Z * [new branch] gh/bobrenjc93/550/head -> origin/gh/bobrenjc93/550/head 2025-09-07T09:36:19.4128181Z * [new branch] gh/bobrenjc93/550/orig -> origin/gh/bobrenjc93/550/orig 2025-09-07T09:36:19.4131072Z * [new branch] gh/bobrenjc93/551/base -> origin/gh/bobrenjc93/551/base 2025-09-07T09:36:19.4132798Z * [new branch] gh/bobrenjc93/551/head -> origin/gh/bobrenjc93/551/head 2025-09-07T09:36:19.4135238Z * [new branch] gh/bobrenjc93/551/orig -> origin/gh/bobrenjc93/551/orig 2025-09-07T09:36:19.4137889Z * [new 
branch] gh/bobrenjc93/552/base -> origin/gh/bobrenjc93/552/base 2025-09-07T09:36:19.4139667Z * [new branch] gh/bobrenjc93/552/head -> origin/gh/bobrenjc93/552/head 2025-09-07T09:36:19.4141645Z * [new branch] gh/bobrenjc93/552/orig -> origin/gh/bobrenjc93/552/orig 2025-09-07T09:36:19.4144308Z * [new branch] gh/bobrenjc93/553/base -> origin/gh/bobrenjc93/553/base 2025-09-07T09:36:19.4146666Z * [new branch] gh/bobrenjc93/553/head -> origin/gh/bobrenjc93/553/head 2025-09-07T09:36:19.4148780Z * [new branch] gh/bobrenjc93/553/orig -> origin/gh/bobrenjc93/553/orig 2025-09-07T09:36:19.4151452Z * [new branch] gh/bobrenjc93/554/base -> origin/gh/bobrenjc93/554/base 2025-09-07T09:36:19.4153520Z * [new branch] gh/bobrenjc93/554/head -> origin/gh/bobrenjc93/554/head 2025-09-07T09:36:19.4155241Z * [new branch] gh/bobrenjc93/554/orig -> origin/gh/bobrenjc93/554/orig 2025-09-07T09:36:19.4158410Z * [new branch] gh/bobrenjc93/555/base -> origin/gh/bobrenjc93/555/base 2025-09-07T09:36:19.4159654Z * [new branch] gh/bobrenjc93/555/head -> origin/gh/bobrenjc93/555/head 2025-09-07T09:36:19.4161550Z * [new branch] gh/bobrenjc93/555/orig -> origin/gh/bobrenjc93/555/orig 2025-09-07T09:36:19.4164741Z * [new branch] gh/bobrenjc93/556/base -> origin/gh/bobrenjc93/556/base 2025-09-07T09:36:19.4166944Z * [new branch] gh/bobrenjc93/556/head -> origin/gh/bobrenjc93/556/head 2025-09-07T09:36:19.4168437Z * [new branch] gh/bobrenjc93/556/orig -> origin/gh/bobrenjc93/556/orig 2025-09-07T09:36:19.4171616Z * [new branch] gh/briancoutinho/2/base -> origin/gh/briancoutinho/2/base 2025-09-07T09:36:19.4173922Z * [new branch] gh/briancoutinho/2/head -> origin/gh/briancoutinho/2/head 2025-09-07T09:36:19.4177717Z * [new branch] gh/c00w/23/base -> origin/gh/c00w/23/base 2025-09-07T09:36:19.4179343Z * [new branch] gh/c00w/23/head -> origin/gh/c00w/23/head 2025-09-07T09:36:19.4182105Z * [new branch] gh/c00w/48/base -> origin/gh/c00w/48/base 2025-09-07T09:36:19.4184216Z * [new branch] gh/c00w/48/head -> 
origin/gh/c00w/48/head 2025-09-07T09:36:19.4186306Z * [new branch] gh/c00w/48/orig -> origin/gh/c00w/48/orig 2025-09-07T09:36:19.4189270Z * [new branch] gh/c00w/53/base -> origin/gh/c00w/53/base 2025-09-07T09:36:19.4190985Z * [new branch] gh/c00w/53/head -> origin/gh/c00w/53/head 2025-09-07T09:36:19.4193347Z * [new branch] gh/c00w/53/orig -> origin/gh/c00w/53/orig 2025-09-07T09:36:19.4196039Z * [new branch] gh/c00w/54/base -> origin/gh/c00w/54/base 2025-09-07T09:36:19.4197927Z * [new branch] gh/c00w/54/head -> origin/gh/c00w/54/head 2025-09-07T09:36:19.4199703Z * [new branch] gh/c00w/54/orig -> origin/gh/c00w/54/orig 2025-09-07T09:36:19.4202436Z * [new branch] gh/c00w/55/base -> origin/gh/c00w/55/base 2025-09-07T09:36:19.4204534Z * [new branch] gh/c00w/55/head -> origin/gh/c00w/55/head 2025-09-07T09:36:19.4206504Z * [new branch] gh/c00w/55/orig -> origin/gh/c00w/55/orig 2025-09-07T09:36:19.4209038Z * [new branch] gh/c00w/56/base -> origin/gh/c00w/56/base 2025-09-07T09:36:19.4211063Z * [new branch] gh/c00w/56/head -> origin/gh/c00w/56/head 2025-09-07T09:36:19.4212936Z * [new branch] gh/c00w/56/orig -> origin/gh/c00w/56/orig 2025-09-07T09:36:19.4216044Z * [new branch] gh/clee2000/1/base -> origin/gh/clee2000/1/base 2025-09-07T09:36:19.4218188Z * [new branch] gh/clee2000/1/head -> origin/gh/clee2000/1/head 2025-09-07T09:36:19.4220123Z * [new branch] gh/clee2000/1/orig -> origin/gh/clee2000/1/orig 2025-09-07T09:36:19.4223648Z * [new branch] gh/coconutruben/1/base -> origin/gh/coconutruben/1/base 2025-09-07T09:36:19.4226205Z * [new branch] gh/coconutruben/1/head -> origin/gh/coconutruben/1/head 2025-09-07T09:36:19.4229405Z * [new branch] gh/coconutruben/11/base -> origin/gh/coconutruben/11/base 2025-09-07T09:36:19.4231510Z * [new branch] gh/coconutruben/11/head -> origin/gh/coconutruben/11/head 2025-09-07T09:36:19.4233370Z * [new branch] gh/coconutruben/11/orig -> origin/gh/coconutruben/11/orig 2025-09-07T09:36:19.4237116Z * [new branch] gh/coconutruben/12/base -> 
origin/gh/coconutruben/12/base 2025-09-07T09:36:19.4239634Z * [new branch] gh/coconutruben/12/head -> origin/gh/coconutruben/12/head 2025-09-07T09:36:19.4241653Z * [new branch] gh/coconutruben/12/orig -> origin/gh/coconutruben/12/orig 2025-09-07T09:36:19.4244937Z * [new branch] gh/coconutruben/13/base -> origin/gh/coconutruben/13/base 2025-09-07T09:36:19.4247205Z * [new branch] gh/coconutruben/13/head -> origin/gh/coconutruben/13/head 2025-09-07T09:36:19.4249564Z * [new branch] gh/coconutruben/13/orig -> origin/gh/coconutruben/13/orig 2025-09-07T09:36:19.4252402Z * [new branch] gh/coconutruben/14/base -> origin/gh/coconutruben/14/base 2025-09-07T09:36:19.4254495Z * [new branch] gh/coconutruben/14/head -> origin/gh/coconutruben/14/head 2025-09-07T09:36:19.4256663Z * [new branch] gh/coconutruben/14/orig -> origin/gh/coconutruben/14/orig 2025-09-07T09:36:19.4259413Z * [new branch] gh/coconutruben/15/base -> origin/gh/coconutruben/15/base 2025-09-07T09:36:19.4261335Z * [new branch] gh/coconutruben/15/head -> origin/gh/coconutruben/15/head 2025-09-07T09:36:19.4263594Z * [new branch] gh/coconutruben/15/orig -> origin/gh/coconutruben/15/orig 2025-09-07T09:36:19.4266583Z * [new branch] gh/coconutruben/16/base -> origin/gh/coconutruben/16/base 2025-09-07T09:36:19.4268458Z * [new branch] gh/coconutruben/16/head -> origin/gh/coconutruben/16/head 2025-09-07T09:36:19.4270551Z * [new branch] gh/coconutruben/16/orig -> origin/gh/coconutruben/16/orig 2025-09-07T09:36:19.4273508Z * [new branch] gh/coconutruben/17/base -> origin/gh/coconutruben/17/base 2025-09-07T09:36:19.4275822Z * [new branch] gh/coconutruben/17/head -> origin/gh/coconutruben/17/head 2025-09-07T09:36:19.4277860Z * [new branch] gh/coconutruben/17/orig -> origin/gh/coconutruben/17/orig 2025-09-07T09:36:19.4280864Z * [new branch] gh/coconutruben/18/base -> origin/gh/coconutruben/18/base 2025-09-07T09:36:19.4282644Z * [new branch] gh/coconutruben/18/head -> origin/gh/coconutruben/18/head 2025-09-07T09:36:19.4284466Z * 
[new branch] gh/coconutruben/18/orig -> origin/gh/coconutruben/18/orig 2025-09-07T09:36:19.4287711Z * [new branch] gh/coconutruben/19/base -> origin/gh/coconutruben/19/base 2025-09-07T09:36:19.4289709Z * [new branch] gh/coconutruben/19/head -> origin/gh/coconutruben/19/head 2025-09-07T09:36:19.4291480Z * [new branch] gh/coconutruben/19/orig -> origin/gh/coconutruben/19/orig 2025-09-07T09:36:19.4294554Z * [new branch] gh/coconutruben/20/base -> origin/gh/coconutruben/20/base 2025-09-07T09:36:19.4297059Z * [new branch] gh/coconutruben/20/head -> origin/gh/coconutruben/20/head 2025-09-07T09:36:19.4298869Z * [new branch] gh/coconutruben/20/orig -> origin/gh/coconutruben/20/orig 2025-09-07T09:36:19.4301940Z * [new branch] gh/coconutruben/21/base -> origin/gh/coconutruben/21/base 2025-09-07T09:36:19.4303735Z * [new branch] gh/coconutruben/21/head -> origin/gh/coconutruben/21/head 2025-09-07T09:36:19.4305824Z * [new branch] gh/coconutruben/21/orig -> origin/gh/coconutruben/21/orig 2025-09-07T09:36:19.4308711Z * [new branch] gh/coconutruben/22/base -> origin/gh/coconutruben/22/base 2025-09-07T09:36:19.4310401Z * [new branch] gh/coconutruben/22/head -> origin/gh/coconutruben/22/head 2025-09-07T09:36:19.4312661Z * [new branch] gh/coconutruben/22/orig -> origin/gh/coconutruben/22/orig 2025-09-07T09:36:19.4316034Z * [new branch] gh/coconutruben/24/base -> origin/gh/coconutruben/24/base 2025-09-07T09:36:19.4318052Z * [new branch] gh/coconutruben/24/head -> origin/gh/coconutruben/24/head 2025-09-07T09:36:19.4319708Z * [new branch] gh/coconutruben/24/orig -> origin/gh/coconutruben/24/orig 2025-09-07T09:36:19.4322710Z * [new branch] gh/coconutruben/25/base -> origin/gh/coconutruben/25/base 2025-09-07T09:36:19.4325257Z * [new branch] gh/coconutruben/25/head -> origin/gh/coconutruben/25/head 2025-09-07T09:36:19.4327627Z * [new branch] gh/coconutruben/25/orig -> origin/gh/coconutruben/25/orig 2025-09-07T09:36:19.4330542Z * [new branch] gh/coconutruben/28/base -> 
origin/gh/coconutruben/28/base 2025-09-07T09:36:19.4332259Z * [new branch] gh/coconutruben/28/head -> origin/gh/coconutruben/28/head 2025-09-07T09:36:19.4334199Z * [new branch] gh/coconutruben/28/orig -> origin/gh/coconutruben/28/orig 2025-09-07T09:36:19.4337404Z * [new branch] gh/coconutruben/29/base -> origin/gh/coconutruben/29/base 2025-09-07T09:36:19.4339506Z * [new branch] gh/coconutruben/29/head -> origin/gh/coconutruben/29/head 2025-09-07T09:36:19.4341576Z * [new branch] gh/coconutruben/29/orig -> origin/gh/coconutruben/29/orig 2025-09-07T09:36:19.4344389Z * [new branch] gh/coconutruben/30/base -> origin/gh/coconutruben/30/base 2025-09-07T09:36:19.4346753Z * [new branch] gh/coconutruben/30/head -> origin/gh/coconutruben/30/head 2025-09-07T09:36:19.4348435Z * [new branch] gh/coconutruben/30/orig -> origin/gh/coconutruben/30/orig 2025-09-07T09:36:19.4351602Z * [new branch] gh/coconutruben/31/base -> origin/gh/coconutruben/31/base 2025-09-07T09:36:19.4353522Z * [new branch] gh/coconutruben/31/head -> origin/gh/coconutruben/31/head 2025-09-07T09:36:19.4355400Z * [new branch] gh/coconutruben/31/orig -> origin/gh/coconutruben/31/orig 2025-09-07T09:36:19.4358232Z * [new branch] gh/coconutruben/32/base -> origin/gh/coconutruben/32/base 2025-09-07T09:36:19.4360070Z * [new branch] gh/coconutruben/32/head -> origin/gh/coconutruben/32/head 2025-09-07T09:36:19.4361972Z * [new branch] gh/coconutruben/32/orig -> origin/gh/coconutruben/32/orig 2025-09-07T09:36:19.4365323Z * [new branch] gh/coconutruben/33/base -> origin/gh/coconutruben/33/base 2025-09-07T09:36:19.4367405Z * [new branch] gh/coconutruben/33/head -> origin/gh/coconutruben/33/head 2025-09-07T09:36:19.4369379Z * [new branch] gh/coconutruben/33/orig -> origin/gh/coconutruben/33/orig 2025-09-07T09:36:19.4371583Z * [new branch] gh/coconutruben/34/base -> origin/gh/coconutruben/34/base 2025-09-07T09:36:19.4373380Z * [new branch] gh/coconutruben/34/head -> origin/gh/coconutruben/34/head 2025-09-07T09:36:19.4375398Z * 
[new branch] gh/coconutruben/34/orig -> origin/gh/coconutruben/34/orig 2025-09-07T09:36:19.4378525Z * [new branch] gh/coconutruben/35/base -> origin/gh/coconutruben/35/base 2025-09-07T09:36:19.4380235Z * [new branch] gh/coconutruben/35/head -> origin/gh/coconutruben/35/head 2025-09-07T09:36:19.4382281Z * [new branch] gh/coconutruben/35/orig -> origin/gh/coconutruben/35/orig 2025-09-07T09:36:19.4387206Z * [new branch] gh/coconutruben/36/base -> origin/gh/coconutruben/36/base 2025-09-07T09:36:19.4389567Z * [new branch] gh/coconutruben/36/head -> origin/gh/coconutruben/36/head 2025-09-07T09:36:19.4392711Z * [new branch] gh/coconutruben/36/orig -> origin/gh/coconutruben/36/orig 2025-09-07T09:36:19.4395897Z * [new branch] gh/coconutruben/37/base -> origin/gh/coconutruben/37/base 2025-09-07T09:36:19.4397574Z * [new branch] gh/coconutruben/37/head -> origin/gh/coconutruben/37/head 2025-09-07T09:36:19.4399475Z * [new branch] gh/coconutruben/37/orig -> origin/gh/coconutruben/37/orig 2025-09-07T09:36:19.4402286Z * [new branch] gh/coconutruben/38/base -> origin/gh/coconutruben/38/base 2025-09-07T09:36:19.4404069Z * [new branch] gh/coconutruben/38/head -> origin/gh/coconutruben/38/head 2025-09-07T09:36:19.4406412Z * [new branch] gh/coconutruben/38/orig -> origin/gh/coconutruben/38/orig 2025-09-07T09:36:19.4409150Z * [new branch] gh/coconutruben/39/base -> origin/gh/coconutruben/39/base 2025-09-07T09:36:19.4410935Z * [new branch] gh/coconutruben/39/head -> origin/gh/coconutruben/39/head 2025-09-07T09:36:19.4412673Z * [new branch] gh/coconutruben/39/orig -> origin/gh/coconutruben/39/orig 2025-09-07T09:36:19.4416017Z * [new branch] gh/coconutruben/40/base -> origin/gh/coconutruben/40/base 2025-09-07T09:36:19.4417726Z * [new branch] gh/coconutruben/40/head -> origin/gh/coconutruben/40/head 2025-09-07T09:36:19.4419746Z * [new branch] gh/coconutruben/40/orig -> origin/gh/coconutruben/40/orig 2025-09-07T09:36:19.4422741Z * [new branch] gh/coconutruben/41/base -> 
origin/gh/coconutruben/41/base 2025-09-07T09:36:19.4424318Z * [new branch] gh/coconutruben/41/head -> origin/gh/coconutruben/41/head 2025-09-07T09:36:19.4426374Z * [new branch] gh/coconutruben/41/orig -> origin/gh/coconutruben/41/orig 2025-09-07T09:36:19.4429163Z * [new branch] gh/coconutruben/42/base -> origin/gh/coconutruben/42/base 2025-09-07T09:36:19.4431504Z * [new branch] gh/coconutruben/42/head -> origin/gh/coconutruben/42/head 2025-09-07T09:36:19.4433791Z * [new branch] gh/coconutruben/42/orig -> origin/gh/coconutruben/42/orig 2025-09-07T09:36:19.4437103Z * [new branch] gh/coconutruben/43/base -> origin/gh/coconutruben/43/base 2025-09-07T09:36:19.4438954Z * [new branch] gh/coconutruben/43/head -> origin/gh/coconutruben/43/head 2025-09-07T09:36:19.4441322Z * [new branch] gh/coconutruben/43/orig -> origin/gh/coconutruben/43/orig 2025-09-07T09:36:19.4444452Z * [new branch] gh/coconutruben/44/base -> origin/gh/coconutruben/44/base 2025-09-07T09:36:19.4446798Z * [new branch] gh/coconutruben/44/head -> origin/gh/coconutruben/44/head 2025-09-07T09:36:19.4448853Z * [new branch] gh/coconutruben/44/orig -> origin/gh/coconutruben/44/orig 2025-09-07T09:36:19.4451971Z * [new branch] gh/coconutruben/45/base -> origin/gh/coconutruben/45/base 2025-09-07T09:36:19.4453887Z * [new branch] gh/coconutruben/45/head -> origin/gh/coconutruben/45/head 2025-09-07T09:36:19.4456652Z * [new branch] gh/coconutruben/45/orig -> origin/gh/coconutruben/45/orig 2025-09-07T09:36:19.4459196Z * [new branch] gh/coconutruben/46/base -> origin/gh/coconutruben/46/base 2025-09-07T09:36:19.4461331Z * [new branch] gh/coconutruben/46/head -> origin/gh/coconutruben/46/head 2025-09-07T09:36:19.4463599Z * [new branch] gh/coconutruben/46/orig -> origin/gh/coconutruben/46/orig 2025-09-07T09:36:19.4467059Z * [new branch] gh/coconutruben/47/base -> origin/gh/coconutruben/47/base 2025-09-07T09:36:19.4469147Z * [new branch] gh/coconutruben/47/head -> origin/gh/coconutruben/47/head 2025-09-07T09:36:19.4471220Z * 
[new branch] gh/coconutruben/47/orig -> origin/gh/coconutruben/47/orig 2025-09-07T09:36:19.4474123Z * [new branch] gh/coconutruben/48/base -> origin/gh/coconutruben/48/base 2025-09-07T09:36:19.4476605Z * [new branch] gh/coconutruben/48/head -> origin/gh/coconutruben/48/head 2025-09-07T09:36:19.4478666Z * [new branch] gh/coconutruben/48/orig -> origin/gh/coconutruben/48/orig 2025-09-07T09:36:19.4482009Z * [new branch] gh/coconutruben/49/base -> origin/gh/coconutruben/49/base 2025-09-07T09:36:19.4484201Z * [new branch] gh/coconutruben/49/head -> origin/gh/coconutruben/49/head 2025-09-07T09:36:19.4486640Z * [new branch] gh/coconutruben/49/orig -> origin/gh/coconutruben/49/orig 2025-09-07T09:36:19.4489515Z * [new branch] gh/coconutruben/50/base -> origin/gh/coconutruben/50/base 2025-09-07T09:36:19.4491466Z * [new branch] gh/coconutruben/50/head -> origin/gh/coconutruben/50/head 2025-09-07T09:36:19.4493842Z * [new branch] gh/coconutruben/50/orig -> origin/gh/coconutruben/50/orig 2025-09-07T09:36:19.4497244Z * [new branch] gh/coconutruben/51/base -> origin/gh/coconutruben/51/base 2025-09-07T09:36:19.4498653Z * [new branch] gh/coconutruben/51/head -> origin/gh/coconutruben/51/head 2025-09-07T09:36:19.4500577Z * [new branch] gh/coconutruben/51/orig -> origin/gh/coconutruben/51/orig 2025-09-07T09:36:19.4503858Z * [new branch] gh/coconutruben/52/base -> origin/gh/coconutruben/52/base 2025-09-07T09:36:19.4506293Z * [new branch] gh/coconutruben/52/head -> origin/gh/coconutruben/52/head 2025-09-07T09:36:19.4508452Z * [new branch] gh/coconutruben/52/orig -> origin/gh/coconutruben/52/orig 2025-09-07T09:36:19.4511614Z * [new branch] gh/coconutruben/53/base -> origin/gh/coconutruben/53/base 2025-09-07T09:36:19.4513704Z * [new branch] gh/coconutruben/53/head -> origin/gh/coconutruben/53/head 2025-09-07T09:36:19.4515656Z * [new branch] gh/coconutruben/53/orig -> origin/gh/coconutruben/53/orig 2025-09-07T09:36:19.4518489Z * [new branch] gh/coconutruben/54/base -> 
origin/gh/coconutruben/54/base 2025-09-07T09:36:19.4520608Z * [new branch] gh/coconutruben/54/head -> origin/gh/coconutruben/54/head 2025-09-07T09:36:19.4522539Z * [new branch] gh/coconutruben/54/orig -> origin/gh/coconutruben/54/orig 2025-09-07T09:36:19.4525460Z * [new branch] gh/coconutruben/55/base -> origin/gh/coconutruben/55/base 2025-09-07T09:36:19.4527483Z * [new branch] gh/coconutruben/55/head -> origin/gh/coconutruben/55/head 2025-09-07T09:36:19.4529777Z * [new branch] gh/coconutruben/55/orig -> origin/gh/coconutruben/55/orig 2025-09-07T09:36:19.4532693Z * [new branch] gh/coconutruben/56/base -> origin/gh/coconutruben/56/base 2025-09-07T09:36:19.4534774Z * [new branch] gh/coconutruben/56/head -> origin/gh/coconutruben/56/head 2025-09-07T09:36:19.4537259Z * [new branch] gh/coconutruben/56/orig -> origin/gh/coconutruben/56/orig 2025-09-07T09:36:19.4539883Z * [new branch] gh/coconutruben/57/base -> origin/gh/coconutruben/57/base 2025-09-07T09:36:19.4541767Z * [new branch] gh/coconutruben/57/head -> origin/gh/coconutruben/57/head 2025-09-07T09:36:19.4543607Z * [new branch] gh/coconutruben/57/orig -> origin/gh/coconutruben/57/orig 2025-09-07T09:36:19.4547138Z * [new branch] gh/coconutruben/58/base -> origin/gh/coconutruben/58/base 2025-09-07T09:36:19.4549393Z * [new branch] gh/coconutruben/58/head -> origin/gh/coconutruben/58/head 2025-09-07T09:36:19.4551682Z * [new branch] gh/coconutruben/58/orig -> origin/gh/coconutruben/58/orig 2025-09-07T09:36:19.4555300Z * [new branch] gh/coconutruben/59/base -> origin/gh/coconutruben/59/base 2025-09-07T09:36:19.4556866Z * [new branch] gh/coconutruben/59/head -> origin/gh/coconutruben/59/head 2025-09-07T09:36:19.4558773Z * [new branch] gh/coconutruben/59/orig -> origin/gh/coconutruben/59/orig 2025-09-07T09:36:19.4561842Z * [new branch] gh/coconutruben/60/base -> origin/gh/coconutruben/60/base 2025-09-07T09:36:19.4563650Z * [new branch] gh/coconutruben/60/head -> origin/gh/coconutruben/60/head 2025-09-07T09:36:19.4565707Z * 
[new branch] gh/coconutruben/60/orig -> origin/gh/coconutruben/60/orig 2025-09-07T09:36:19.4568345Z * [new branch] gh/coconutruben/61/base -> origin/gh/coconutruben/61/base 2025-09-07T09:36:19.4570428Z * [new branch] gh/coconutruben/61/head -> origin/gh/coconutruben/61/head 2025-09-07T09:36:19.4572190Z * [new branch] gh/coconutruben/61/orig -> origin/gh/coconutruben/61/orig 2025-09-07T09:36:19.4575560Z * [new branch] gh/coconutruben/62/base -> origin/gh/coconutruben/62/base 2025-09-07T09:36:19.4577663Z * [new branch] gh/coconutruben/62/head -> origin/gh/coconutruben/62/head 2025-09-07T09:36:19.4579256Z * [new branch] gh/coconutruben/62/orig -> origin/gh/coconutruben/62/orig 2025-09-07T09:36:19.4582347Z * [new branch] gh/coconutruben/63/base -> origin/gh/coconutruben/63/base 2025-09-07T09:36:19.4584437Z * [new branch] gh/coconutruben/63/head -> origin/gh/coconutruben/63/head 2025-09-07T09:36:19.4586814Z * [new branch] gh/coconutruben/63/orig -> origin/gh/coconutruben/63/orig 2025-09-07T09:36:19.4589174Z * [new branch] gh/coconutruben/64/base -> origin/gh/coconutruben/64/base 2025-09-07T09:36:19.4591009Z * [new branch] gh/coconutruben/64/head -> origin/gh/coconutruben/64/head 2025-09-07T09:36:19.4592901Z * [new branch] gh/coconutruben/64/orig -> origin/gh/coconutruben/64/orig 2025-09-07T09:36:19.4595772Z * [new branch] gh/coconutruben/65/base -> origin/gh/coconutruben/65/base 2025-09-07T09:36:19.4597810Z * [new branch] gh/coconutruben/65/head -> origin/gh/coconutruben/65/head 2025-09-07T09:36:19.4599997Z * [new branch] gh/coconutruben/65/orig -> origin/gh/coconutruben/65/orig 2025-09-07T09:36:19.4602738Z * [new branch] gh/coconutruben/66/base -> origin/gh/coconutruben/66/base 2025-09-07T09:36:19.4604414Z * [new branch] gh/coconutruben/66/head -> origin/gh/coconutruben/66/head 2025-09-07T09:36:19.4606682Z * [new branch] gh/coconutruben/66/orig -> origin/gh/coconutruben/66/orig 2025-09-07T09:36:19.4610449Z * [new branch] gh/codingwithsurya/12/base -> 
origin/gh/codingwithsurya/12/base 2025-09-07T09:36:19.4612577Z * [new branch] gh/codingwithsurya/12/head -> origin/gh/codingwithsurya/12/head 2025-09-07T09:36:19.4614488Z * [new branch] gh/codingwithsurya/12/orig -> origin/gh/codingwithsurya/12/orig 2025-09-07T09:36:19.4617190Z * [new branch] gh/codingwithsurya/14/base -> origin/gh/codingwithsurya/14/base 2025-09-07T09:36:19.4618931Z * [new branch] gh/codingwithsurya/14/head -> origin/gh/codingwithsurya/14/head 2025-09-07T09:36:19.4620937Z * [new branch] gh/codingwithsurya/14/orig -> origin/gh/codingwithsurya/14/orig 2025-09-07T09:36:19.4623854Z * [new branch] gh/codingwithsurya/15/base -> origin/gh/codingwithsurya/15/base 2025-09-07T09:36:19.4625888Z * [new branch] gh/codingwithsurya/15/head -> origin/gh/codingwithsurya/15/head 2025-09-07T09:36:19.4627462Z * [new branch] gh/codingwithsurya/15/orig -> origin/gh/codingwithsurya/15/orig 2025-09-07T09:36:19.4630278Z * [new branch] gh/codingwithsurya/16/base -> origin/gh/codingwithsurya/16/base 2025-09-07T09:36:19.4632329Z * [new branch] gh/codingwithsurya/16/head -> origin/gh/codingwithsurya/16/head 2025-09-07T09:36:19.4634239Z * [new branch] gh/codingwithsurya/16/orig -> origin/gh/codingwithsurya/16/orig 2025-09-07T09:36:19.4637397Z * [new branch] gh/codingwithsurya/17/base -> origin/gh/codingwithsurya/17/base 2025-09-07T09:36:19.4639324Z * [new branch] gh/codingwithsurya/17/head -> origin/gh/codingwithsurya/17/head 2025-09-07T09:36:19.4641159Z * [new branch] gh/codingwithsurya/17/orig -> origin/gh/codingwithsurya/17/orig 2025-09-07T09:36:19.4643806Z * [new branch] gh/codingwithsurya/18/base -> origin/gh/codingwithsurya/18/base 2025-09-07T09:36:19.4646107Z * [new branch] gh/codingwithsurya/18/head -> origin/gh/codingwithsurya/18/head 2025-09-07T09:36:19.4648177Z * [new branch] gh/codingwithsurya/18/orig -> origin/gh/codingwithsurya/18/orig 2025-09-07T09:36:19.4651086Z * [new branch] gh/codingwithsurya/19/base -> origin/gh/codingwithsurya/19/base 
2025-09-07T09:36:19.4653183Z * [new branch] gh/codingwithsurya/19/head -> origin/gh/codingwithsurya/19/head 2025-09-07T09:36:19.4654643Z * [new branch] gh/codingwithsurya/19/orig -> origin/gh/codingwithsurya/19/orig 2025-09-07T09:36:19.4657820Z * [new branch] gh/codingwithsurya/20/base -> origin/gh/codingwithsurya/20/base 2025-09-07T09:36:19.4659490Z * [new branch] gh/codingwithsurya/20/head -> origin/gh/codingwithsurya/20/head 2025-09-07T09:36:19.4661712Z * [new branch] gh/codingwithsurya/20/orig -> origin/gh/codingwithsurya/20/orig 2025-09-07T09:36:19.4664811Z * [new branch] gh/codingwithsurya/21/base -> origin/gh/codingwithsurya/21/base 2025-09-07T09:36:19.4667080Z * [new branch] gh/codingwithsurya/21/head -> origin/gh/codingwithsurya/21/head 2025-09-07T09:36:19.4669333Z * [new branch] gh/codingwithsurya/21/orig -> origin/gh/codingwithsurya/21/orig 2025-09-07T09:36:19.4672732Z * [new branch] gh/colinchan15/1/base -> origin/gh/colinchan15/1/base 2025-09-07T09:36:19.4674711Z * [new branch] gh/colinchan15/1/head -> origin/gh/colinchan15/1/head 2025-09-07T09:36:19.4677397Z * [new branch] gh/colinchan15/2/base -> origin/gh/colinchan15/2/base 2025-09-07T09:36:19.4679088Z * [new branch] gh/colinchan15/2/head -> origin/gh/colinchan15/2/head 2025-09-07T09:36:19.4682464Z * [new branch] gh/colinchan15/3/base -> origin/gh/colinchan15/3/base 2025-09-07T09:36:19.4683602Z * [new branch] gh/colinchan15/3/head -> origin/gh/colinchan15/3/head 2025-09-07T09:36:19.4686484Z * [new branch] gh/colinchan15/6/base -> origin/gh/colinchan15/6/base 2025-09-07T09:36:19.4688097Z * [new branch] gh/colinchan15/6/head -> origin/gh/colinchan15/6/head 2025-09-07T09:36:19.4691404Z * [new branch] gh/davidberard98/382/base -> origin/gh/davidberard98/382/base 2025-09-07T09:36:19.4693323Z * [new branch] gh/davidberard98/382/head -> origin/gh/davidberard98/382/head 2025-09-07T09:36:19.4695516Z * [new branch] gh/davidberard98/382/orig -> origin/gh/davidberard98/382/orig 2025-09-07T09:36:19.4698604Z * 
[new branch] gh/davidberard98/386/base -> origin/gh/davidberard98/386/base 2025-09-07T09:36:19.4700553Z * [new branch] gh/davidberard98/386/head -> origin/gh/davidberard98/386/head 2025-09-07T09:36:19.4702333Z * [new branch] gh/davidberard98/386/orig -> origin/gh/davidberard98/386/orig 2025-09-07T09:36:19.4704880Z * [new branch] gh/davidberard98/391/base -> origin/gh/davidberard98/391/base 2025-09-07T09:36:19.4706942Z * [new branch] gh/davidberard98/391/head -> origin/gh/davidberard98/391/head 2025-09-07T09:36:19.4709186Z * [new branch] gh/davidberard98/391/orig -> origin/gh/davidberard98/391/orig 2025-09-07T09:36:19.4711587Z * [new branch] gh/davidberard98/392/base -> origin/gh/davidberard98/392/base 2025-09-07T09:36:19.4713128Z * [new branch] gh/davidberard98/392/head -> origin/gh/davidberard98/392/head 2025-09-07T09:36:19.4715211Z * [new branch] gh/davidberard98/392/orig -> origin/gh/davidberard98/392/orig 2025-09-07T09:36:19.4718380Z * [new branch] gh/davidberard98/394/base -> origin/gh/davidberard98/394/base 2025-09-07T09:36:19.4720285Z * [new branch] gh/davidberard98/394/head -> origin/gh/davidberard98/394/head 2025-09-07T09:36:19.4721824Z * [new branch] gh/davidberard98/394/orig -> origin/gh/davidberard98/394/orig 2025-09-07T09:36:19.4724388Z * [new branch] gh/davidberard98/396/base -> origin/gh/davidberard98/396/base 2025-09-07T09:36:19.4726581Z * [new branch] gh/davidberard98/396/head -> origin/gh/davidberard98/396/head 2025-09-07T09:36:19.4728328Z * [new branch] gh/davidberard98/396/orig -> origin/gh/davidberard98/396/orig 2025-09-07T09:36:19.4731216Z * [new branch] gh/davidberard98/397/base -> origin/gh/davidberard98/397/base 2025-09-07T09:36:19.4733082Z * [new branch] gh/davidberard98/397/head -> origin/gh/davidberard98/397/head 2025-09-07T09:36:19.4734545Z * [new branch] gh/davidberard98/397/orig -> origin/gh/davidberard98/397/orig 2025-09-07T09:36:19.4737418Z * [new branch] gh/davidberard98/398/base -> origin/gh/davidberard98/398/base 
2025-09-07T09:36:19.4741880Z * [new branch] gh/davidberard98/398/head -> origin/gh/davidberard98/398/head 2025-09-07T09:36:19.4743527Z * [new branch] gh/davidberard98/398/orig -> origin/gh/davidberard98/398/orig 2025-09-07T09:36:19.4746514Z * [new branch] gh/davidberard98/399/base -> origin/gh/davidberard98/399/base 2025-09-07T09:36:19.4748349Z * [new branch] gh/davidberard98/399/head -> origin/gh/davidberard98/399/head 2025-09-07T09:36:19.4750271Z * [new branch] gh/davidberard98/399/orig -> origin/gh/davidberard98/399/orig 2025-09-07T09:36:19.4753109Z * [new branch] gh/davidberard98/400/base -> origin/gh/davidberard98/400/base 2025-09-07T09:36:19.4755249Z * [new branch] gh/davidberard98/400/head -> origin/gh/davidberard98/400/head 2025-09-07T09:36:19.4757203Z * [new branch] gh/davidberard98/400/orig -> origin/gh/davidberard98/400/orig 2025-09-07T09:36:19.4760026Z * [new branch] gh/davidberard98/401/base -> origin/gh/davidberard98/401/base 2025-09-07T09:36:19.4761805Z * [new branch] gh/davidberard98/401/head -> origin/gh/davidberard98/401/head 2025-09-07T09:36:19.4763731Z * [new branch] gh/davidberard98/401/orig -> origin/gh/davidberard98/401/orig 2025-09-07T09:36:19.4766558Z * [new branch] gh/davidberard98/402/base -> origin/gh/davidberard98/402/base 2025-09-07T09:36:19.4768244Z * [new branch] gh/davidberard98/402/head -> origin/gh/davidberard98/402/head 2025-09-07T09:36:19.4769846Z * [new branch] gh/davidberard98/402/orig -> origin/gh/davidberard98/402/orig 2025-09-07T09:36:19.4772372Z * [new branch] gh/davidberard98/403/base -> origin/gh/davidberard98/403/base 2025-09-07T09:36:19.4774399Z * [new branch] gh/davidberard98/403/head -> origin/gh/davidberard98/403/head 2025-09-07T09:36:19.4776392Z * [new branch] gh/davidberard98/403/orig -> origin/gh/davidberard98/403/orig 2025-09-07T09:36:19.4779645Z * [new branch] gh/davidberard98/404/base -> origin/gh/davidberard98/404/base 2025-09-07T09:36:19.4781327Z * [new branch] gh/davidberard98/404/head -> 
origin/gh/davidberard98/404/head 2025-09-07T09:36:19.4783591Z * [new branch] gh/davidberard98/404/orig -> origin/gh/davidberard98/404/orig 2025-09-07T09:36:19.4786353Z * [new branch] gh/davidberard98/405/base -> origin/gh/davidberard98/405/base 2025-09-07T09:36:19.4788119Z * [new branch] gh/davidberard98/405/head -> origin/gh/davidberard98/405/head 2025-09-07T09:36:19.4789852Z * [new branch] gh/davidberard98/405/orig -> origin/gh/davidberard98/405/orig 2025-09-07T09:36:19.4792660Z * [new branch] gh/davidberard98/406/base -> origin/gh/davidberard98/406/base 2025-09-07T09:36:19.4794566Z * [new branch] gh/davidberard98/406/head -> origin/gh/davidberard98/406/head 2025-09-07T09:36:19.4796813Z * [new branch] gh/davidberard98/406/orig -> origin/gh/davidberard98/406/orig 2025-09-07T09:36:19.4799692Z * [new branch] gh/davidberard98/407/base -> origin/gh/davidberard98/407/base 2025-09-07T09:36:19.4801031Z * [new branch] gh/davidberard98/407/head -> origin/gh/davidberard98/407/head 2025-09-07T09:36:19.4802834Z * [new branch] gh/davidberard98/407/orig -> origin/gh/davidberard98/407/orig 2025-09-07T09:36:19.4805596Z * [new branch] gh/davidberard98/408/base -> origin/gh/davidberard98/408/base 2025-09-07T09:36:19.4807817Z * [new branch] gh/davidberard98/408/head -> origin/gh/davidberard98/408/head 2025-09-07T09:36:19.4809378Z * [new branch] gh/davidberard98/408/orig -> origin/gh/davidberard98/408/orig 2025-09-07T09:36:19.4811521Z * [new branch] gh/davidberard98/409/base -> origin/gh/davidberard98/409/base 2025-09-07T09:36:19.4813463Z * [new branch] gh/davidberard98/409/head -> origin/gh/davidberard98/409/head 2025-09-07T09:36:19.4815424Z * [new branch] gh/davidberard98/409/orig -> origin/gh/davidberard98/409/orig 2025-09-07T09:36:19.4818231Z * [new branch] gh/desertfire/594/base -> origin/gh/desertfire/594/base 2025-09-07T09:36:19.4819708Z * [new branch] gh/desertfire/594/head -> origin/gh/desertfire/594/head 2025-09-07T09:36:19.4821928Z * [new branch] gh/desertfire/594/orig -> 
origin/gh/desertfire/594/orig 2025-09-07T09:36:19.4824530Z * [new branch] gh/desertfire/595/base -> origin/gh/desertfire/595/base 2025-09-07T09:36:19.4826866Z * [new branch] gh/desertfire/595/head -> origin/gh/desertfire/595/head 2025-09-07T09:36:19.4828654Z * [new branch] gh/desertfire/595/orig -> origin/gh/desertfire/595/orig 2025-09-07T09:36:19.4831029Z * [new branch] gh/desertfire/597/base -> origin/gh/desertfire/597/base 2025-09-07T09:36:19.4833040Z * [new branch] gh/desertfire/597/head -> origin/gh/desertfire/597/head 2025-09-07T09:36:19.4834632Z * [new branch] gh/desertfire/597/orig -> origin/gh/desertfire/597/orig 2025-09-07T09:36:19.4837840Z * [new branch] gh/dharakk/1/base -> origin/gh/dharakk/1/base 2025-09-07T09:36:19.4839406Z * [new branch] gh/dharakk/1/head -> origin/gh/dharakk/1/head 2025-09-07T09:36:19.4842773Z * [new branch] gh/drisspg/149/base -> origin/gh/drisspg/149/base 2025-09-07T09:36:19.4844448Z * [new branch] gh/drisspg/149/head -> origin/gh/drisspg/149/head 2025-09-07T09:36:19.4846794Z * [new branch] gh/drisspg/149/orig -> origin/gh/drisspg/149/orig 2025-09-07T09:36:19.4849851Z * [new branch] gh/drisspg/159/base -> origin/gh/drisspg/159/base 2025-09-07T09:36:19.4851665Z * [new branch] gh/drisspg/159/head -> origin/gh/drisspg/159/head 2025-09-07T09:36:19.4853315Z * [new branch] gh/drisspg/159/orig -> origin/gh/drisspg/159/orig 2025-09-07T09:36:19.4856211Z * [new branch] gh/drisspg/166/base -> origin/gh/drisspg/166/base 2025-09-07T09:36:19.4858063Z * [new branch] gh/drisspg/166/head -> origin/gh/drisspg/166/head 2025-09-07T09:36:19.4859667Z * [new branch] gh/drisspg/166/orig -> origin/gh/drisspg/166/orig 2025-09-07T09:36:19.4861927Z * [new branch] gh/drisspg/170/base -> origin/gh/drisspg/170/base 2025-09-07T09:36:19.4864019Z * [new branch] gh/drisspg/170/head -> origin/gh/drisspg/170/head 2025-09-07T09:36:19.4866565Z * [new branch] gh/drisspg/170/orig -> origin/gh/drisspg/170/orig 2025-09-07T09:36:19.4869383Z * [new branch] 
gh/drisspg/173/base -> origin/gh/drisspg/173/base 2025-09-07T09:36:19.4870891Z * [new branch] gh/drisspg/173/head -> origin/gh/drisspg/173/head 2025-09-07T09:36:19.4873057Z * [new branch] gh/drisspg/173/orig -> origin/gh/drisspg/173/orig 2025-09-07T09:36:19.4876048Z * [new branch] gh/drisspg/177/base -> origin/gh/drisspg/177/base 2025-09-07T09:36:19.4877515Z * [new branch] gh/drisspg/177/head -> origin/gh/drisspg/177/head 2025-09-07T09:36:19.4879342Z * [new branch] gh/drisspg/177/orig -> origin/gh/drisspg/177/orig 2025-09-07T09:36:19.4882023Z * [new branch] gh/drisspg/178/base -> origin/gh/drisspg/178/base 2025-09-07T09:36:19.4884153Z * [new branch] gh/drisspg/178/head -> origin/gh/drisspg/178/head 2025-09-07T09:36:19.4885828Z * [new branch] gh/drisspg/178/orig -> origin/gh/drisspg/178/orig 2025-09-07T09:36:19.4888411Z * [new branch] gh/drisspg/180/base -> origin/gh/drisspg/180/base 2025-09-07T09:36:19.4890012Z * [new branch] gh/drisspg/180/head -> origin/gh/drisspg/180/head 2025-09-07T09:36:19.4891838Z * [new branch] gh/drisspg/180/orig -> origin/gh/drisspg/180/orig 2025-09-07T09:36:19.4894329Z * [new branch] gh/drisspg/181/base -> origin/gh/drisspg/181/base 2025-09-07T09:36:19.4896462Z * [new branch] gh/drisspg/181/head -> origin/gh/drisspg/181/head 2025-09-07T09:36:19.4898002Z * [new branch] gh/drisspg/181/orig -> origin/gh/drisspg/181/orig 2025-09-07T09:36:19.4900305Z * [new branch] gh/drisspg/182/base -> origin/gh/drisspg/182/base 2025-09-07T09:36:19.4902070Z * [new branch] gh/drisspg/182/head -> origin/gh/drisspg/182/head 2025-09-07T09:36:19.4904112Z * [new branch] gh/drisspg/183/base -> origin/gh/drisspg/183/base 2025-09-07T09:36:19.4906195Z * [new branch] gh/drisspg/183/head -> origin/gh/drisspg/183/head 2025-09-07T09:36:19.4908418Z * [new branch] gh/drisspg/184/base -> origin/gh/drisspg/184/base 2025-09-07T09:36:19.4910021Z * [new branch] gh/drisspg/184/head -> origin/gh/drisspg/184/head 2025-09-07T09:36:19.4912630Z * [new branch] gh/drisspg/185/base -> 
origin/gh/drisspg/185/base 2025-09-07T09:36:19.4914163Z * [new branch] gh/drisspg/185/head -> origin/gh/drisspg/185/head 2025-09-07T09:36:19.4916922Z * [new branch] gh/drisspg/186/base -> origin/gh/drisspg/186/base 2025-09-07T09:36:19.4918586Z * [new branch] gh/drisspg/186/head -> origin/gh/drisspg/186/head 2025-09-07T09:36:19.4920596Z * [new branch] gh/drisspg/186/orig -> origin/gh/drisspg/186/orig 2025-09-07T09:36:19.4923366Z * [new branch] gh/drisspg/187/base -> origin/gh/drisspg/187/base 2025-09-07T09:36:19.4925255Z * [new branch] gh/drisspg/187/head -> origin/gh/drisspg/187/head 2025-09-07T09:36:19.4927368Z * [new branch] gh/drisspg/187/orig -> origin/gh/drisspg/187/orig 2025-09-07T09:36:19.4929979Z * [new branch] gh/drisspg/188/base -> origin/gh/drisspg/188/base 2025-09-07T09:36:19.4931766Z * [new branch] gh/drisspg/188/head -> origin/gh/drisspg/188/head 2025-09-07T09:36:19.4933245Z * [new branch] gh/drisspg/188/orig -> origin/gh/drisspg/188/orig 2025-09-07T09:36:19.4935877Z * [new branch] gh/drisspg/189/base -> origin/gh/drisspg/189/base 2025-09-07T09:36:19.4937716Z * [new branch] gh/drisspg/189/head -> origin/gh/drisspg/189/head 2025-09-07T09:36:19.4939402Z * [new branch] gh/drisspg/189/orig -> origin/gh/drisspg/189/orig 2025-09-07T09:36:19.4942010Z * [new branch] gh/drisspg/190/base -> origin/gh/drisspg/190/base 2025-09-07T09:36:19.4944013Z * [new branch] gh/drisspg/190/head -> origin/gh/drisspg/190/head 2025-09-07T09:36:19.4946085Z * [new branch] gh/drisspg/190/orig -> origin/gh/drisspg/190/orig 2025-09-07T09:36:19.4948903Z * [new branch] gh/drisspg/191/base -> origin/gh/drisspg/191/base 2025-09-07T09:36:19.4950699Z * [new branch] gh/drisspg/191/head -> origin/gh/drisspg/191/head 2025-09-07T09:36:19.4952699Z * [new branch] gh/drisspg/191/orig -> origin/gh/drisspg/191/orig 2025-09-07T09:36:19.4955293Z * [new branch] gh/drisspg/192/base -> origin/gh/drisspg/192/base 2025-09-07T09:36:19.4957406Z * [new branch] gh/drisspg/192/head -> 
origin/gh/drisspg/192/head 2025-09-07T09:36:19.4958915Z * [new branch] gh/drisspg/192/orig -> origin/gh/drisspg/192/orig 2025-09-07T09:36:19.4961423Z * [new branch] gh/drisspg/193/base -> origin/gh/drisspg/193/base 2025-09-07T09:36:19.4963028Z * [new branch] gh/drisspg/193/head -> origin/gh/drisspg/193/head 2025-09-07T09:36:19.4964753Z * [new branch] gh/drisspg/193/orig -> origin/gh/drisspg/193/orig 2025-09-07T09:36:19.4967675Z * [new branch] gh/drisspg/194/base -> origin/gh/drisspg/194/base 2025-09-07T09:36:19.4969268Z * [new branch] gh/drisspg/194/head -> origin/gh/drisspg/194/head 2025-09-07T09:36:19.4971042Z * [new branch] gh/drisspg/194/orig -> origin/gh/drisspg/194/orig 2025-09-07T09:36:19.4973472Z * [new branch] gh/drisspg/195/base -> origin/gh/drisspg/195/base 2025-09-07T09:36:19.4975403Z * [new branch] gh/drisspg/195/head -> origin/gh/drisspg/195/head 2025-09-07T09:36:19.4977276Z * [new branch] gh/drisspg/195/orig -> origin/gh/drisspg/195/orig 2025-09-07T09:36:19.4979619Z * [new branch] gh/drisspg/196/base -> origin/gh/drisspg/196/base 2025-09-07T09:36:19.4981745Z * [new branch] gh/drisspg/196/head -> origin/gh/drisspg/196/head 2025-09-07T09:36:19.4983497Z * [new branch] gh/drisspg/196/orig -> origin/gh/drisspg/196/orig 2025-09-07T09:36:19.4986447Z * [new branch] gh/drisspg/197/base -> origin/gh/drisspg/197/base 2025-09-07T09:36:19.4988084Z * [new branch] gh/drisspg/197/head -> origin/gh/drisspg/197/head 2025-09-07T09:36:19.4990191Z * [new branch] gh/drisspg/197/orig -> origin/gh/drisspg/197/orig 2025-09-07T09:36:19.4992705Z * [new branch] gh/drisspg/198/base -> origin/gh/drisspg/198/base 2025-09-07T09:36:19.4994843Z * [new branch] gh/drisspg/198/head -> origin/gh/drisspg/198/head 2025-09-07T09:36:19.4997300Z * [new branch] gh/drisspg/198/orig -> origin/gh/drisspg/198/orig 2025-09-07T09:36:19.4999782Z * [new branch] gh/drisspg/199/base -> origin/gh/drisspg/199/base 2025-09-07T09:36:19.5001930Z * [new branch] gh/drisspg/199/head -> 
origin/gh/drisspg/199/head 2025-09-07T09:36:19.5003570Z * [new branch] gh/drisspg/199/orig -> origin/gh/drisspg/199/orig 2025-09-07T09:36:19.5007210Z * [new branch] gh/dsjohns2/1/base -> origin/gh/dsjohns2/1/base 2025-09-07T09:36:19.5008842Z * [new branch] gh/dsjohns2/1/head -> origin/gh/dsjohns2/1/head 2025-09-07T09:36:19.5011980Z * [new branch] gh/eellison/784/base -> origin/gh/eellison/784/base 2025-09-07T09:36:19.5013677Z * [new branch] gh/eellison/784/head -> origin/gh/eellison/784/head 2025-09-07T09:36:19.5016095Z * [new branch] gh/eellison/784/orig -> origin/gh/eellison/784/orig 2025-09-07T09:36:19.5018766Z * [new branch] gh/eellison/785/base -> origin/gh/eellison/785/base 2025-09-07T09:36:19.5020927Z * [new branch] gh/eellison/785/head -> origin/gh/eellison/785/head 2025-09-07T09:36:19.5023132Z * [new branch] gh/eellison/785/orig -> origin/gh/eellison/785/orig 2025-09-07T09:36:19.5026200Z * [new branch] gh/eellison/789/base -> origin/gh/eellison/789/base 2025-09-07T09:36:19.5027728Z * [new branch] gh/eellison/789/head -> origin/gh/eellison/789/head 2025-09-07T09:36:19.5029742Z * [new branch] gh/eellison/789/orig -> origin/gh/eellison/789/orig 2025-09-07T09:36:19.5032352Z * [new branch] gh/eellison/800/base -> origin/gh/eellison/800/base 2025-09-07T09:36:19.5034243Z * [new branch] gh/eellison/800/head -> origin/gh/eellison/800/head 2025-09-07T09:36:19.5036228Z * [new branch] gh/eellison/800/orig -> origin/gh/eellison/800/orig 2025-09-07T09:36:19.5038624Z * [new branch] gh/eellison/801/base -> origin/gh/eellison/801/base 2025-09-07T09:36:19.5040299Z * [new branch] gh/eellison/801/head -> origin/gh/eellison/801/head 2025-09-07T09:36:19.5041897Z * [new branch] gh/eellison/801/orig -> origin/gh/eellison/801/orig 2025-09-07T09:36:19.5045120Z * [new branch] gh/eellison/802/base -> origin/gh/eellison/802/base 2025-09-07T09:36:19.5047199Z * [new branch] gh/eellison/802/head -> origin/gh/eellison/802/head 2025-09-07T09:36:19.5048746Z * [new branch] 
gh/eellison/802/orig -> origin/gh/eellison/802/orig 2025-09-07T09:36:19.5051741Z * [new branch] gh/eellison/805/base -> origin/gh/eellison/805/base 2025-09-07T09:36:19.5053637Z * [new branch] gh/eellison/805/head -> origin/gh/eellison/805/head 2025-09-07T09:36:19.5055658Z * [new branch] gh/eellison/805/orig -> origin/gh/eellison/805/orig 2025-09-07T09:36:19.5058171Z * [new branch] gh/eellison/808/base -> origin/gh/eellison/808/base 2025-09-07T09:36:19.5060130Z * [new branch] gh/eellison/808/head -> origin/gh/eellison/808/head 2025-09-07T09:36:19.5062061Z * [new branch] gh/eellison/808/orig -> origin/gh/eellison/808/orig 2025-09-07T09:36:19.5064550Z * [new branch] gh/eellison/809/base -> origin/gh/eellison/809/base 2025-09-07T09:36:19.5067160Z * [new branch] gh/eellison/809/head -> origin/gh/eellison/809/head 2025-09-07T09:36:19.5068702Z * [new branch] gh/eellison/809/orig -> origin/gh/eellison/809/orig 2025-09-07T09:36:19.5071424Z * [new branch] gh/eellison/813/base -> origin/gh/eellison/813/base 2025-09-07T09:36:19.5073366Z * [new branch] gh/eellison/813/head -> origin/gh/eellison/813/head 2025-09-07T09:36:19.5075187Z * [new branch] gh/eellison/813/orig -> origin/gh/eellison/813/orig 2025-09-07T09:36:19.5077767Z * [new branch] gh/eellison/814/base -> origin/gh/eellison/814/base 2025-09-07T09:36:19.5079660Z * [new branch] gh/eellison/814/head -> origin/gh/eellison/814/head 2025-09-07T09:36:19.5081292Z * [new branch] gh/eellison/814/orig -> origin/gh/eellison/814/orig 2025-09-07T09:36:19.5084157Z * [new branch] gh/eellison/815/base -> origin/gh/eellison/815/base 2025-09-07T09:36:19.5086155Z * [new branch] gh/eellison/815/head -> origin/gh/eellison/815/head 2025-09-07T09:36:19.5087857Z * [new branch] gh/eellison/815/orig -> origin/gh/eellison/815/orig 2025-09-07T09:36:19.5090049Z * [new branch] gh/eellison/816/base -> origin/gh/eellison/816/base 2025-09-07T09:36:19.5092091Z * [new branch] gh/eellison/816/head -> origin/gh/eellison/816/head 
2025-09-07T09:36:19.5093972Z * [new branch] gh/eellison/816/orig -> origin/gh/eellison/816/orig 2025-09-07T09:36:19.5096651Z * [new branch] gh/eellison/817/base -> origin/gh/eellison/817/base 2025-09-07T09:36:19.5098375Z * [new branch] gh/eellison/817/head -> origin/gh/eellison/817/head 2025-09-07T09:36:19.5100185Z * [new branch] gh/eellison/817/orig -> origin/gh/eellison/817/orig 2025-09-07T09:36:19.5102539Z * [new branch] gh/eellison/818/base -> origin/gh/eellison/818/base 2025-09-07T09:36:19.5104276Z * [new branch] gh/eellison/818/head -> origin/gh/eellison/818/head 2025-09-07T09:36:19.5106801Z * [new branch] gh/eellison/818/orig -> origin/gh/eellison/818/orig 2025-09-07T09:36:19.5109523Z * [new branch] gh/eellison/819/base -> origin/gh/eellison/819/base 2025-09-07T09:36:19.5110908Z * [new branch] gh/eellison/819/head -> origin/gh/eellison/819/head 2025-09-07T09:36:19.5112576Z * [new branch] gh/eellison/819/orig -> origin/gh/eellison/819/orig 2025-09-07T09:36:19.5115129Z * [new branch] gh/eellison/820/base -> origin/gh/eellison/820/base 2025-09-07T09:36:19.5117517Z * [new branch] gh/eellison/820/head -> origin/gh/eellison/820/head 2025-09-07T09:36:19.5119361Z * [new branch] gh/eellison/820/orig -> origin/gh/eellison/820/orig 2025-09-07T09:36:19.5122230Z * [new branch] gh/eellison/821/base -> origin/gh/eellison/821/base 2025-09-07T09:36:19.5124004Z * [new branch] gh/eellison/821/head -> origin/gh/eellison/821/head 2025-09-07T09:36:19.5126019Z * [new branch] gh/eellison/821/orig -> origin/gh/eellison/821/orig 2025-09-07T09:36:19.5128757Z * [new branch] gh/eellison/822/base -> origin/gh/eellison/822/base 2025-09-07T09:36:19.5130450Z * [new branch] gh/eellison/822/head -> origin/gh/eellison/822/head 2025-09-07T09:36:19.5132183Z * [new branch] gh/eellison/822/orig -> origin/gh/eellison/822/orig 2025-09-07T09:36:19.5134711Z * [new branch] gh/eellison/823/base -> origin/gh/eellison/823/base 2025-09-07T09:36:19.5136667Z * [new branch] gh/eellison/823/head -> 
origin/gh/eellison/823/head 2025-09-07T09:36:19.5138522Z * [new branch] gh/eellison/823/orig -> origin/gh/eellison/823/orig 2025-09-07T09:36:19.5141917Z * [new branch] gh/etaf/132/base -> origin/gh/etaf/132/base 2025-09-07T09:36:19.5143953Z * [new branch] gh/etaf/132/head -> origin/gh/etaf/132/head 2025-09-07T09:36:19.5146253Z * [new branch] gh/etaf/132/orig -> origin/gh/etaf/132/orig 2025-09-07T09:36:19.5148910Z * [new branch] gh/etaf/138/base -> origin/gh/etaf/138/base 2025-09-07T09:36:19.5150698Z * [new branch] gh/etaf/138/head -> origin/gh/etaf/138/head 2025-09-07T09:36:19.5152417Z * [new branch] gh/etaf/138/orig -> origin/gh/etaf/138/orig 2025-09-07T09:36:19.5155533Z * [new branch] gh/etaf/140/base -> origin/gh/etaf/140/base 2025-09-07T09:36:19.5157156Z * [new branch] gh/etaf/140/head -> origin/gh/etaf/140/head 2025-09-07T09:36:19.5158712Z * [new branch] gh/etaf/140/orig -> origin/gh/etaf/140/orig 2025-09-07T09:36:19.5161322Z * [new branch] gh/etaf/143/base -> origin/gh/etaf/143/base 2025-09-07T09:36:19.5162928Z * [new branch] gh/etaf/143/head -> origin/gh/etaf/143/head 2025-09-07T09:36:19.5164882Z * [new branch] gh/etaf/143/orig -> origin/gh/etaf/143/orig 2025-09-07T09:36:19.5167728Z * [new branch] gh/etaf/147/base -> origin/gh/etaf/147/base 2025-09-07T09:36:19.5169687Z * [new branch] gh/etaf/147/head -> origin/gh/etaf/147/head 2025-09-07T09:36:19.5172321Z * [new branch] gh/etaf/151/base -> origin/gh/etaf/151/base 2025-09-07T09:36:19.5174043Z * [new branch] gh/etaf/151/head -> origin/gh/etaf/151/head 2025-09-07T09:36:19.5176120Z * [new branch] gh/etaf/151/orig -> origin/gh/etaf/151/orig 2025-09-07T09:36:19.5178826Z * [new branch] gh/etaf/152/base -> origin/gh/etaf/152/base 2025-09-07T09:36:19.5180809Z * [new branch] gh/etaf/152/head -> origin/gh/etaf/152/head 2025-09-07T09:36:19.5182664Z * [new branch] gh/etaf/152/orig -> origin/gh/etaf/152/orig 2025-09-07T09:36:19.5185438Z * [new branch] gh/etaf/153/base -> origin/gh/etaf/153/base 
2025-09-07T09:36:19.5187619Z * [new branch] gh/etaf/153/head -> origin/gh/etaf/153/head 2025-09-07T09:36:19.5189254Z * [new branch] gh/etaf/153/orig -> origin/gh/etaf/153/orig 2025-09-07T09:36:19.5192098Z * [new branch] gh/etaf/154/base -> origin/gh/etaf/154/base 2025-09-07T09:36:19.5193914Z * [new branch] gh/etaf/154/head -> origin/gh/etaf/154/head 2025-09-07T09:36:19.5195945Z * [new branch] gh/etaf/154/orig -> origin/gh/etaf/154/orig 2025-09-07T09:36:19.5198472Z * [new branch] gh/etaf/155/base -> origin/gh/etaf/155/base 2025-09-07T09:36:19.5200514Z * [new branch] gh/etaf/155/head -> origin/gh/etaf/155/head 2025-09-07T09:36:19.5202145Z * [new branch] gh/etaf/155/orig -> origin/gh/etaf/155/orig 2025-09-07T09:36:19.5204497Z * [new branch] gh/etaf/156/base -> origin/gh/etaf/156/base 2025-09-07T09:36:19.5206721Z * [new branch] gh/etaf/156/head -> origin/gh/etaf/156/head 2025-09-07T09:36:19.5208393Z * [new branch] gh/etaf/156/orig -> origin/gh/etaf/156/orig 2025-09-07T09:36:19.5210752Z * [new branch] gh/etaf/157/base -> origin/gh/etaf/157/base 2025-09-07T09:36:19.5212890Z * [new branch] gh/etaf/157/head -> origin/gh/etaf/157/head 2025-09-07T09:36:19.5214831Z * [new branch] gh/etaf/157/orig -> origin/gh/etaf/157/orig 2025-09-07T09:36:19.5217938Z * [new branch] gh/etaf/158/base -> origin/gh/etaf/158/base 2025-09-07T09:36:19.5220176Z * [new branch] gh/etaf/158/head -> origin/gh/etaf/158/head 2025-09-07T09:36:19.5221296Z * [new branch] gh/etaf/158/orig -> origin/gh/etaf/158/orig 2025-09-07T09:36:19.5223818Z * [new branch] gh/etaf/159/base -> origin/gh/etaf/159/base 2025-09-07T09:36:19.5226398Z * [new branch] gh/etaf/159/head -> origin/gh/etaf/159/head 2025-09-07T09:36:19.5228034Z * [new branch] gh/etaf/159/orig -> origin/gh/etaf/159/orig 2025-09-07T09:36:19.5230471Z * [new branch] gh/etaf/160/base -> origin/gh/etaf/160/base 2025-09-07T09:36:19.5232485Z * [new branch] gh/etaf/160/head -> origin/gh/etaf/160/head 2025-09-07T09:36:19.5234529Z * [new branch] gh/etaf/160/orig -> 
origin/gh/etaf/160/orig 2025-09-07T09:36:19.5237374Z * [new branch] gh/etaf/161/base -> origin/gh/etaf/161/base 2025-09-07T09:36:19.5238921Z * [new branch] gh/etaf/161/head -> origin/gh/etaf/161/head 2025-09-07T09:36:19.5241004Z * [new branch] gh/etaf/161/orig -> origin/gh/etaf/161/orig 2025-09-07T09:36:19.5244041Z * [new branch] gh/etaf/162/base -> origin/gh/etaf/162/base 2025-09-07T09:36:19.5245880Z * [new branch] gh/etaf/162/head -> origin/gh/etaf/162/head 2025-09-07T09:36:19.5247740Z * [new branch] gh/etaf/162/orig -> origin/gh/etaf/162/orig 2025-09-07T09:36:19.5250313Z * [new branch] gh/etaf/163/base -> origin/gh/etaf/163/base 2025-09-07T09:36:19.5251848Z * [new branch] gh/etaf/163/head -> origin/gh/etaf/163/head 2025-09-07T09:36:19.5253539Z * [new branch] gh/etaf/163/orig -> origin/gh/etaf/163/orig 2025-09-07T09:36:19.5256180Z * [new branch] gh/etaf/164/base -> origin/gh/etaf/164/base 2025-09-07T09:36:19.5257926Z * [new branch] gh/etaf/164/head -> origin/gh/etaf/164/head 2025-09-07T09:36:19.5259895Z * [new branch] gh/etaf/164/orig -> origin/gh/etaf/164/orig 2025-09-07T09:36:19.5262879Z * [new branch] gh/etaf/165/base -> origin/gh/etaf/165/base 2025-09-07T09:36:19.5264506Z * [new branch] gh/etaf/165/orig -> origin/gh/etaf/165/orig 2025-09-07T09:36:19.5266944Z * [new branch] gh/etaf/166/base -> origin/gh/etaf/166/base 2025-09-07T09:36:19.5268655Z * [new branch] gh/etaf/166/head -> origin/gh/etaf/166/head 2025-09-07T09:36:19.5270323Z * [new branch] gh/etaf/166/orig -> origin/gh/etaf/166/orig 2025-09-07T09:36:19.5273361Z * [new branch] gh/etaf/167/base -> origin/gh/etaf/167/base 2025-09-07T09:36:19.5274738Z * [new branch] gh/etaf/167/head -> origin/gh/etaf/167/head 2025-09-07T09:36:19.5276805Z * [new branch] gh/etaf/167/orig -> origin/gh/etaf/167/orig 2025-09-07T09:36:19.5278908Z * [new branch] gh/etaf/168/base -> origin/gh/etaf/168/base 2025-09-07T09:36:19.5280641Z * [new branch] gh/etaf/168/head -> origin/gh/etaf/168/head 2025-09-07T09:36:19.5282799Z * [new 
branch] gh/etaf/168/orig -> origin/gh/etaf/168/orig 2025-09-07T09:36:19.5285865Z * [new branch] gh/etaf/169/base -> origin/gh/etaf/169/base 2025-09-07T09:36:19.5287439Z * [new branch] gh/etaf/169/head -> origin/gh/etaf/169/head 2025-09-07T09:36:19.5289009Z * [new branch] gh/etaf/169/orig -> origin/gh/etaf/169/orig 2025-09-07T09:36:19.5292812Z * [new branch] gh/exclamaforte/1/base -> origin/gh/exclamaforte/1/base 2025-09-07T09:36:19.5294425Z * [new branch] gh/exclamaforte/1/head -> origin/gh/exclamaforte/1/head 2025-09-07T09:36:19.5297179Z * [new branch] gh/exclamaforte/2/base -> origin/gh/exclamaforte/2/base 2025-09-07T09:36:19.5298645Z * [new branch] gh/exclamaforte/2/head -> origin/gh/exclamaforte/2/head 2025-09-07T09:36:19.5301572Z * [new branch] gh/exclamaforte/3/base -> origin/gh/exclamaforte/3/base 2025-09-07T09:36:19.5303365Z * [new branch] gh/exclamaforte/3/head -> origin/gh/exclamaforte/3/head 2025-09-07T09:36:19.5306271Z * [new branch] gh/exclamaforte/4/base -> origin/gh/exclamaforte/4/base 2025-09-07T09:36:19.5308366Z * [new branch] gh/exclamaforte/4/head -> origin/gh/exclamaforte/4/head 2025-09-07T09:36:19.5311465Z * [new branch] gh/ezyang/2374/base -> origin/gh/ezyang/2374/base 2025-09-07T09:36:19.5313225Z * [new branch] gh/ezyang/2374/head -> origin/gh/ezyang/2374/head 2025-09-07T09:36:19.5314860Z * [new branch] gh/ezyang/2374/orig -> origin/gh/ezyang/2374/orig 2025-09-07T09:36:19.5317632Z * [new branch] gh/ezyang/2973/base -> origin/gh/ezyang/2973/base 2025-09-07T09:36:19.5319729Z * [new branch] gh/ezyang/2973/head -> origin/gh/ezyang/2973/head 2025-09-07T09:36:19.5321371Z * [new branch] gh/ezyang/2973/orig -> origin/gh/ezyang/2973/orig 2025-09-07T09:36:19.5323958Z * [new branch] gh/ezyang/2974/base -> origin/gh/ezyang/2974/base 2025-09-07T09:36:19.5326368Z * [new branch] gh/ezyang/2974/head -> origin/gh/ezyang/2974/head 2025-09-07T09:36:19.5328466Z * [new branch] gh/ezyang/2974/orig -> origin/gh/ezyang/2974/orig 2025-09-07T09:36:19.5330248Z * [new 
branch] gh/ezyang/3074/base -> origin/gh/ezyang/3074/base 2025-09-07T09:36:19.5331899Z * [new branch] gh/ezyang/3074/head -> origin/gh/ezyang/3074/head 2025-09-07T09:36:19.5333453Z * [new branch] gh/ezyang/3074/orig -> origin/gh/ezyang/3074/orig 2025-09-07T09:36:19.5336673Z * [new branch] gh/ezyang/3088/base -> origin/gh/ezyang/3088/base 2025-09-07T09:36:19.5338638Z * [new branch] gh/ezyang/3088/head -> origin/gh/ezyang/3088/head 2025-09-07T09:36:19.5340274Z * [new branch] gh/ezyang/3088/orig -> origin/gh/ezyang/3088/orig 2025-09-07T09:36:19.5342918Z * [new branch] gh/ezyang/3092/base -> origin/gh/ezyang/3092/base 2025-09-07T09:36:19.5344322Z * [new branch] gh/ezyang/3092/head -> origin/gh/ezyang/3092/head 2025-09-07T09:36:19.5346534Z * [new branch] gh/ezyang/3092/orig -> origin/gh/ezyang/3092/orig 2025-09-07T09:36:19.5349114Z * [new branch] gh/ezyang/3103/base -> origin/gh/ezyang/3103/base 2025-09-07T09:36:19.5350921Z * [new branch] gh/ezyang/3103/head -> origin/gh/ezyang/3103/head 2025-09-07T09:36:19.5352465Z * [new branch] gh/ezyang/3103/orig -> origin/gh/ezyang/3103/orig 2025-09-07T09:36:19.5354767Z * [new branch] gh/ezyang/3105/base -> origin/gh/ezyang/3105/base 2025-09-07T09:36:19.5356621Z * [new branch] gh/ezyang/3105/head -> origin/gh/ezyang/3105/head 2025-09-07T09:36:19.5358542Z * [new branch] gh/ezyang/3105/orig -> origin/gh/ezyang/3105/orig 2025-09-07T09:36:19.5361122Z * [new branch] gh/ezyang/3114/base -> origin/gh/ezyang/3114/base 2025-09-07T09:36:19.5362839Z * [new branch] gh/ezyang/3114/head -> origin/gh/ezyang/3114/head 2025-09-07T09:36:19.5364859Z * [new branch] gh/ezyang/3114/orig -> origin/gh/ezyang/3114/orig 2025-09-07T09:36:19.5368105Z * [new branch] gh/ezyang/3116/base -> origin/gh/ezyang/3116/base 2025-09-07T09:36:19.5369639Z * [new branch] gh/ezyang/3116/head -> origin/gh/ezyang/3116/head 2025-09-07T09:36:19.5371297Z * [new branch] gh/ezyang/3116/orig -> origin/gh/ezyang/3116/orig 2025-09-07T09:36:19.5374211Z * [new branch] 
gh/ezyang/3120/base -> origin/gh/ezyang/3120/base 2025-09-07T09:36:19.5376547Z * [new branch] gh/ezyang/3120/head -> origin/gh/ezyang/3120/head 2025-09-07T09:36:19.5378635Z * [new branch] gh/ezyang/3120/orig -> origin/gh/ezyang/3120/orig 2025-09-07T09:36:19.5381716Z * [new branch] gh/ezyang/3122/base -> origin/gh/ezyang/3122/base 2025-09-07T09:36:19.5383412Z * [new branch] gh/ezyang/3122/head -> origin/gh/ezyang/3122/head 2025-09-07T09:36:19.5385733Z * [new branch] gh/ezyang/3122/orig -> origin/gh/ezyang/3122/orig 2025-09-07T09:36:19.5388560Z * [new branch] gh/ezyang/3123/base -> origin/gh/ezyang/3123/base 2025-09-07T09:36:19.5390454Z * [new branch] gh/ezyang/3123/head -> origin/gh/ezyang/3123/head 2025-09-07T09:36:19.5391947Z * [new branch] gh/ezyang/3123/orig -> origin/gh/ezyang/3123/orig 2025-09-07T09:36:19.5394179Z * [new branch] gh/ezyang/3125/base -> origin/gh/ezyang/3125/base 2025-09-07T09:36:19.5396211Z * [new branch] gh/ezyang/3125/head -> origin/gh/ezyang/3125/head 2025-09-07T09:36:19.5397917Z * [new branch] gh/ezyang/3125/orig -> origin/gh/ezyang/3125/orig 2025-09-07T09:36:19.5400336Z * [new branch] gh/ezyang/3126/base -> origin/gh/ezyang/3126/base 2025-09-07T09:36:19.5401780Z * [new branch] gh/ezyang/3126/head -> origin/gh/ezyang/3126/head 2025-09-07T09:36:19.5403396Z * [new branch] gh/ezyang/3126/orig -> origin/gh/ezyang/3126/orig 2025-09-07T09:36:19.5406137Z * [new branch] gh/ezyang/3127/base -> origin/gh/ezyang/3127/base 2025-09-07T09:36:19.5408073Z * [new branch] gh/ezyang/3127/head -> origin/gh/ezyang/3127/head 2025-09-07T09:36:19.5410060Z * [new branch] gh/ezyang/3127/orig -> origin/gh/ezyang/3127/orig 2025-09-07T09:36:19.5412691Z * [new branch] gh/ezyang/3128/base -> origin/gh/ezyang/3128/base 2025-09-07T09:36:19.5414200Z * [new branch] gh/ezyang/3128/head -> origin/gh/ezyang/3128/head 2025-09-07T09:36:19.5416300Z * [new branch] gh/ezyang/3128/orig -> origin/gh/ezyang/3128/orig 2025-09-07T09:36:19.5419048Z * [new branch] gh/ezyang/3129/base -> 
origin/gh/ezyang/3129/base 2025-09-07T09:36:19.5421186Z * [new branch] gh/ezyang/3129/head -> origin/gh/ezyang/3129/head 2025-09-07T09:36:19.5423303Z * [new branch] gh/ezyang/3129/orig -> origin/gh/ezyang/3129/orig 2025-09-07T09:36:19.5426553Z * [new branch] gh/ezyang/3130/base -> origin/gh/ezyang/3130/base 2025-09-07T09:36:19.5428582Z * [new branch] gh/ezyang/3130/head -> origin/gh/ezyang/3130/head 2025-09-07T09:36:19.5430697Z * [new branch] gh/ezyang/3130/orig -> origin/gh/ezyang/3130/orig 2025-09-07T09:36:19.5433047Z * [new branch] gh/ezyang/3131/base -> origin/gh/ezyang/3131/base 2025-09-07T09:36:19.5434821Z * [new branch] gh/ezyang/3131/head -> origin/gh/ezyang/3131/head 2025-09-07T09:36:19.5437121Z * [new branch] gh/ezyang/3131/orig -> origin/gh/ezyang/3131/orig 2025-09-07T09:36:19.5439372Z * [new branch] gh/ezyang/3132/base -> origin/gh/ezyang/3132/base 2025-09-07T09:36:19.5441355Z * [new branch] gh/ezyang/3132/head -> origin/gh/ezyang/3132/head 2025-09-07T09:36:19.5443180Z * [new branch] gh/ezyang/3132/orig -> origin/gh/ezyang/3132/orig 2025-09-07T09:36:19.5445700Z * [new branch] gh/ezyang/3133/base -> origin/gh/ezyang/3133/base 2025-09-07T09:36:19.5447682Z * [new branch] gh/ezyang/3133/head -> origin/gh/ezyang/3133/head 2025-09-07T09:36:19.5449618Z * [new branch] gh/ezyang/3133/orig -> origin/gh/ezyang/3133/orig 2025-09-07T09:36:19.5451906Z * [new branch] gh/ezyang/3134/base -> origin/gh/ezyang/3134/base 2025-09-07T09:36:19.5453988Z * [new branch] gh/ezyang/3134/head -> origin/gh/ezyang/3134/head 2025-09-07T09:36:19.5456402Z * [new branch] gh/ezyang/3134/orig -> origin/gh/ezyang/3134/orig 2025-09-07T09:36:19.5458722Z * [new branch] gh/ezyang/3135/base -> origin/gh/ezyang/3135/base 2025-09-07T09:36:19.5460583Z * [new branch] gh/ezyang/3135/head -> origin/gh/ezyang/3135/head 2025-09-07T09:36:19.5462417Z * [new branch] gh/ezyang/3135/orig -> origin/gh/ezyang/3135/orig 2025-09-07T09:36:19.5465701Z * [new branch] gh/ezyang/3136/base -> 
origin/gh/ezyang/3136/base 2025-09-07T09:36:19.5467247Z * [new branch] gh/ezyang/3136/head -> origin/gh/ezyang/3136/head 2025-09-07T09:36:19.5469112Z * [new branch] gh/ezyang/3136/orig -> origin/gh/ezyang/3136/orig 2025-09-07T09:36:19.5471697Z * [new branch] gh/ezyang/3137/base -> origin/gh/ezyang/3137/base 2025-09-07T09:36:19.5473206Z * [new branch] gh/ezyang/3137/head -> origin/gh/ezyang/3137/head 2025-09-07T09:36:19.5474826Z * [new branch] gh/ezyang/3137/orig -> origin/gh/ezyang/3137/orig 2025-09-07T09:36:19.5477917Z * [new branch] gh/ezyang/3138/base -> origin/gh/ezyang/3138/base 2025-09-07T09:36:19.5479487Z * [new branch] gh/ezyang/3138/head -> origin/gh/ezyang/3138/head 2025-09-07T09:36:19.5481204Z * [new branch] gh/ezyang/3138/orig -> origin/gh/ezyang/3138/orig 2025-09-07T09:36:19.5483752Z * [new branch] gh/ezyang/3139/base -> origin/gh/ezyang/3139/base 2025-09-07T09:36:19.5485511Z * [new branch] gh/ezyang/3139/head -> origin/gh/ezyang/3139/head 2025-09-07T09:36:19.5487393Z * [new branch] gh/ezyang/3139/orig -> origin/gh/ezyang/3139/orig 2025-09-07T09:36:19.5489903Z * [new branch] gh/ezyang/3140/base -> origin/gh/ezyang/3140/base 2025-09-07T09:36:19.5491908Z * [new branch] gh/ezyang/3140/head -> origin/gh/ezyang/3140/head 2025-09-07T09:36:19.5493285Z * [new branch] gh/ezyang/3140/orig -> origin/gh/ezyang/3140/orig 2025-09-07T09:36:19.5496101Z * [new branch] gh/ezyang/3141/base -> origin/gh/ezyang/3141/base 2025-09-07T09:36:19.5497768Z * [new branch] gh/ezyang/3141/head -> origin/gh/ezyang/3141/head 2025-09-07T09:36:19.5499450Z * [new branch] gh/ezyang/3141/orig -> origin/gh/ezyang/3141/orig 2025-09-07T09:36:19.5501734Z * [new branch] gh/ezyang/3142/base -> origin/gh/ezyang/3142/base 2025-09-07T09:36:19.5503634Z * [new branch] gh/ezyang/3142/head -> origin/gh/ezyang/3142/head 2025-09-07T09:36:19.5505852Z * [new branch] gh/ezyang/3142/orig -> origin/gh/ezyang/3142/orig 2025-09-07T09:36:19.5508269Z * [new branch] gh/ezyang/3143/base -> 
origin/gh/ezyang/3143/base 2025-09-07T09:36:19.5510245Z * [new branch] gh/ezyang/3143/head -> origin/gh/ezyang/3143/head 2025-09-07T09:36:19.5511990Z * [new branch] gh/ezyang/3143/orig -> origin/gh/ezyang/3143/orig 2025-09-07T09:36:19.5515147Z * [new branch] gh/fadara01/1/base -> origin/gh/fadara01/1/base 2025-09-07T09:36:19.5519029Z * [new branch] gh/fadara01/1/head -> origin/gh/fadara01/1/head 2025-09-07T09:36:19.5520814Z * [new branch] gh/fadara01/1/orig -> origin/gh/fadara01/1/orig 2025-09-07T09:36:19.5523586Z * [new branch] gh/fduwjj/171/base -> origin/gh/fduwjj/171/base 2025-09-07T09:36:19.5525664Z * [new branch] gh/fduwjj/171/head -> origin/gh/fduwjj/171/head 2025-09-07T09:36:19.5527779Z * [new branch] gh/fduwjj/171/orig -> origin/gh/fduwjj/171/orig 2025-09-07T09:36:19.5530514Z * [new branch] gh/fduwjj/175/base -> origin/gh/fduwjj/175/base 2025-09-07T09:36:19.5532382Z * [new branch] gh/fduwjj/175/head -> origin/gh/fduwjj/175/head 2025-09-07T09:36:19.5534321Z * [new branch] gh/fduwjj/175/orig -> origin/gh/fduwjj/175/orig 2025-09-07T09:36:19.5537226Z * [new branch] gh/fduwjj/176/base -> origin/gh/fduwjj/176/base 2025-09-07T09:36:19.5559632Z * [new branch] gh/fduwjj/176/head -> origin/gh/fduwjj/176/head 2025-09-07T09:36:19.5560076Z * [new branch] gh/fduwjj/176/orig -> origin/gh/fduwjj/176/orig 2025-09-07T09:36:19.5560455Z * [new branch] gh/fduwjj/177/base -> origin/gh/fduwjj/177/base 2025-09-07T09:36:19.5560811Z * [new branch] gh/fduwjj/177/head -> origin/gh/fduwjj/177/head 2025-09-07T09:36:19.5561169Z * [new branch] gh/fduwjj/177/orig -> origin/gh/fduwjj/177/orig 2025-09-07T09:36:19.5561526Z * [new branch] gh/fduwjj/178/base -> origin/gh/fduwjj/178/base 2025-09-07T09:36:19.5561885Z * [new branch] gh/fduwjj/178/head -> origin/gh/fduwjj/178/head 2025-09-07T09:36:19.5562240Z * [new branch] gh/fduwjj/178/orig -> origin/gh/fduwjj/178/orig 2025-09-07T09:36:19.5562586Z * [new branch] gh/fduwjj/179/base -> origin/gh/fduwjj/179/base 2025-09-07T09:36:19.5562945Z * [new 
branch] gh/fduwjj/179/head -> origin/gh/fduwjj/179/head 2025-09-07T09:36:19.5563306Z * [new branch] gh/fduwjj/179/orig -> origin/gh/fduwjj/179/orig 2025-09-07T09:36:19.5563657Z * [new branch] gh/fduwjj/180/base -> origin/gh/fduwjj/180/base 2025-09-07T09:36:19.5564555Z * [new branch] gh/fduwjj/180/head -> origin/gh/fduwjj/180/head 2025-09-07T09:36:19.5566548Z * [new branch] gh/fduwjj/180/orig -> origin/gh/fduwjj/180/orig 2025-09-07T09:36:19.5568718Z * [new branch] gh/fduwjj/181/base -> origin/gh/fduwjj/181/base 2025-09-07T09:36:19.5570889Z * [new branch] gh/fduwjj/181/head -> origin/gh/fduwjj/181/head 2025-09-07T09:36:19.5572610Z * [new branch] gh/fduwjj/181/orig -> origin/gh/fduwjj/181/orig 2025-09-07T09:36:19.5575539Z * [new branch] gh/fduwjj/182/base -> origin/gh/fduwjj/182/base 2025-09-07T09:36:19.5577314Z * [new branch] gh/fduwjj/182/head -> origin/gh/fduwjj/182/head 2025-09-07T09:36:19.5579368Z * [new branch] gh/fduwjj/182/orig -> origin/gh/fduwjj/182/orig 2025-09-07T09:36:19.5581972Z * [new branch] gh/fduwjj/183/base -> origin/gh/fduwjj/183/base 2025-09-07T09:36:19.5583780Z * [new branch] gh/fduwjj/183/head -> origin/gh/fduwjj/183/head 2025-09-07T09:36:19.5586137Z * [new branch] gh/fduwjj/183/orig -> origin/gh/fduwjj/183/orig 2025-09-07T09:36:19.5588744Z * [new branch] gh/fduwjj/184/base -> origin/gh/fduwjj/184/base 2025-09-07T09:36:19.5590236Z * [new branch] gh/fduwjj/184/head -> origin/gh/fduwjj/184/head 2025-09-07T09:36:19.5591728Z * [new branch] gh/fduwjj/184/orig -> origin/gh/fduwjj/184/orig 2025-09-07T09:36:19.5594120Z * [new branch] gh/fduwjj/185/base -> origin/gh/fduwjj/185/base 2025-09-07T09:36:19.5595919Z * [new branch] gh/fduwjj/185/head -> origin/gh/fduwjj/185/head 2025-09-07T09:36:19.5597489Z * [new branch] gh/fduwjj/185/orig -> origin/gh/fduwjj/185/orig 2025-09-07T09:36:19.5599578Z * [new branch] gh/fduwjj/186/base -> origin/gh/fduwjj/186/base 2025-09-07T09:36:19.5601385Z * [new branch] gh/fduwjj/186/head -> origin/gh/fduwjj/186/head 
2025-09-07T09:36:19.5603030Z * [new branch] gh/fduwjj/186/orig -> origin/gh/fduwjj/186/orig 2025-09-07T09:36:19.5605373Z * [new branch] gh/fduwjj/187/base -> origin/gh/fduwjj/187/base 2025-09-07T09:36:19.5607028Z * [new branch] gh/fduwjj/187/head -> origin/gh/fduwjj/187/head 2025-09-07T09:36:19.5608990Z * [new branch] gh/fduwjj/187/orig -> origin/gh/fduwjj/187/orig 2025-09-07T09:36:19.5611393Z * [new branch] gh/fduwjj/188/base -> origin/gh/fduwjj/188/base 2025-09-07T09:36:19.5612898Z * [new branch] gh/fduwjj/188/head -> origin/gh/fduwjj/188/head 2025-09-07T09:36:19.5614544Z * [new branch] gh/fduwjj/188/orig -> origin/gh/fduwjj/188/orig 2025-09-07T09:36:19.5617664Z * [new branch] gh/fduwjj/189/base -> origin/gh/fduwjj/189/base 2025-09-07T09:36:19.5618918Z * [new branch] gh/fduwjj/189/head -> origin/gh/fduwjj/189/head 2025-09-07T09:36:19.5620506Z * [new branch] gh/fduwjj/189/orig -> origin/gh/fduwjj/189/orig 2025-09-07T09:36:19.5623498Z * [new branch] gh/fduwjj/190/base -> origin/gh/fduwjj/190/base 2025-09-07T09:36:19.5625105Z * [new branch] gh/fduwjj/190/head -> origin/gh/fduwjj/190/head 2025-09-07T09:36:19.5626974Z * [new branch] gh/fduwjj/190/orig -> origin/gh/fduwjj/190/orig 2025-09-07T09:36:19.5628873Z * [new branch] gh/fduwjj/191/base -> origin/gh/fduwjj/191/base 2025-09-07T09:36:19.5630700Z * [new branch] gh/fduwjj/191/head -> origin/gh/fduwjj/191/head 2025-09-07T09:36:19.5632744Z * [new branch] gh/fduwjj/191/orig -> origin/gh/fduwjj/191/orig 2025-09-07T09:36:19.5636517Z * [new branch] gh/fegin/306/base -> origin/gh/fegin/306/base 2025-09-07T09:36:19.5638116Z * [new branch] gh/fegin/306/head -> origin/gh/fegin/306/head 2025-09-07T09:36:19.5639535Z * [new branch] gh/fegin/306/orig -> origin/gh/fegin/306/orig 2025-09-07T09:36:19.5643140Z * [new branch] gh/fegin/307/base -> origin/gh/fegin/307/base 2025-09-07T09:36:19.5643528Z * [new branch] gh/fegin/307/head -> origin/gh/fegin/307/head 2025-09-07T09:36:19.5645230Z * [new branch] gh/fegin/307/orig -> 
origin/gh/fegin/307/orig 2025-09-07T09:36:19.5647456Z * [new branch] gh/fegin/308/base -> origin/gh/fegin/308/base 2025-09-07T09:36:19.5649018Z * [new branch] gh/fegin/308/head -> origin/gh/fegin/308/head 2025-09-07T09:36:19.5650994Z * [new branch] gh/fegin/308/orig -> origin/gh/fegin/308/orig 2025-09-07T09:36:19.5652867Z * [new branch] gh/fegin/309/base -> origin/gh/fegin/309/base 2025-09-07T09:36:19.5654446Z * [new branch] gh/fegin/309/head -> origin/gh/fegin/309/head 2025-09-07T09:36:19.5656351Z * [new branch] gh/fegin/309/orig -> origin/gh/fegin/309/orig 2025-09-07T09:36:19.5658437Z * [new branch] gh/fegin/310/base -> origin/gh/fegin/310/base 2025-09-07T09:36:19.5659951Z * [new branch] gh/fegin/310/head -> origin/gh/fegin/310/head 2025-09-07T09:36:19.5661606Z * [new branch] gh/fegin/310/orig -> origin/gh/fegin/310/orig 2025-09-07T09:36:19.5663838Z * [new branch] gh/fegin/311/base -> origin/gh/fegin/311/base 2025-09-07T09:36:19.5665747Z * [new branch] gh/fegin/311/head -> origin/gh/fegin/311/head 2025-09-07T09:36:19.5667342Z * [new branch] gh/fegin/311/orig -> origin/gh/fegin/311/orig 2025-09-07T09:36:19.5669569Z * [new branch] gh/fegin/312/base -> origin/gh/fegin/312/base 2025-09-07T09:36:19.5670996Z * [new branch] gh/fegin/312/head -> origin/gh/fegin/312/head 2025-09-07T09:36:19.5672606Z * [new branch] gh/fegin/312/orig -> origin/gh/fegin/312/orig 2025-09-07T09:36:19.5674770Z * [new branch] gh/fegin/313/base -> origin/gh/fegin/313/base 2025-09-07T09:36:19.5676663Z * [new branch] gh/fegin/313/head -> origin/gh/fegin/313/head 2025-09-07T09:36:19.5678187Z * [new branch] gh/fegin/313/orig -> origin/gh/fegin/313/orig 2025-09-07T09:36:19.5680862Z * [new branch] gh/fffrog/124/base -> origin/gh/fffrog/124/base 2025-09-07T09:36:19.5684588Z * [new branch] gh/fffrog/124/head -> origin/gh/fffrog/124/head 2025-09-07T09:36:19.5686728Z * [new branch] gh/fffrog/124/orig -> origin/gh/fffrog/124/orig 2025-09-07T09:36:19.5687143Z * [new branch] gh/fffrog/129/base -> 
origin/gh/fffrog/129/base 2025-09-07T09:36:19.5687944Z * [new branch] gh/fffrog/129/head -> origin/gh/fffrog/129/head 2025-09-07T09:36:19.5691529Z * [new branch] gh/fffrog/129/orig -> origin/gh/fffrog/129/orig 2025-09-07T09:36:19.5693746Z * [new branch] gh/fffrog/130/base -> origin/gh/fffrog/130/base 2025-09-07T09:36:19.5695334Z * [new branch] gh/fffrog/130/head -> origin/gh/fffrog/130/head 2025-09-07T09:36:19.5697100Z * [new branch] gh/fffrog/130/orig -> origin/gh/fffrog/130/orig 2025-09-07T09:36:19.5699263Z * [new branch] gh/fffrog/131/base -> origin/gh/fffrog/131/base 2025-09-07T09:36:19.5700822Z * [new branch] gh/fffrog/131/head -> origin/gh/fffrog/131/head 2025-09-07T09:36:19.5702563Z * [new branch] gh/fffrog/131/orig -> origin/gh/fffrog/131/orig 2025-09-07T09:36:19.5704705Z * [new branch] gh/fffrog/132/base -> origin/gh/fffrog/132/base 2025-09-07T09:36:19.5706540Z * [new branch] gh/fffrog/132/head -> origin/gh/fffrog/132/head 2025-09-07T09:36:19.5708143Z * [new branch] gh/fffrog/132/orig -> origin/gh/fffrog/132/orig 2025-09-07T09:36:19.5710479Z * [new branch] gh/fffrog/133/base -> origin/gh/fffrog/133/base 2025-09-07T09:36:19.5711859Z * [new branch] gh/fffrog/133/head -> origin/gh/fffrog/133/head 2025-09-07T09:36:19.5713450Z * [new branch] gh/fffrog/133/orig -> origin/gh/fffrog/133/orig 2025-09-07T09:36:19.5715860Z * [new branch] gh/fffrog/134/base -> origin/gh/fffrog/134/base 2025-09-07T09:36:19.5717417Z * [new branch] gh/fffrog/134/head -> origin/gh/fffrog/134/head 2025-09-07T09:36:19.5719201Z * [new branch] gh/fffrog/134/orig -> origin/gh/fffrog/134/orig 2025-09-07T09:36:19.5721201Z * [new branch] gh/fffrog/135/base -> origin/gh/fffrog/135/base 2025-09-07T09:36:19.5722949Z * [new branch] gh/fffrog/135/head -> origin/gh/fffrog/135/head 2025-09-07T09:36:19.5724286Z * [new branch] gh/fffrog/135/orig -> origin/gh/fffrog/135/orig 2025-09-07T09:36:19.5726755Z * [new branch] gh/fffrog/136/base -> origin/gh/fffrog/136/base 2025-09-07T09:36:19.5728246Z * [new 
branch] gh/fffrog/136/head -> origin/gh/fffrog/136/head 2025-09-07T09:36:19.5729838Z * [new branch] gh/fffrog/136/orig -> origin/gh/fffrog/136/orig 2025-09-07T09:36:19.5732038Z * [new branch] gh/fffrog/137/base -> origin/gh/fffrog/137/base 2025-09-07T09:36:19.5733530Z * [new branch] gh/fffrog/137/head -> origin/gh/fffrog/137/head 2025-09-07T09:36:19.5735417Z * [new branch] gh/fffrog/137/orig -> origin/gh/fffrog/137/orig 2025-09-07T09:36:19.5737772Z * [new branch] gh/fffrog/138/base -> origin/gh/fffrog/138/base 2025-09-07T09:36:19.5739337Z * [new branch] gh/fffrog/138/head -> origin/gh/fffrog/138/head 2025-09-07T09:36:19.5740943Z * [new branch] gh/fffrog/138/orig -> origin/gh/fffrog/138/orig 2025-09-07T09:36:19.5743264Z * [new branch] gh/fffrog/139/base -> origin/gh/fffrog/139/base 2025-09-07T09:36:19.5745132Z * [new branch] gh/fffrog/139/head -> origin/gh/fffrog/139/head 2025-09-07T09:36:19.5746999Z * [new branch] gh/fffrog/139/orig -> origin/gh/fffrog/139/orig 2025-09-07T09:36:19.5749236Z * [new branch] gh/fffrog/140/base -> origin/gh/fffrog/140/base 2025-09-07T09:36:19.5750826Z * [new branch] gh/fffrog/140/head -> origin/gh/fffrog/140/head 2025-09-07T09:36:19.5752271Z * [new branch] gh/fffrog/140/orig -> origin/gh/fffrog/140/orig 2025-09-07T09:36:19.5754425Z * [new branch] gh/fffrog/141/base -> origin/gh/fffrog/141/base 2025-09-07T09:36:19.5756348Z * [new branch] gh/fffrog/141/head -> origin/gh/fffrog/141/head 2025-09-07T09:36:19.5757766Z * [new branch] gh/fffrog/141/orig -> origin/gh/fffrog/141/orig 2025-09-07T09:36:19.5759971Z * [new branch] gh/fffrog/142/base -> origin/gh/fffrog/142/base 2025-09-07T09:36:19.5761506Z * [new branch] gh/fffrog/142/head -> origin/gh/fffrog/142/head 2025-09-07T09:36:19.5763093Z * [new branch] gh/fffrog/142/orig -> origin/gh/fffrog/142/orig 2025-09-07T09:36:19.5765899Z * [new branch] gh/fffrog/143/base -> origin/gh/fffrog/143/base 2025-09-07T09:36:19.5767181Z * [new branch] gh/fffrog/143/head -> origin/gh/fffrog/143/head 
2025-09-07T09:36:19.5768709Z * [new branch] gh/fffrog/143/orig -> origin/gh/fffrog/143/orig 2025-09-07T09:36:19.5770883Z * [new branch] gh/fffrog/144/base -> origin/gh/fffrog/144/base 2025-09-07T09:36:19.5772402Z * [new branch] gh/fffrog/144/head -> origin/gh/fffrog/144/head 2025-09-07T09:36:19.5773923Z * [new branch] gh/fffrog/144/orig -> origin/gh/fffrog/144/orig 2025-09-07T09:36:19.5776612Z * [new branch] gh/fffrog/145/base -> origin/gh/fffrog/145/base 2025-09-07T09:36:19.5777884Z * [new branch] gh/fffrog/145/head -> origin/gh/fffrog/145/head 2025-09-07T09:36:19.5779479Z * [new branch] gh/fffrog/145/orig -> origin/gh/fffrog/145/orig 2025-09-07T09:36:19.5781682Z * [new branch] gh/fffrog/146/base -> origin/gh/fffrog/146/base 2025-09-07T09:36:19.5783362Z * [new branch] gh/fffrog/146/head -> origin/gh/fffrog/146/head 2025-09-07T09:36:19.5784911Z * [new branch] gh/fffrog/146/orig -> origin/gh/fffrog/146/orig 2025-09-07T09:36:19.5787381Z * [new branch] gh/fffrog/147/base -> origin/gh/fffrog/147/base 2025-09-07T09:36:19.5788850Z * [new branch] gh/fffrog/147/head -> origin/gh/fffrog/147/head 2025-09-07T09:36:19.5790420Z * [new branch] gh/fffrog/147/orig -> origin/gh/fffrog/147/orig 2025-09-07T09:36:19.5792631Z * [new branch] gh/fffrog/148/base -> origin/gh/fffrog/148/base 2025-09-07T09:36:19.5794168Z * [new branch] gh/fffrog/148/head -> origin/gh/fffrog/148/head 2025-09-07T09:36:19.5795987Z * [new branch] gh/fffrog/148/orig -> origin/gh/fffrog/148/orig 2025-09-07T09:36:19.5798275Z * [new branch] gh/fffrog/149/base -> origin/gh/fffrog/149/base 2025-09-07T09:36:19.5799874Z * [new branch] gh/fffrog/149/head -> origin/gh/fffrog/149/head 2025-09-07T09:36:19.5801564Z * [new branch] gh/fffrog/149/orig -> origin/gh/fffrog/149/orig 2025-09-07T09:36:19.5803616Z * [new branch] gh/fffrog/150/base -> origin/gh/fffrog/150/base 2025-09-07T09:36:19.5805380Z * [new branch] gh/fffrog/150/head -> origin/gh/fffrog/150/head 2025-09-07T09:36:19.5807139Z * [new branch] gh/fffrog/150/orig -> 
origin/gh/fffrog/150/orig 2025-09-07T09:36:19.5809249Z * [new branch] gh/fffrog/151/base -> origin/gh/fffrog/151/base 2025-09-07T09:36:19.5810763Z * [new branch] gh/fffrog/151/head -> origin/gh/fffrog/151/head 2025-09-07T09:36:19.5812333Z * [new branch] gh/fffrog/151/orig -> origin/gh/fffrog/151/orig 2025-09-07T09:36:19.5814552Z * [new branch] gh/fffrog/152/base -> origin/gh/fffrog/152/base 2025-09-07T09:36:19.5816323Z * [new branch] gh/fffrog/152/head -> origin/gh/fffrog/152/head 2025-09-07T09:36:19.5818589Z * [new branch] gh/fffrog/153/base -> origin/gh/fffrog/153/base 2025-09-07T09:36:19.5820193Z * [new branch] gh/fffrog/153/head -> origin/gh/fffrog/153/head 2025-09-07T09:36:19.5821868Z * [new branch] gh/fffrog/153/orig -> origin/gh/fffrog/153/orig 2025-09-07T09:36:19.5824563Z * [new branch] gh/gmagogsfm/1/base -> origin/gh/gmagogsfm/1/base 2025-09-07T09:36:19.5826431Z * [new branch] gh/gmagogsfm/1/head -> origin/gh/gmagogsfm/1/head 2025-09-07T09:36:19.5828152Z * [new branch] gh/gmagogsfm/1/orig -> origin/gh/gmagogsfm/1/orig 2025-09-07T09:36:19.5830251Z * [new branch] gh/gmagogsfm/2/base -> origin/gh/gmagogsfm/2/base 2025-09-07T09:36:19.5831745Z * [new branch] gh/gmagogsfm/2/head -> origin/gh/gmagogsfm/2/head 2025-09-07T09:36:19.5833283Z * [new branch] gh/gmagogsfm/2/orig -> origin/gh/gmagogsfm/2/orig 2025-09-07T09:36:19.5835793Z * [new branch] gh/gmagogsfm/3/base -> origin/gh/gmagogsfm/3/base 2025-09-07T09:36:19.5837324Z * [new branch] gh/gmagogsfm/3/head -> origin/gh/gmagogsfm/3/head 2025-09-07T09:36:19.5838898Z * [new branch] gh/gmagogsfm/3/orig -> origin/gh/gmagogsfm/3/orig 2025-09-07T09:36:19.5841738Z * [new branch] gh/guangyey/134/base -> origin/gh/guangyey/134/base 2025-09-07T09:36:19.5843122Z * [new branch] gh/guangyey/134/head -> origin/gh/guangyey/134/head 2025-09-07T09:36:19.5844709Z * [new branch] gh/guangyey/134/orig -> origin/gh/guangyey/134/orig 2025-09-07T09:36:19.5847225Z * [new branch] gh/guangyey/135/base -> origin/gh/guangyey/135/base 
2025-09-07T09:36:19.5848709Z * [new branch] gh/guangyey/135/head -> origin/gh/guangyey/135/head 2025-09-07T09:36:19.5850338Z * [new branch] gh/guangyey/135/orig -> origin/gh/guangyey/135/orig 2025-09-07T09:36:19.5852469Z * [new branch] gh/guangyey/139/base -> origin/gh/guangyey/139/base 2025-09-07T09:36:19.5854127Z * [new branch] gh/guangyey/139/head -> origin/gh/guangyey/139/head 2025-09-07T09:36:19.5855954Z * [new branch] gh/guangyey/139/orig -> origin/gh/guangyey/139/orig 2025-09-07T09:36:19.5858083Z * [new branch] gh/guangyey/140/base -> origin/gh/guangyey/140/base 2025-09-07T09:36:19.5859612Z * [new branch] gh/guangyey/140/head -> origin/gh/guangyey/140/head 2025-09-07T09:36:19.5861323Z * [new branch] gh/guangyey/140/orig -> origin/gh/guangyey/140/orig 2025-09-07T09:36:19.5863630Z * [new branch] gh/guangyey/142/base -> origin/gh/guangyey/142/base 2025-09-07T09:36:19.5865220Z * [new branch] gh/guangyey/142/head -> origin/gh/guangyey/142/head 2025-09-07T09:36:19.5866980Z * [new branch] gh/guangyey/142/orig -> origin/gh/guangyey/142/orig 2025-09-07T09:36:19.5869122Z * [new branch] gh/guangyey/145/base -> origin/gh/guangyey/145/base 2025-09-07T09:36:19.5870693Z * [new branch] gh/guangyey/145/head -> origin/gh/guangyey/145/head 2025-09-07T09:36:19.5872260Z * [new branch] gh/guangyey/145/orig -> origin/gh/guangyey/145/orig 2025-09-07T09:36:19.5874520Z * [new branch] gh/guangyey/153/base -> origin/gh/guangyey/153/base 2025-09-07T09:36:19.5876326Z * [new branch] gh/guangyey/153/head -> origin/gh/guangyey/153/head 2025-09-07T09:36:19.5877854Z * [new branch] gh/guangyey/153/orig -> origin/gh/guangyey/153/orig 2025-09-07T09:36:19.5880035Z * [new branch] gh/guangyey/159/base -> origin/gh/guangyey/159/base 2025-09-07T09:36:19.5881624Z * [new branch] gh/guangyey/159/head -> origin/gh/guangyey/159/head 2025-09-07T09:36:19.5883176Z * [new branch] gh/guangyey/159/orig -> origin/gh/guangyey/159/orig 2025-09-07T09:36:19.5885611Z * [new branch] gh/guangyey/163/base -> 
origin/gh/guangyey/163/base 2025-09-07T09:36:19.5887210Z * [new branch] gh/guangyey/163/head -> origin/gh/guangyey/163/head 2025-09-07T09:36:19.5888775Z * [new branch] gh/guangyey/163/orig -> origin/gh/guangyey/163/orig 2025-09-07T09:36:19.5891075Z * [new branch] gh/guangyey/168/base -> origin/gh/guangyey/168/base 2025-09-07T09:36:19.5892590Z * [new branch] gh/guangyey/168/head -> origin/gh/guangyey/168/head 2025-09-07T09:36:19.5894036Z * [new branch] gh/guangyey/168/orig -> origin/gh/guangyey/168/orig 2025-09-07T09:36:19.5896609Z * [new branch] gh/guangyey/169/base -> origin/gh/guangyey/169/base 2025-09-07T09:36:19.5898090Z * [new branch] gh/guangyey/169/head -> origin/gh/guangyey/169/head 2025-09-07T09:36:19.5899749Z * [new branch] gh/guangyey/169/orig -> origin/gh/guangyey/169/orig 2025-09-07T09:36:19.5902126Z * [new branch] gh/guangyey/170/base -> origin/gh/guangyey/170/base 2025-09-07T09:36:19.5903658Z * [new branch] gh/guangyey/170/head -> origin/gh/guangyey/170/head 2025-09-07T09:36:19.5905627Z * [new branch] gh/guangyey/170/orig -> origin/gh/guangyey/170/orig 2025-09-07T09:36:19.5907690Z * [new branch] gh/guangyey/171/base -> origin/gh/guangyey/171/base 2025-09-07T09:36:19.5909329Z * [new branch] gh/guangyey/171/head -> origin/gh/guangyey/171/head 2025-09-07T09:36:19.5910839Z * [new branch] gh/guangyey/171/orig -> origin/gh/guangyey/171/orig 2025-09-07T09:36:19.5913086Z * [new branch] gh/guangyey/174/base -> origin/gh/guangyey/174/base 2025-09-07T09:36:19.5914620Z * [new branch] gh/guangyey/174/head -> origin/gh/guangyey/174/head 2025-09-07T09:36:19.5916473Z * [new branch] gh/guangyey/174/orig -> origin/gh/guangyey/174/orig 2025-09-07T09:36:19.5918596Z * [new branch] gh/guangyey/176/base -> origin/gh/guangyey/176/base 2025-09-07T09:36:19.5920303Z * [new branch] gh/guangyey/176/head -> origin/gh/guangyey/176/head 2025-09-07T09:36:19.5921957Z * [new branch] gh/guangyey/176/orig -> origin/gh/guangyey/176/orig 2025-09-07T09:36:19.5924129Z * [new branch] 
gh/guangyey/178/base -> origin/gh/guangyey/178/base 2025-09-07T09:36:19.5926128Z * [new branch] gh/guangyey/178/head -> origin/gh/guangyey/178/head 2025-09-07T09:36:19.5927642Z * [new branch] gh/guangyey/178/orig -> origin/gh/guangyey/178/orig 2025-09-07T09:36:19.5929905Z * [new branch] gh/guangyey/181/base -> origin/gh/guangyey/181/base 2025-09-07T09:36:19.5931509Z * [new branch] gh/guangyey/181/head -> origin/gh/guangyey/181/head 2025-09-07T09:36:19.5932952Z * [new branch] gh/guangyey/181/orig -> origin/gh/guangyey/181/orig 2025-09-07T09:36:19.5935255Z * [new branch] gh/guangyey/182/base -> origin/gh/guangyey/182/base 2025-09-07T09:36:19.5936925Z * [new branch] gh/guangyey/182/head -> origin/gh/guangyey/182/head 2025-09-07T09:36:19.5938645Z * [new branch] gh/guangyey/182/orig -> origin/gh/guangyey/182/orig 2025-09-07T09:36:19.5940630Z * [new branch] gh/guangyey/183/base -> origin/gh/guangyey/183/base 2025-09-07T09:36:19.5942249Z * [new branch] gh/guangyey/183/head -> origin/gh/guangyey/183/head 2025-09-07T09:36:19.5943870Z * [new branch] gh/guangyey/183/orig -> origin/gh/guangyey/183/orig 2025-09-07T09:36:19.5946400Z * [new branch] gh/guangyey/184/base -> origin/gh/guangyey/184/base 2025-09-07T09:36:19.5947885Z * [new branch] gh/guangyey/184/head -> origin/gh/guangyey/184/head 2025-09-07T09:36:19.5949428Z * [new branch] gh/guangyey/184/orig -> origin/gh/guangyey/184/orig 2025-09-07T09:36:19.5951662Z * [new branch] gh/guangyey/185/base -> origin/gh/guangyey/185/base 2025-09-07T09:36:19.5953203Z * [new branch] gh/guangyey/185/head -> origin/gh/guangyey/185/head 2025-09-07T09:36:19.5954797Z * [new branch] gh/guangyey/185/orig -> origin/gh/guangyey/185/orig 2025-09-07T09:36:19.5957276Z * [new branch] gh/guangyey/186/base -> origin/gh/guangyey/186/base 2025-09-07T09:36:19.5958810Z * [new branch] gh/guangyey/186/head -> origin/gh/guangyey/186/head 2025-09-07T09:36:19.5960190Z * [new branch] gh/guangyey/186/orig -> origin/gh/guangyey/186/orig 
2025-09-07T09:36:19.5962306Z * [new branch] gh/guangyey/187/base -> origin/gh/guangyey/187/base 2025-09-07T09:36:19.5963885Z * [new branch] gh/guangyey/187/head -> origin/gh/guangyey/187/head 2025-09-07T09:36:19.5965927Z * [new branch] gh/guangyey/187/orig -> origin/gh/guangyey/187/orig 2025-09-07T09:36:19.5968040Z * [new branch] gh/guangyey/188/base -> origin/gh/guangyey/188/base 2025-09-07T09:36:19.5969758Z * [new branch] gh/guangyey/188/head -> origin/gh/guangyey/188/head 2025-09-07T09:36:19.5971151Z * [new branch] gh/guangyey/188/orig -> origin/gh/guangyey/188/orig 2025-09-07T09:36:19.5973429Z * [new branch] gh/guangyey/189/base -> origin/gh/guangyey/189/base 2025-09-07T09:36:19.5974915Z * [new branch] gh/guangyey/189/head -> origin/gh/guangyey/189/head 2025-09-07T09:36:19.5977244Z * [new branch] gh/guangyey/189/orig -> origin/gh/guangyey/189/orig 2025-09-07T09:36:19.5979434Z * [new branch] gh/guangyey/190/base -> origin/gh/guangyey/190/base 2025-09-07T09:36:19.5981048Z * [new branch] gh/guangyey/190/head -> origin/gh/guangyey/190/head 2025-09-07T09:36:19.5982752Z * [new branch] gh/guangyey/190/orig -> origin/gh/guangyey/190/orig 2025-09-07T09:36:19.5985186Z * [new branch] gh/guangyey/191/base -> origin/gh/guangyey/191/base 2025-09-07T09:36:19.5987276Z * [new branch] gh/guangyey/191/head -> origin/gh/guangyey/191/head 2025-09-07T09:36:19.5989188Z * [new branch] gh/guangyey/191/orig -> origin/gh/guangyey/191/orig 2025-09-07T09:36:19.5991495Z * [new branch] gh/guangyey/192/base -> origin/gh/guangyey/192/base 2025-09-07T09:36:19.5993040Z * [new branch] gh/guangyey/192/head -> origin/gh/guangyey/192/head 2025-09-07T09:36:19.5994686Z * [new branch] gh/guangyey/192/orig -> origin/gh/guangyey/192/orig 2025-09-07T09:36:19.5997587Z * [new branch] gh/guangyey/193/base -> origin/gh/guangyey/193/base 2025-09-07T09:36:19.5998971Z * [new branch] gh/guangyey/193/head -> origin/gh/guangyey/193/head 2025-09-07T09:36:19.6000577Z * [new branch] gh/guangyey/193/orig -> 
origin/gh/guangyey/193/orig 2025-09-07T09:36:19.6002884Z * [new branch] gh/guangyey/194/base -> origin/gh/guangyey/194/base 2025-09-07T09:36:19.6004453Z * [new branch] gh/guangyey/194/head -> origin/gh/guangyey/194/head 2025-09-07T09:36:19.6006786Z * [new branch] gh/guangyey/194/orig -> origin/gh/guangyey/194/orig 2025-09-07T09:36:19.6008944Z * [new branch] gh/guangyey/195/base -> origin/gh/guangyey/195/base 2025-09-07T09:36:19.6010599Z * [new branch] gh/guangyey/195/head -> origin/gh/guangyey/195/head 2025-09-07T09:36:19.6012189Z * [new branch] gh/guangyey/195/orig -> origin/gh/guangyey/195/orig 2025-09-07T09:36:19.6014780Z * [new branch] gh/guangyey/196/base -> origin/gh/guangyey/196/base 2025-09-07T09:36:19.6016906Z * [new branch] gh/guangyey/196/head -> origin/gh/guangyey/196/head 2025-09-07T09:36:19.6018462Z * [new branch] gh/guangyey/196/orig -> origin/gh/guangyey/196/orig 2025-09-07T09:36:19.6020812Z * [new branch] gh/guangyey/197/base -> origin/gh/guangyey/197/base 2025-09-07T09:36:19.6022529Z * [new branch] gh/guangyey/197/head -> origin/gh/guangyey/197/head 2025-09-07T09:36:19.6024157Z * [new branch] gh/guangyey/197/orig -> origin/gh/guangyey/197/orig 2025-09-07T09:36:19.6026699Z * [new branch] gh/guangyey/198/base -> origin/gh/guangyey/198/base 2025-09-07T09:36:19.6028215Z * [new branch] gh/guangyey/198/head -> origin/gh/guangyey/198/head 2025-09-07T09:36:19.6029780Z * [new branch] gh/guangyey/198/orig -> origin/gh/guangyey/198/orig 2025-09-07T09:36:19.6032006Z * [new branch] gh/guangyey/199/base -> origin/gh/guangyey/199/base 2025-09-07T09:36:19.6033778Z * [new branch] gh/guangyey/199/head -> origin/gh/guangyey/199/head 2025-09-07T09:36:19.6035205Z * [new branch] gh/guangyey/199/orig -> origin/gh/guangyey/199/orig 2025-09-07T09:36:19.6037759Z * [new branch] gh/guangyey/200/base -> origin/gh/guangyey/200/base 2025-09-07T09:36:19.6039047Z * [new branch] gh/guangyey/200/head -> origin/gh/guangyey/200/head 2025-09-07T09:36:19.6040620Z * [new branch] 
gh/guangyey/200/orig -> origin/gh/guangyey/200/orig 2025-09-07T09:36:19.6042799Z * [new branch] gh/guangyey/201/base -> origin/gh/guangyey/201/base 2025-09-07T09:36:19.6044644Z * [new branch] gh/guangyey/201/head -> origin/gh/guangyey/201/head 2025-09-07T09:36:19.6046477Z * [new branch] gh/guangyey/201/orig -> origin/gh/guangyey/201/orig 2025-09-07T09:36:19.6048626Z * [new branch] gh/guangyey/202/base -> origin/gh/guangyey/202/base 2025-09-07T09:36:19.6050106Z * [new branch] gh/guangyey/202/head -> origin/gh/guangyey/202/head 2025-09-07T09:36:19.6051655Z * [new branch] gh/guangyey/202/orig -> origin/gh/guangyey/202/orig 2025-09-07T09:36:19.6053801Z * [new branch] gh/guangyey/203/base -> origin/gh/guangyey/203/base 2025-09-07T09:36:19.6055737Z * [new branch] gh/guangyey/203/head -> origin/gh/guangyey/203/head 2025-09-07T09:36:19.6057211Z * [new branch] gh/guangyey/203/orig -> origin/gh/guangyey/203/orig 2025-09-07T09:36:19.6059356Z * [new branch] gh/guangyey/204/base -> origin/gh/guangyey/204/base 2025-09-07T09:36:19.6060922Z * [new branch] gh/guangyey/204/head -> origin/gh/guangyey/204/head 2025-09-07T09:36:19.6062682Z * [new branch] gh/guangyey/204/orig -> origin/gh/guangyey/204/orig 2025-09-07T09:36:19.6065197Z * [new branch] gh/guangyey/205/base -> origin/gh/guangyey/205/base 2025-09-07T09:36:19.6066777Z * [new branch] gh/guangyey/205/head -> origin/gh/guangyey/205/head 2025-09-07T09:36:19.6068309Z * [new branch] gh/guangyey/205/orig -> origin/gh/guangyey/205/orig 2025-09-07T09:36:19.6070493Z * [new branch] gh/guangyey/206/base -> origin/gh/guangyey/206/base 2025-09-07T09:36:19.6072022Z * [new branch] gh/guangyey/206/head -> origin/gh/guangyey/206/head 2025-09-07T09:36:19.6073536Z * [new branch] gh/guangyey/206/orig -> origin/gh/guangyey/206/orig 2025-09-07T09:36:19.6076137Z * [new branch] gh/guangyey/207/base -> origin/gh/guangyey/207/base 2025-09-07T09:36:19.6077633Z * [new branch] gh/guangyey/207/head -> origin/gh/guangyey/207/head 
2025-09-07T09:36:19.6079116Z * [new branch] gh/guangyey/207/orig -> origin/gh/guangyey/207/orig 2025-09-07T09:36:19.6081394Z * [new branch] gh/guangyey/79/base -> origin/gh/guangyey/79/base 2025-09-07T09:36:19.6083022Z * [new branch] gh/guangyey/79/head -> origin/gh/guangyey/79/head 2025-09-07T09:36:19.6084442Z * [new branch] gh/guangyey/79/orig -> origin/gh/guangyey/79/orig 2025-09-07T09:36:19.6087047Z * [new branch] gh/guangyey/89/base -> origin/gh/guangyey/89/base 2025-09-07T09:36:19.6088478Z * [new branch] gh/guangyey/89/head -> origin/gh/guangyey/89/head 2025-09-07T09:36:19.6090208Z * [new branch] gh/guangyey/89/orig -> origin/gh/guangyey/89/orig 2025-09-07T09:36:19.6092797Z * [new branch] gh/guilhermeleobas/107/base -> origin/gh/guilhermeleobas/107/base 2025-09-07T09:36:19.6094310Z * [new branch] gh/guilhermeleobas/107/head -> origin/gh/guilhermeleobas/107/head 2025-09-07T09:36:19.6096323Z * [new branch] gh/guilhermeleobas/107/orig -> origin/gh/guilhermeleobas/107/orig 2025-09-07T09:36:19.6098476Z * [new branch] gh/guilhermeleobas/108/base -> origin/gh/guilhermeleobas/108/base 2025-09-07T09:36:19.6099968Z * [new branch] gh/guilhermeleobas/108/head -> origin/gh/guilhermeleobas/108/head 2025-09-07T09:36:19.6101866Z * [new branch] gh/guilhermeleobas/108/orig -> origin/gh/guilhermeleobas/108/orig 2025-09-07T09:36:19.6103898Z * [new branch] gh/guilhermeleobas/124/base -> origin/gh/guilhermeleobas/124/base 2025-09-07T09:36:19.6105662Z * [new branch] gh/guilhermeleobas/124/head -> origin/gh/guilhermeleobas/124/head 2025-09-07T09:36:19.6107451Z * [new branch] gh/guilhermeleobas/124/orig -> origin/gh/guilhermeleobas/124/orig 2025-09-07T09:36:19.6109697Z * [new branch] gh/guilhermeleobas/147/base -> origin/gh/guilhermeleobas/147/base 2025-09-07T09:36:19.6111234Z * [new branch] gh/guilhermeleobas/147/head -> origin/gh/guilhermeleobas/147/head 2025-09-07T09:36:19.6112781Z * [new branch] gh/guilhermeleobas/147/orig -> origin/gh/guilhermeleobas/147/orig 
2025-09-07T09:36:19.6115659Z * [new branch] gh/guilhermeleobas/150/base -> origin/gh/guilhermeleobas/150/base 2025-09-07T09:36:19.6116901Z * [new branch] gh/guilhermeleobas/150/head -> origin/gh/guilhermeleobas/150/head 2025-09-07T09:36:19.6118402Z * [new branch] gh/guilhermeleobas/150/orig -> origin/gh/guilhermeleobas/150/orig 2025-09-07T09:36:19.6120635Z * [new branch] gh/guilhermeleobas/163/base -> origin/gh/guilhermeleobas/163/base 2025-09-07T09:36:19.6122257Z * [new branch] gh/guilhermeleobas/163/head -> origin/gh/guilhermeleobas/163/head 2025-09-07T09:36:19.6123791Z * [new branch] gh/guilhermeleobas/163/orig -> origin/gh/guilhermeleobas/163/orig 2025-09-07T09:36:19.6126335Z * [new branch] gh/guilhermeleobas/164/base -> origin/gh/guilhermeleobas/164/base 2025-09-07T09:36:19.6127842Z * [new branch] gh/guilhermeleobas/164/head -> origin/gh/guilhermeleobas/164/head 2025-09-07T09:36:19.6129322Z * [new branch] gh/guilhermeleobas/164/orig -> origin/gh/guilhermeleobas/164/orig 2025-09-07T09:36:19.6131471Z * [new branch] gh/guilhermeleobas/165/base -> origin/gh/guilhermeleobas/165/base 2025-09-07T09:36:19.6133018Z * [new branch] gh/guilhermeleobas/165/head -> origin/gh/guilhermeleobas/165/head 2025-09-07T09:36:19.6134908Z * [new branch] gh/guilhermeleobas/165/orig -> origin/gh/guilhermeleobas/165/orig 2025-09-07T09:36:19.6137629Z * [new branch] gh/guilhermeleobas/166/base -> origin/gh/guilhermeleobas/166/base 2025-09-07T09:36:19.6139120Z * [new branch] gh/guilhermeleobas/166/head -> origin/gh/guilhermeleobas/166/head 2025-09-07T09:36:19.6140588Z * [new branch] gh/guilhermeleobas/166/orig -> origin/gh/guilhermeleobas/166/orig 2025-09-07T09:36:19.6142915Z * [new branch] gh/guilhermeleobas/167/base -> origin/gh/guilhermeleobas/167/base 2025-09-07T09:36:19.6144584Z * [new branch] gh/guilhermeleobas/167/head -> origin/gh/guilhermeleobas/167/head 2025-09-07T09:36:19.6146429Z * [new branch] gh/guilhermeleobas/167/orig -> origin/gh/guilhermeleobas/167/orig 
2025-09-07T09:36:19.6148537Z * [new branch] gh/guilhermeleobas/168/base -> origin/gh/guilhermeleobas/168/base 2025-09-07T09:36:19.6150103Z * [new branch] gh/guilhermeleobas/168/head -> origin/gh/guilhermeleobas/168/head 2025-09-07T09:36:19.6151545Z * [new branch] gh/guilhermeleobas/168/orig -> origin/gh/guilhermeleobas/168/orig 2025-09-07T09:36:19.6153709Z * [new branch] gh/guilhermeleobas/169/base -> origin/gh/guilhermeleobas/169/base 2025-09-07T09:36:19.6155621Z * [new branch] gh/guilhermeleobas/169/head -> origin/gh/guilhermeleobas/169/head 2025-09-07T09:36:19.6157242Z * [new branch] gh/guilhermeleobas/169/orig -> origin/gh/guilhermeleobas/169/orig 2025-09-07T09:36:19.6159342Z * [new branch] gh/guilhermeleobas/170/base -> origin/gh/guilhermeleobas/170/base 2025-09-07T09:36:19.6160936Z * [new branch] gh/guilhermeleobas/170/head -> origin/gh/guilhermeleobas/170/head 2025-09-07T09:36:19.6162612Z * [new branch] gh/guilhermeleobas/170/orig -> origin/gh/guilhermeleobas/170/orig 2025-09-07T09:36:19.6164723Z * [new branch] gh/guilhermeleobas/171/base -> origin/gh/guilhermeleobas/171/base 2025-09-07T09:36:19.6166600Z * [new branch] gh/guilhermeleobas/171/head -> origin/gh/guilhermeleobas/171/head 2025-09-07T09:36:19.6168069Z * [new branch] gh/guilhermeleobas/171/orig -> origin/gh/guilhermeleobas/171/orig 2025-09-07T09:36:19.6170203Z * [new branch] gh/guilhermeleobas/173/base -> origin/gh/guilhermeleobas/173/base 2025-09-07T09:36:19.6171850Z * [new branch] gh/guilhermeleobas/173/head -> origin/gh/guilhermeleobas/173/head 2025-09-07T09:36:19.6173325Z * [new branch] gh/guilhermeleobas/173/orig -> origin/gh/guilhermeleobas/173/orig 2025-09-07T09:36:19.6175795Z * [new branch] gh/guilhermeleobas/192/base -> origin/gh/guilhermeleobas/192/base 2025-09-07T09:36:19.6177539Z * [new branch] gh/guilhermeleobas/192/head -> origin/gh/guilhermeleobas/192/head 2025-09-07T09:36:19.6178934Z * [new branch] gh/guilhermeleobas/192/orig -> origin/gh/guilhermeleobas/192/orig 
2025-09-07T09:36:19.6181239Z * [new branch] gh/guilhermeleobas/193/base -> origin/gh/guilhermeleobas/193/base 2025-09-07T09:36:19.6182946Z * [new branch] gh/guilhermeleobas/193/head -> origin/gh/guilhermeleobas/193/head 2025-09-07T09:36:19.6184479Z * [new branch] gh/guilhermeleobas/193/orig -> origin/gh/guilhermeleobas/193/orig 2025-09-07T09:36:19.6186949Z * [new branch] gh/guilhermeleobas/194/base -> origin/gh/guilhermeleobas/194/base 2025-09-07T09:36:19.6188478Z * [new branch] gh/guilhermeleobas/194/head -> origin/gh/guilhermeleobas/194/head 2025-09-07T09:36:19.6189967Z * [new branch] gh/guilhermeleobas/194/orig -> origin/gh/guilhermeleobas/194/orig 2025-09-07T09:36:19.6192140Z * [new branch] gh/guilhermeleobas/203/base -> origin/gh/guilhermeleobas/203/base 2025-09-07T09:36:19.6193810Z * [new branch] gh/guilhermeleobas/203/head -> origin/gh/guilhermeleobas/203/head 2025-09-07T09:36:19.6195418Z * [new branch] gh/guilhermeleobas/203/orig -> origin/gh/guilhermeleobas/203/orig 2025-09-07T09:36:19.6197669Z * [new branch] gh/guilhermeleobas/204/base -> origin/gh/guilhermeleobas/204/base 2025-09-07T09:36:19.6199445Z * [new branch] gh/guilhermeleobas/204/head -> origin/gh/guilhermeleobas/204/head 2025-09-07T09:36:19.6200984Z * [new branch] gh/guilhermeleobas/204/orig -> origin/gh/guilhermeleobas/204/orig 2025-09-07T09:36:19.6203113Z * [new branch] gh/guilhermeleobas/205/base -> origin/gh/guilhermeleobas/205/base 2025-09-07T09:36:19.6204664Z * [new branch] gh/guilhermeleobas/205/head -> origin/gh/guilhermeleobas/205/head 2025-09-07T09:36:19.6206495Z * [new branch] gh/guilhermeleobas/205/orig -> origin/gh/guilhermeleobas/205/orig 2025-09-07T09:36:19.6208872Z * [new branch] gh/guilhermeleobas/209/base -> origin/gh/guilhermeleobas/209/base 2025-09-07T09:36:19.6210393Z * [new branch] gh/guilhermeleobas/209/head -> origin/gh/guilhermeleobas/209/head 2025-09-07T09:36:19.6211912Z * [new branch] gh/guilhermeleobas/209/orig -> origin/gh/guilhermeleobas/209/orig 
2025-09-07T09:36:19.6214168Z * [new branch] gh/guilhermeleobas/210/base -> origin/gh/guilhermeleobas/210/base 2025-09-07T09:36:19.6216184Z * [new branch] gh/guilhermeleobas/210/head -> origin/gh/guilhermeleobas/210/head 2025-09-07T09:36:19.6217727Z * [new branch] gh/guilhermeleobas/210/orig -> origin/gh/guilhermeleobas/210/orig 2025-09-07T09:36:19.6219892Z * [new branch] gh/guilhermeleobas/211/base -> origin/gh/guilhermeleobas/211/base 2025-09-07T09:36:19.6221396Z * [new branch] gh/guilhermeleobas/211/head -> origin/gh/guilhermeleobas/211/head 2025-09-07T09:36:19.6223287Z * [new branch] gh/guilhermeleobas/211/orig -> origin/gh/guilhermeleobas/211/orig 2025-09-07T09:36:19.6225664Z * [new branch] gh/guilhermeleobas/214/base -> origin/gh/guilhermeleobas/214/base 2025-09-07T09:36:19.6227201Z * [new branch] gh/guilhermeleobas/214/head -> origin/gh/guilhermeleobas/214/head 2025-09-07T09:36:19.6228777Z * [new branch] gh/guilhermeleobas/214/orig -> origin/gh/guilhermeleobas/214/orig 2025-09-07T09:36:19.6230965Z * [new branch] gh/guilhermeleobas/215/base -> origin/gh/guilhermeleobas/215/base 2025-09-07T09:36:19.6232546Z * [new branch] gh/guilhermeleobas/215/head -> origin/gh/guilhermeleobas/215/head 2025-09-07T09:36:19.6234066Z * [new branch] gh/guilhermeleobas/215/orig -> origin/gh/guilhermeleobas/215/orig 2025-09-07T09:36:19.6236743Z * [new branch] gh/guilhermeleobas/216/base -> origin/gh/guilhermeleobas/216/base 2025-09-07T09:36:19.6238179Z * [new branch] gh/guilhermeleobas/216/head -> origin/gh/guilhermeleobas/216/head 2025-09-07T09:36:19.6239740Z * [new branch] gh/guilhermeleobas/216/orig -> origin/gh/guilhermeleobas/216/orig 2025-09-07T09:36:19.6242004Z * [new branch] gh/guilhermeleobas/217/base -> origin/gh/guilhermeleobas/217/base 2025-09-07T09:36:19.6243536Z * [new branch] gh/guilhermeleobas/217/head -> origin/gh/guilhermeleobas/217/head 2025-09-07T09:36:19.6245275Z * [new branch] gh/guilhermeleobas/217/orig -> origin/gh/guilhermeleobas/217/orig 
2025-09-07T09:36:19.6247666Z * [new branch] gh/guilhermeleobas/219/base -> origin/gh/guilhermeleobas/219/base 2025-09-07T09:36:19.6249176Z * [new branch] gh/guilhermeleobas/219/head -> origin/gh/guilhermeleobas/219/head 2025-09-07T09:36:19.6250814Z * [new branch] gh/guilhermeleobas/219/orig -> origin/gh/guilhermeleobas/219/orig 2025-09-07T09:36:19.6252871Z * [new branch] gh/guilhermeleobas/220/base -> origin/gh/guilhermeleobas/220/base 2025-09-07T09:36:19.6254450Z * [new branch] gh/guilhermeleobas/220/head -> origin/gh/guilhermeleobas/220/head 2025-09-07T09:36:19.6256261Z * [new branch] gh/guilhermeleobas/220/orig -> origin/gh/guilhermeleobas/220/orig 2025-09-07T09:36:19.6258489Z * [new branch] gh/guilhermeleobas/221/base -> origin/gh/guilhermeleobas/221/base 2025-09-07T09:36:19.6259978Z * [new branch] gh/guilhermeleobas/221/head -> origin/gh/guilhermeleobas/221/head 2025-09-07T09:36:19.6261636Z * [new branch] gh/guilhermeleobas/221/orig -> origin/gh/guilhermeleobas/221/orig 2025-09-07T09:36:19.6263947Z * [new branch] gh/guilhermeleobas/222/base -> origin/gh/guilhermeleobas/222/base 2025-09-07T09:36:19.6265847Z * [new branch] gh/guilhermeleobas/222/head -> origin/gh/guilhermeleobas/222/head 2025-09-07T09:36:19.6267358Z * [new branch] gh/guilhermeleobas/222/orig -> origin/gh/guilhermeleobas/222/orig 2025-09-07T09:36:19.6269653Z * [new branch] gh/guilhermeleobas/223/base -> origin/gh/guilhermeleobas/223/base 2025-09-07T09:36:19.6271142Z * [new branch] gh/guilhermeleobas/223/head -> origin/gh/guilhermeleobas/223/head 2025-09-07T09:36:19.6272714Z * [new branch] gh/guilhermeleobas/223/orig -> origin/gh/guilhermeleobas/223/orig 2025-09-07T09:36:19.6274918Z * [new branch] gh/guilhermeleobas/224/base -> origin/gh/guilhermeleobas/224/base 2025-09-07T09:36:19.6276841Z * [new branch] gh/guilhermeleobas/224/head -> origin/gh/guilhermeleobas/224/head 2025-09-07T09:36:19.6278407Z * [new branch] gh/guilhermeleobas/224/orig -> origin/gh/guilhermeleobas/224/orig 
2025-09-07T09:36:19.6280617Z * [new branch] gh/guilhermeleobas/225/base -> origin/gh/guilhermeleobas/225/base 2025-09-07T09:36:19.6282143Z * [new branch] gh/guilhermeleobas/225/head -> origin/gh/guilhermeleobas/225/head 2025-09-07T09:36:19.6283957Z * [new branch] gh/guilhermeleobas/225/orig -> origin/gh/guilhermeleobas/225/orig 2025-09-07T09:36:19.6286218Z * [new branch] gh/guilhermeleobas/226/base -> origin/gh/guilhermeleobas/226/base 2025-09-07T09:36:19.6287770Z * [new branch] gh/guilhermeleobas/226/head -> origin/gh/guilhermeleobas/226/head 2025-09-07T09:36:19.6289251Z * [new branch] gh/guilhermeleobas/226/orig -> origin/gh/guilhermeleobas/226/orig 2025-09-07T09:36:19.6291681Z * [new branch] gh/guilhermeleobas/227/base -> origin/gh/guilhermeleobas/227/base 2025-09-07T09:36:19.6293328Z * [new branch] gh/guilhermeleobas/227/head -> origin/gh/guilhermeleobas/227/head 2025-09-07T09:36:19.6294783Z * [new branch] gh/guilhermeleobas/227/orig -> origin/gh/guilhermeleobas/227/orig 2025-09-07T09:36:19.6297453Z * [new branch] gh/guilhermeleobas/228/base -> origin/gh/guilhermeleobas/228/base 2025-09-07T09:36:19.6298925Z * [new branch] gh/guilhermeleobas/228/head -> origin/gh/guilhermeleobas/228/head 2025-09-07T09:36:19.6300330Z * [new branch] gh/guilhermeleobas/228/orig -> origin/gh/guilhermeleobas/228/orig 2025-09-07T09:36:19.6302858Z * [new branch] gh/guilhermeleobas/229/base -> origin/gh/guilhermeleobas/229/base 2025-09-07T09:36:19.6304482Z * [new branch] gh/guilhermeleobas/229/head -> origin/gh/guilhermeleobas/229/head 2025-09-07T09:36:19.6306459Z * [new branch] gh/guilhermeleobas/229/orig -> origin/gh/guilhermeleobas/229/orig 2025-09-07T09:36:19.6308700Z * [new branch] gh/guilhermeleobas/230/base -> origin/gh/guilhermeleobas/230/base 2025-09-07T09:36:19.6310257Z * [new branch] gh/guilhermeleobas/230/head -> origin/gh/guilhermeleobas/230/head 2025-09-07T09:36:19.6311846Z * [new branch] gh/guilhermeleobas/230/orig -> origin/gh/guilhermeleobas/230/orig 
2025-09-07T09:36:19.6314057Z * [new branch] gh/guilhermeleobas/231/base -> origin/gh/guilhermeleobas/231/base 2025-09-07T09:36:19.6315873Z * [new branch] gh/guilhermeleobas/231/head -> origin/gh/guilhermeleobas/231/head 2025-09-07T09:36:19.6317357Z * [new branch] gh/guilhermeleobas/231/orig -> origin/gh/guilhermeleobas/231/orig 2025-09-07T09:36:19.6319547Z * [new branch] gh/guilhermeleobas/232/base -> origin/gh/guilhermeleobas/232/base 2025-09-07T09:36:19.6321225Z * [new branch] gh/guilhermeleobas/232/head -> origin/gh/guilhermeleobas/232/head 2025-09-07T09:36:19.6322757Z * [new branch] gh/guilhermeleobas/232/orig -> origin/gh/guilhermeleobas/232/orig 2025-09-07T09:36:19.6324872Z * [new branch] gh/guilhermeleobas/233/base -> origin/gh/guilhermeleobas/233/base 2025-09-07T09:36:19.6326630Z * [new branch] gh/guilhermeleobas/233/head -> origin/gh/guilhermeleobas/233/head 2025-09-07T09:36:19.6328203Z * [new branch] gh/guilhermeleobas/233/orig -> origin/gh/guilhermeleobas/233/orig 2025-09-07T09:36:19.6330531Z * [new branch] gh/guilhermeleobas/234/base -> origin/gh/guilhermeleobas/234/base 2025-09-07T09:36:19.6332042Z * [new branch] gh/guilhermeleobas/234/head -> origin/gh/guilhermeleobas/234/head 2025-09-07T09:36:19.6333506Z * [new branch] gh/guilhermeleobas/234/orig -> origin/gh/guilhermeleobas/234/orig 2025-09-07T09:36:19.6336020Z * [new branch] gh/guilhermeleobas/235/base -> origin/gh/guilhermeleobas/235/base 2025-09-07T09:36:19.6337644Z * [new branch] gh/guilhermeleobas/235/head -> origin/gh/guilhermeleobas/235/head 2025-09-07T09:36:19.6339219Z * [new branch] gh/guilhermeleobas/235/orig -> origin/gh/guilhermeleobas/235/orig 2025-09-07T09:36:19.6341531Z * [new branch] gh/guilhermeleobas/236/base -> origin/gh/guilhermeleobas/236/base 2025-09-07T09:36:19.6343239Z * [new branch] gh/guilhermeleobas/236/head -> origin/gh/guilhermeleobas/236/head 2025-09-07T09:36:19.6344845Z * [new branch] gh/guilhermeleobas/236/orig -> origin/gh/guilhermeleobas/236/orig 
2025-09-07T09:36:19.6347268Z * [new branch] gh/guilhermeleobas/237/base -> origin/gh/guilhermeleobas/237/base 2025-09-07T09:36:19.6348685Z * [new branch] gh/guilhermeleobas/237/head -> origin/gh/guilhermeleobas/237/head 2025-09-07T09:36:19.6350283Z * [new branch] gh/guilhermeleobas/237/orig -> origin/gh/guilhermeleobas/237/orig 2025-09-07T09:36:19.6352548Z * [new branch] gh/guilhermeleobas/238/base -> origin/gh/guilhermeleobas/238/base 2025-09-07T09:36:19.6354146Z * [new branch] gh/guilhermeleobas/238/head -> origin/gh/guilhermeleobas/238/head 2025-09-07T09:36:19.6355916Z * [new branch] gh/guilhermeleobas/238/orig -> origin/gh/guilhermeleobas/238/orig 2025-09-07T09:36:19.6358243Z * [new branch] gh/guilhermeleobas/239/base -> origin/gh/guilhermeleobas/239/base 2025-09-07T09:36:19.6359826Z * [new branch] gh/guilhermeleobas/239/head -> origin/gh/guilhermeleobas/239/head 2025-09-07T09:36:19.6361350Z * [new branch] gh/guilhermeleobas/239/orig -> origin/gh/guilhermeleobas/239/orig 2025-09-07T09:36:19.6363617Z * [new branch] gh/guilhermeleobas/240/base -> origin/gh/guilhermeleobas/240/base 2025-09-07T09:36:19.6365285Z * [new branch] gh/guilhermeleobas/240/head -> origin/gh/guilhermeleobas/240/head 2025-09-07T09:36:19.6367030Z * [new branch] gh/guilhermeleobas/240/orig -> origin/gh/guilhermeleobas/240/orig 2025-09-07T09:36:19.6369291Z * [new branch] gh/guilhermeleobas/241/base -> origin/gh/guilhermeleobas/241/base 2025-09-07T09:36:19.6370833Z * [new branch] gh/guilhermeleobas/241/head -> origin/gh/guilhermeleobas/241/head 2025-09-07T09:36:19.6372392Z * [new branch] gh/guilhermeleobas/241/orig -> origin/gh/guilhermeleobas/241/orig 2025-09-07T09:36:19.6374662Z * [new branch] gh/guilhermeleobas/242/base -> origin/gh/guilhermeleobas/242/base 2025-09-07T09:36:19.6376591Z * [new branch] gh/guilhermeleobas/242/head -> origin/gh/guilhermeleobas/242/head 2025-09-07T09:36:19.6378143Z * [new branch] gh/guilhermeleobas/242/orig -> origin/gh/guilhermeleobas/242/orig 
2025-09-07T09:36:19.6380286Z * [new branch] gh/guilhermeleobas/243/base -> origin/gh/guilhermeleobas/243/base 2025-09-07T09:36:19.6381991Z * [new branch] gh/guilhermeleobas/243/head -> origin/gh/guilhermeleobas/243/head 2025-09-07T09:36:19.6383755Z * [new branch] gh/guilhermeleobas/243/orig -> origin/gh/guilhermeleobas/243/orig 2025-09-07T09:36:19.6386379Z * [new branch] gh/guilhermeleobas/244/base -> origin/gh/guilhermeleobas/244/base 2025-09-07T09:36:19.6387953Z * [new branch] gh/guilhermeleobas/244/head -> origin/gh/guilhermeleobas/244/head 2025-09-07T09:36:19.6389509Z * [new branch] gh/guilhermeleobas/244/orig -> origin/gh/guilhermeleobas/244/orig 2025-09-07T09:36:19.6391751Z * [new branch] gh/guilhermeleobas/245/base -> origin/gh/guilhermeleobas/245/base 2025-09-07T09:36:19.6393257Z * [new branch] gh/guilhermeleobas/245/head -> origin/gh/guilhermeleobas/245/head 2025-09-07T09:36:19.6394826Z * [new branch] gh/guilhermeleobas/245/orig -> origin/gh/guilhermeleobas/245/orig 2025-09-07T09:36:19.6397372Z * [new branch] gh/guilhermeleobas/73/base -> origin/gh/guilhermeleobas/73/base 2025-09-07T09:36:19.6398955Z * [new branch] gh/guilhermeleobas/73/head -> origin/gh/guilhermeleobas/73/head 2025-09-07T09:36:19.6400432Z * [new branch] gh/guilhermeleobas/73/orig -> origin/gh/guilhermeleobas/73/orig 2025-09-07T09:36:19.6403197Z * [new branch] gh/henrylhtsang/140/base -> origin/gh/henrylhtsang/140/base 2025-09-07T09:36:19.6404794Z * [new branch] gh/henrylhtsang/140/head -> origin/gh/henrylhtsang/140/head 2025-09-07T09:36:19.6406751Z * [new branch] gh/henrylhtsang/140/orig -> origin/gh/henrylhtsang/140/orig 2025-09-07T09:36:19.6408788Z * [new branch] gh/henrylhtsang/141/base -> origin/gh/henrylhtsang/141/base 2025-09-07T09:36:19.6410378Z * [new branch] gh/henrylhtsang/141/head -> origin/gh/henrylhtsang/141/head 2025-09-07T09:36:19.6411878Z * [new branch] gh/henrylhtsang/141/orig -> origin/gh/henrylhtsang/141/orig 2025-09-07T09:36:19.6414374Z * [new branch] 
gh/henrylhtsang/142/base -> origin/gh/henrylhtsang/142/base 2025-09-07T09:36:19.6416418Z * [new branch] gh/henrylhtsang/142/head -> origin/gh/henrylhtsang/142/head 2025-09-07T09:36:19.6418008Z * [new branch] gh/henrylhtsang/142/orig -> origin/gh/henrylhtsang/142/orig 2025-09-07T09:36:19.6420254Z * [new branch] gh/henrylhtsang/143/base -> origin/gh/henrylhtsang/143/base 2025-09-07T09:36:19.6421923Z * [new branch] gh/henrylhtsang/143/head -> origin/gh/henrylhtsang/143/head 2025-09-07T09:36:19.6423417Z * [new branch] gh/henrylhtsang/143/orig -> origin/gh/henrylhtsang/143/orig 2025-09-07T09:36:19.6425877Z * [new branch] gh/henrylhtsang/144/base -> origin/gh/henrylhtsang/144/base 2025-09-07T09:36:19.6427395Z * [new branch] gh/henrylhtsang/144/head -> origin/gh/henrylhtsang/144/head 2025-09-07T09:36:19.6428904Z * [new branch] gh/henrylhtsang/144/orig -> origin/gh/henrylhtsang/144/orig 2025-09-07T09:36:19.6431190Z * [new branch] gh/henrylhtsang/145/base -> origin/gh/henrylhtsang/145/base 2025-09-07T09:36:19.6432842Z * [new branch] gh/henrylhtsang/145/head -> origin/gh/henrylhtsang/145/head 2025-09-07T09:36:19.6434353Z * [new branch] gh/henrylhtsang/145/orig -> origin/gh/henrylhtsang/145/orig 2025-09-07T09:36:19.6436865Z * [new branch] gh/henrylhtsang/146/base -> origin/gh/henrylhtsang/146/base 2025-09-07T09:36:19.6438526Z * [new branch] gh/henrylhtsang/146/head -> origin/gh/henrylhtsang/146/head 2025-09-07T09:36:19.6440062Z * [new branch] gh/henrylhtsang/146/orig -> origin/gh/henrylhtsang/146/orig 2025-09-07T09:36:19.6442173Z * [new branch] gh/henrylhtsang/147/base -> origin/gh/henrylhtsang/147/base 2025-09-07T09:36:19.6443810Z * [new branch] gh/henrylhtsang/147/head -> origin/gh/henrylhtsang/147/head 2025-09-07T09:36:19.6445531Z * [new branch] gh/henrylhtsang/147/orig -> origin/gh/henrylhtsang/147/orig 2025-09-07T09:36:19.6447960Z * [new branch] gh/henrylhtsang/148/base -> origin/gh/henrylhtsang/148/base 2025-09-07T09:36:19.6449713Z * [new branch] 
gh/henrylhtsang/148/head -> origin/gh/henrylhtsang/148/head 2025-09-07T09:36:19.6451224Z * [new branch] gh/henrylhtsang/148/orig -> origin/gh/henrylhtsang/148/orig 2025-09-07T09:36:19.6453499Z * [new branch] gh/henrylhtsang/149/base -> origin/gh/henrylhtsang/149/base 2025-09-07T09:36:19.6455216Z * [new branch] gh/henrylhtsang/149/head -> origin/gh/henrylhtsang/149/head 2025-09-07T09:36:19.6456811Z * [new branch] gh/henrylhtsang/149/orig -> origin/gh/henrylhtsang/149/orig 2025-09-07T09:36:19.6459506Z * [new branch] gh/huydhn/1/next -> origin/gh/huydhn/1/next 2025-09-07T09:36:19.6461644Z * [new branch] gh/huydhn/2/next -> origin/gh/huydhn/2/next 2025-09-07T09:36:19.6463921Z * [new branch] gh/huydhn/3/next -> origin/gh/huydhn/3/next 2025-09-07T09:36:19.6466334Z * [new branch] gh/huydhn/4/next -> origin/gh/huydhn/4/next 2025-09-07T09:36:19.6468500Z * [new branch] gh/huydhn/5/next -> origin/gh/huydhn/5/next 2025-09-07T09:36:19.6470773Z * [new branch] gh/huydhn/6/next -> origin/gh/huydhn/6/next 2025-09-07T09:36:19.6473499Z * [new branch] gh/int3/97/base -> origin/gh/int3/97/base 2025-09-07T09:36:19.6475360Z * [new branch] gh/int3/97/head -> origin/gh/int3/97/head 2025-09-07T09:36:19.6477962Z * [new branch] gh/isuruf/101/base -> origin/gh/isuruf/101/base 2025-09-07T09:36:19.6479540Z * [new branch] gh/isuruf/101/head -> origin/gh/isuruf/101/head 2025-09-07T09:36:19.6481853Z * [new branch] gh/isuruf/141/base -> origin/gh/isuruf/141/base 2025-09-07T09:36:19.6483505Z * [new branch] gh/isuruf/141/head -> origin/gh/isuruf/141/head 2025-09-07T09:36:19.6485190Z * [new branch] gh/isuruf/141/orig -> origin/gh/isuruf/141/orig 2025-09-07T09:36:19.6491819Z * [new branch] gh/isuruf/142/base -> origin/gh/isuruf/142/base 2025-09-07T09:36:19.6493404Z * [new branch] gh/isuruf/142/head -> origin/gh/isuruf/142/head 2025-09-07T09:36:19.6495081Z * [new branch] gh/isuruf/142/orig -> origin/gh/isuruf/142/orig 2025-09-07T09:36:19.6532074Z * [new branch] gh/isuruf/143/base -> 
origin/gh/isuruf/143/base 2025-09-07T09:36:19.6533540Z * [new branch] gh/isuruf/143/head -> origin/gh/isuruf/143/head 2025-09-07T09:36:19.6535245Z * [new branch] gh/isuruf/143/orig -> origin/gh/isuruf/143/orig 2025-09-07T09:36:19.6537623Z * [new branch] gh/isuruf/144/base -> origin/gh/isuruf/144/base 2025-09-07T09:36:19.6539096Z * [new branch] gh/isuruf/144/head -> origin/gh/isuruf/144/head 2025-09-07T09:36:19.6540723Z * [new branch] gh/isuruf/144/orig -> origin/gh/isuruf/144/orig 2025-09-07T09:36:19.6543127Z * [new branch] gh/isuruf/145/base -> origin/gh/isuruf/145/base 2025-09-07T09:36:19.6544675Z * [new branch] gh/isuruf/145/head -> origin/gh/isuruf/145/head 2025-09-07T09:36:19.6546505Z * [new branch] gh/isuruf/145/orig -> origin/gh/isuruf/145/orig 2025-09-07T09:36:19.6548809Z * [new branch] gh/isuruf/146/base -> origin/gh/isuruf/146/base 2025-09-07T09:36:19.6585468Z * [new branch] gh/isuruf/146/head -> origin/gh/isuruf/146/head 2025-09-07T09:36:19.6587067Z * [new branch] gh/isuruf/146/orig -> origin/gh/isuruf/146/orig 2025-09-07T09:36:19.6589367Z * [new branch] gh/isuruf/81/base -> origin/gh/isuruf/81/base 2025-09-07T09:36:19.6590789Z * [new branch] gh/isuruf/81/head -> origin/gh/isuruf/81/head 2025-09-07T09:36:19.6592353Z * [new branch] gh/isuruf/81/orig -> origin/gh/isuruf/81/orig 2025-09-07T09:36:19.6595643Z * [new branch] gh/jamesjwu/150/base -> origin/gh/jamesjwu/150/base 2025-09-07T09:36:19.6597209Z * [new branch] gh/jamesjwu/150/head -> origin/gh/jamesjwu/150/head 2025-09-07T09:36:19.6598781Z * [new branch] gh/jamesjwu/150/orig -> origin/gh/jamesjwu/150/orig 2025-09-07T09:36:19.6601151Z * [new branch] gh/jamesjwu/154/base -> origin/gh/jamesjwu/154/base 2025-09-07T09:36:19.6602607Z * [new branch] gh/jamesjwu/154/head -> origin/gh/jamesjwu/154/head 2025-09-07T09:36:19.6604149Z * [new branch] gh/jamesjwu/154/orig -> origin/gh/jamesjwu/154/orig 2025-09-07T09:36:19.6606740Z * [new branch] gh/jamesjwu/155/base -> origin/gh/jamesjwu/155/base 
2025-09-07T09:36:19.6608213Z * [new branch] gh/jamesjwu/155/head -> origin/gh/jamesjwu/155/head 2025-09-07T09:36:19.6609776Z * [new branch] gh/jamesjwu/155/orig -> origin/gh/jamesjwu/155/orig 2025-09-07T09:36:19.6612073Z * [new branch] gh/jamesjwu/159/base -> origin/gh/jamesjwu/159/base 2025-09-07T09:36:19.6613694Z * [new branch] gh/jamesjwu/159/head -> origin/gh/jamesjwu/159/head 2025-09-07T09:36:19.6615486Z * [new branch] gh/jamesjwu/159/orig -> origin/gh/jamesjwu/159/orig 2025-09-07T09:36:19.6617918Z * [new branch] gh/jamesjwu/163/base -> origin/gh/jamesjwu/163/base 2025-09-07T09:36:19.6619628Z * [new branch] gh/jamesjwu/163/head -> origin/gh/jamesjwu/163/head 2025-09-07T09:36:19.6621051Z * [new branch] gh/jamesjwu/163/orig -> origin/gh/jamesjwu/163/orig 2025-09-07T09:36:19.6623486Z * [new branch] gh/jamesjwu/171/base -> origin/gh/jamesjwu/171/base 2025-09-07T09:36:19.6625192Z * [new branch] gh/jamesjwu/171/head -> origin/gh/jamesjwu/171/head 2025-09-07T09:36:19.6626900Z * [new branch] gh/jamesjwu/171/orig -> origin/gh/jamesjwu/171/orig 2025-09-07T09:36:19.6628948Z * [new branch] gh/jamesjwu/176/base -> origin/gh/jamesjwu/176/base 2025-09-07T09:36:19.6630453Z * [new branch] gh/jamesjwu/176/head -> origin/gh/jamesjwu/176/head 2025-09-07T09:36:19.6632119Z * [new branch] gh/jamesjwu/176/orig -> origin/gh/jamesjwu/176/orig 2025-09-07T09:36:19.6634684Z * [new branch] gh/jamesjwu/181/base -> origin/gh/jamesjwu/181/base 2025-09-07T09:36:19.6636177Z * [new branch] gh/jamesjwu/181/head -> origin/gh/jamesjwu/181/head 2025-09-07T09:36:19.6637715Z * [new branch] gh/jamesjwu/181/orig -> origin/gh/jamesjwu/181/orig 2025-09-07T09:36:19.6640009Z * [new branch] gh/jamesjwu/182/base -> origin/gh/jamesjwu/182/base 2025-09-07T09:36:19.6641581Z * [new branch] gh/jamesjwu/182/head -> origin/gh/jamesjwu/182/head 2025-09-07T09:36:19.6643124Z * [new branch] gh/jamesjwu/182/orig -> origin/gh/jamesjwu/182/orig 2025-09-07T09:36:19.6645538Z * [new branch] gh/jamesjwu/183/base -> 
origin/gh/jamesjwu/183/base 2025-09-07T09:36:19.6647254Z * [new branch] gh/jamesjwu/183/head -> origin/gh/jamesjwu/183/head 2025-09-07T09:36:19.6649022Z * [new branch] gh/jamesjwu/183/orig -> origin/gh/jamesjwu/183/orig 2025-09-07T09:36:19.6651299Z * [new branch] gh/jamesjwu/184/base -> origin/gh/jamesjwu/184/base 2025-09-07T09:36:19.6652810Z * [new branch] gh/jamesjwu/184/head -> origin/gh/jamesjwu/184/head 2025-09-07T09:36:19.6654429Z * [new branch] gh/jamesjwu/184/orig -> origin/gh/jamesjwu/184/orig 2025-09-07T09:36:19.6657040Z * [new branch] gh/jamesjwu/185/base -> origin/gh/jamesjwu/185/base 2025-09-07T09:36:19.6658563Z * [new branch] gh/jamesjwu/185/head -> origin/gh/jamesjwu/185/head 2025-09-07T09:36:19.6660174Z * [new branch] gh/jamesjwu/185/orig -> origin/gh/jamesjwu/185/orig 2025-09-07T09:36:19.6662626Z * [new branch] gh/jamesjwu/186/base -> origin/gh/jamesjwu/186/base 2025-09-07T09:36:19.6664186Z * [new branch] gh/jamesjwu/186/head -> origin/gh/jamesjwu/186/head 2025-09-07T09:36:19.6666074Z * [new branch] gh/jamesjwu/186/orig -> origin/gh/jamesjwu/186/orig 2025-09-07T09:36:19.6668403Z * [new branch] gh/jamesjwu/187/base -> origin/gh/jamesjwu/187/base 2025-09-07T09:36:19.6669933Z * [new branch] gh/jamesjwu/187/head -> origin/gh/jamesjwu/187/head 2025-09-07T09:36:19.6671483Z * [new branch] gh/jamesjwu/187/orig -> origin/gh/jamesjwu/187/orig 2025-09-07T09:36:19.6673866Z * [new branch] gh/jamesjwu/188/base -> origin/gh/jamesjwu/188/base 2025-09-07T09:36:19.6675705Z * [new branch] gh/jamesjwu/188/head -> origin/gh/jamesjwu/188/head 2025-09-07T09:36:19.6677272Z * [new branch] gh/jamesjwu/188/orig -> origin/gh/jamesjwu/188/orig 2025-09-07T09:36:19.6679561Z * [new branch] gh/jamesjwu/189/base -> origin/gh/jamesjwu/189/base 2025-09-07T09:36:19.6681245Z * [new branch] gh/jamesjwu/189/head -> origin/gh/jamesjwu/189/head 2025-09-07T09:36:19.6682579Z * [new branch] gh/jamesjwu/189/orig -> origin/gh/jamesjwu/189/orig 2025-09-07T09:36:19.6684776Z * [new branch] 
gh/jamesjwu/190/base -> origin/gh/jamesjwu/190/base 2025-09-07T09:36:19.6686724Z * [new branch] gh/jamesjwu/190/head -> origin/gh/jamesjwu/190/head 2025-09-07T09:36:19.6688241Z * [new branch] gh/jamesjwu/190/orig -> origin/gh/jamesjwu/190/orig 2025-09-07T09:36:19.6690560Z * [new branch] gh/jamesjwu/52/base -> origin/gh/jamesjwu/52/base 2025-09-07T09:36:19.6692253Z * [new branch] gh/jamesjwu/52/head -> origin/gh/jamesjwu/52/head 2025-09-07T09:36:19.6694240Z * [new branch] gh/jamesjwu/53/base -> origin/gh/jamesjwu/53/base 2025-09-07T09:36:19.6696151Z * [new branch] gh/jamesjwu/53/head -> origin/gh/jamesjwu/53/head 2025-09-07T09:36:19.6698273Z * [new branch] gh/jamesjwu/54/base -> origin/gh/jamesjwu/54/base 2025-09-07T09:36:19.6699760Z * [new branch] gh/jamesjwu/54/head -> origin/gh/jamesjwu/54/head 2025-09-07T09:36:19.6701889Z * [new branch] gh/jamesjwu/55/base -> origin/gh/jamesjwu/55/base 2025-09-07T09:36:19.6703468Z * [new branch] gh/jamesjwu/55/head -> origin/gh/jamesjwu/55/head 2025-09-07T09:36:19.6705874Z * [new branch] gh/jamesjwu/56/base -> origin/gh/jamesjwu/56/base 2025-09-07T09:36:19.6707302Z * [new branch] gh/jamesjwu/56/head -> origin/gh/jamesjwu/56/head 2025-09-07T09:36:19.6709464Z * [new branch] gh/jamesjwu/57/base -> origin/gh/jamesjwu/57/base 2025-09-07T09:36:19.6710874Z * [new branch] gh/jamesjwu/57/head -> origin/gh/jamesjwu/57/head 2025-09-07T09:36:19.6712937Z * [new branch] gh/jamesjwu/58/base -> origin/gh/jamesjwu/58/base 2025-09-07T09:36:19.6714448Z * [new branch] gh/jamesjwu/58/head -> origin/gh/jamesjwu/58/head 2025-09-07T09:36:19.6716870Z * [new branch] gh/jamesjwu/59/base -> origin/gh/jamesjwu/59/base 2025-09-07T09:36:19.6718361Z * [new branch] gh/jamesjwu/59/head -> origin/gh/jamesjwu/59/head 2025-09-07T09:36:19.6720395Z * [new branch] gh/jamesjwu/60/base -> origin/gh/jamesjwu/60/base 2025-09-07T09:36:19.6722064Z * [new branch] gh/jamesjwu/60/head -> origin/gh/jamesjwu/60/head 2025-09-07T09:36:19.6724018Z * [new branch] gh/jamesjwu/61/base 
-> origin/gh/jamesjwu/61/base 2025-09-07T09:36:19.6725991Z * [new branch] gh/jamesjwu/61/head -> origin/gh/jamesjwu/61/head 2025-09-07T09:36:19.6728083Z * [new branch] gh/jamesjwu/62/base -> origin/gh/jamesjwu/62/base 2025-09-07T09:36:19.6729616Z * [new branch] gh/jamesjwu/62/head -> origin/gh/jamesjwu/62/head 2025-09-07T09:36:19.6731614Z * [new branch] gh/jamesjwu/63/base -> origin/gh/jamesjwu/63/base 2025-09-07T09:36:19.6733164Z * [new branch] gh/jamesjwu/63/head -> origin/gh/jamesjwu/63/head 2025-09-07T09:36:19.6735558Z * [new branch] gh/jamesjwu/64/base -> origin/gh/jamesjwu/64/base 2025-09-07T09:36:19.6737276Z * [new branch] gh/jamesjwu/64/head -> origin/gh/jamesjwu/64/head 2025-09-07T09:36:19.6739438Z * [new branch] gh/jamesjwu/65/base -> origin/gh/jamesjwu/65/base 2025-09-07T09:36:19.6740953Z * [new branch] gh/jamesjwu/65/head -> origin/gh/jamesjwu/65/head 2025-09-07T09:36:19.6744110Z * [new branch] gh/janeyx99/165/base -> origin/gh/janeyx99/165/base 2025-09-07T09:36:19.6745930Z * [new branch] gh/janeyx99/165/head -> origin/gh/janeyx99/165/head 2025-09-07T09:36:19.6747725Z * [new branch] gh/janeyx99/165/orig -> origin/gh/janeyx99/165/orig 2025-09-07T09:36:19.6749652Z * [new branch] gh/janeyx99/201/base -> origin/gh/janeyx99/201/base 2025-09-07T09:36:19.6751180Z * [new branch] gh/janeyx99/201/head -> origin/gh/janeyx99/201/head 2025-09-07T09:36:19.6752735Z * [new branch] gh/janeyx99/201/orig -> origin/gh/janeyx99/201/orig 2025-09-07T09:36:19.6755252Z * [new branch] gh/janeyx99/225/base -> origin/gh/janeyx99/225/base 2025-09-07T09:36:19.6756961Z * [new branch] gh/janeyx99/225/head -> origin/gh/janeyx99/225/head 2025-09-07T09:36:19.6758481Z * [new branch] gh/janeyx99/225/orig -> origin/gh/janeyx99/225/orig 2025-09-07T09:36:19.6760720Z * [new branch] gh/janeyx99/296/base -> origin/gh/janeyx99/296/base 2025-09-07T09:36:19.6762318Z * [new branch] gh/janeyx99/296/head -> origin/gh/janeyx99/296/head 2025-09-07T09:36:19.6763806Z * [new branch] gh/janeyx99/296/orig -> 
origin/gh/janeyx99/296/orig 2025-09-07T09:36:19.6766250Z * [new branch] gh/janeyx99/297/base -> origin/gh/janeyx99/297/base 2025-09-07T09:36:19.6767763Z * [new branch] gh/janeyx99/297/head -> origin/gh/janeyx99/297/head 2025-09-07T09:36:19.6769265Z * [new branch] gh/janeyx99/297/orig -> origin/gh/janeyx99/297/orig 2025-09-07T09:36:19.6771491Z * [new branch] gh/janeyx99/298/base -> origin/gh/janeyx99/298/base 2025-09-07T09:36:19.6773069Z * [new branch] gh/janeyx99/298/head -> origin/gh/janeyx99/298/head 2025-09-07T09:36:19.6774498Z * [new branch] gh/janeyx99/298/orig -> origin/gh/janeyx99/298/orig 2025-09-07T09:36:19.6777135Z * [new branch] gh/janeyx99/299/base -> origin/gh/janeyx99/299/base 2025-09-07T09:36:19.6778641Z * [new branch] gh/janeyx99/299/head -> origin/gh/janeyx99/299/head 2025-09-07T09:36:19.6780116Z * [new branch] gh/janeyx99/299/orig -> origin/gh/janeyx99/299/orig 2025-09-07T09:36:19.6782609Z * [new branch] gh/janeyx99/300/base -> origin/gh/janeyx99/300/base 2025-09-07T09:36:19.6784421Z * [new branch] gh/janeyx99/300/head -> origin/gh/janeyx99/300/head 2025-09-07T09:36:19.6786238Z * [new branch] gh/janeyx99/300/orig -> origin/gh/janeyx99/300/orig 2025-09-07T09:36:19.6788386Z * [new branch] gh/janeyx99/301/base -> origin/gh/janeyx99/301/base 2025-09-07T09:36:19.6789878Z * [new branch] gh/janeyx99/301/head -> origin/gh/janeyx99/301/head 2025-09-07T09:36:19.6791435Z * [new branch] gh/janeyx99/301/orig -> origin/gh/janeyx99/301/orig 2025-09-07T09:36:19.6793614Z * [new branch] gh/janeyx99/302/base -> origin/gh/janeyx99/302/base 2025-09-07T09:36:19.6795400Z * [new branch] gh/janeyx99/302/head -> origin/gh/janeyx99/302/head 2025-09-07T09:36:19.6797618Z * [new branch] gh/janeyx99/303/base -> origin/gh/janeyx99/303/base 2025-09-07T09:36:19.6799082Z * [new branch] gh/janeyx99/303/head -> origin/gh/janeyx99/303/head 2025-09-07T09:36:19.6801358Z * [new branch] gh/janeyx99/88/base -> origin/gh/janeyx99/88/base 2025-09-07T09:36:19.6802942Z * [new branch] 
gh/janeyx99/88/head -> origin/gh/janeyx99/88/head 2025-09-07T09:36:19.6804459Z * [new branch] gh/janeyx99/88/orig -> origin/gh/janeyx99/88/orig 2025-09-07T09:36:19.6807508Z * [new branch] gh/jansel/360/base -> origin/gh/jansel/360/base 2025-09-07T09:36:19.6809001Z * [new branch] gh/jansel/360/head -> origin/gh/jansel/360/head 2025-09-07T09:36:19.6811321Z * [new branch] gh/jansel/451/base -> origin/gh/jansel/451/base 2025-09-07T09:36:19.6813022Z * [new branch] gh/jansel/451/head -> origin/gh/jansel/451/head 2025-09-07T09:36:19.6814359Z * [new branch] gh/jansel/451/orig -> origin/gh/jansel/451/orig 2025-09-07T09:36:19.6817251Z * [new branch] gh/jansel/462/base -> origin/gh/jansel/462/base 2025-09-07T09:36:19.6818642Z * [new branch] gh/jansel/462/head -> origin/gh/jansel/462/head 2025-09-07T09:36:19.6820482Z * [new branch] gh/jansel/462/orig -> origin/gh/jansel/462/orig 2025-09-07T09:36:19.6822780Z * [new branch] gh/jansel/531/base -> origin/gh/jansel/531/base 2025-09-07T09:36:19.6824403Z * [new branch] gh/jansel/531/head -> origin/gh/jansel/531/head 2025-09-07T09:36:19.6826349Z * [new branch] gh/jansel/531/orig -> origin/gh/jansel/531/orig 2025-09-07T09:36:19.6829145Z * [new branch] gh/jbschlosser/208/head -> origin/gh/jbschlosser/208/head 2025-09-07T09:36:19.6831394Z * [new branch] gh/jbschlosser/247/base -> origin/gh/jbschlosser/247/base 2025-09-07T09:36:19.6832900Z * [new branch] gh/jbschlosser/247/head -> origin/gh/jbschlosser/247/head 2025-09-07T09:36:19.6834605Z * [new branch] gh/jbschlosser/247/orig -> origin/gh/jbschlosser/247/orig 2025-09-07T09:36:19.6837100Z * [new branch] gh/jbschlosser/248/base -> origin/gh/jbschlosser/248/base 2025-09-07T09:36:19.6838752Z * [new branch] gh/jbschlosser/248/head -> origin/gh/jbschlosser/248/head 2025-09-07T09:36:19.6840262Z * [new branch] gh/jbschlosser/248/orig -> origin/gh/jbschlosser/248/orig 2025-09-07T09:36:19.6842570Z * [new branch] gh/jbschlosser/250/base -> origin/gh/jbschlosser/250/base 
2025-09-07T09:36:19.6844120Z * [new branch] gh/jbschlosser/250/head -> origin/gh/jbschlosser/250/head 2025-09-07T09:36:19.6846056Z * [new branch] gh/jbschlosser/250/orig -> origin/gh/jbschlosser/250/orig 2025-09-07T09:36:19.6848855Z * [new branch] gh/jiayisunx/59/base -> origin/gh/jiayisunx/59/base 2025-09-07T09:36:19.6850392Z * [new branch] gh/jiayisunx/59/head -> origin/gh/jiayisunx/59/head 2025-09-07T09:36:19.6851914Z * [new branch] gh/jiayisunx/59/orig -> origin/gh/jiayisunx/59/orig 2025-09-07T09:36:19.6854032Z * [new branch] gh/jiayisunx/61/base -> origin/gh/jiayisunx/61/base 2025-09-07T09:36:19.6855845Z * [new branch] gh/jiayisunx/61/head -> origin/gh/jiayisunx/61/head 2025-09-07T09:36:19.6857414Z * [new branch] gh/jiayisunx/61/orig -> origin/gh/jiayisunx/61/orig 2025-09-07T09:36:19.6859616Z * [new branch] gh/jiayisunx/64/base -> origin/gh/jiayisunx/64/base 2025-09-07T09:36:19.6861102Z * [new branch] gh/jiayisunx/64/head -> origin/gh/jiayisunx/64/head 2025-09-07T09:36:19.6862819Z * [new branch] gh/jiayisunx/64/orig -> origin/gh/jiayisunx/64/orig 2025-09-07T09:36:19.6865896Z * [new branch] gh/jiayisunx/65/base -> origin/gh/jiayisunx/65/base 2025-09-07T09:36:19.6867443Z * [new branch] gh/jiayisunx/65/head -> origin/gh/jiayisunx/65/head 2025-09-07T09:36:19.6869026Z * [new branch] gh/jiayisunx/65/orig -> origin/gh/jiayisunx/65/orig 2025-09-07T09:36:19.6871159Z * [new branch] gh/jiayisunx/66/base -> origin/gh/jiayisunx/66/base 2025-09-07T09:36:19.6872715Z * [new branch] gh/jiayisunx/66/head -> origin/gh/jiayisunx/66/head 2025-09-07T09:36:19.6874261Z * [new branch] gh/jiayisunx/66/orig -> origin/gh/jiayisunx/66/orig 2025-09-07T09:36:19.6876795Z * [new branch] gh/jiayisunx/67/base -> origin/gh/jiayisunx/67/base 2025-09-07T09:36:19.6878299Z * [new branch] gh/jiayisunx/67/head -> origin/gh/jiayisunx/67/head 2025-09-07T09:36:19.6880016Z * [new branch] gh/jiayisunx/67/orig -> origin/gh/jiayisunx/67/orig 2025-09-07T09:36:19.6882015Z * [new branch] gh/jiayisunx/68/base -> 
origin/gh/jiayisunx/68/base 2025-09-07T09:36:19.6883531Z * [new branch] gh/jiayisunx/68/head -> origin/gh/jiayisunx/68/head 2025-09-07T09:36:19.6885292Z * [new branch] gh/jiayisunx/68/orig -> origin/gh/jiayisunx/68/orig 2025-09-07T09:36:19.6887648Z * [new branch] gh/jiayisunx/69/base -> origin/gh/jiayisunx/69/base 2025-09-07T09:36:19.6889178Z * [new branch] gh/jiayisunx/69/head -> origin/gh/jiayisunx/69/head 2025-09-07T09:36:19.6890644Z * [new branch] gh/jiayisunx/69/orig -> origin/gh/jiayisunx/69/orig 2025-09-07T09:36:19.6892859Z * [new branch] gh/jiayisunx/70/base -> origin/gh/jiayisunx/70/base 2025-09-07T09:36:19.6894389Z * [new branch] gh/jiayisunx/70/head -> origin/gh/jiayisunx/70/head 2025-09-07T09:36:19.6896219Z * [new branch] gh/jiayisunx/70/orig -> origin/gh/jiayisunx/70/orig 2025-09-07T09:36:19.6898372Z * [new branch] gh/jiayisunx/71/base -> origin/gh/jiayisunx/71/base 2025-09-07T09:36:19.6900004Z * [new branch] gh/jiayisunx/71/head -> origin/gh/jiayisunx/71/head 2025-09-07T09:36:19.6901567Z * [new branch] gh/jiayisunx/71/orig -> origin/gh/jiayisunx/71/orig 2025-09-07T09:36:19.6903998Z * [new branch] gh/jiayisunx/72/base -> origin/gh/jiayisunx/72/base 2025-09-07T09:36:19.6905740Z * [new branch] gh/jiayisunx/72/head -> origin/gh/jiayisunx/72/head 2025-09-07T09:36:19.6907258Z * [new branch] gh/jiayisunx/72/orig -> origin/gh/jiayisunx/72/orig 2025-09-07T09:36:19.6909553Z * [new branch] gh/jiayisunx/73/base -> origin/gh/jiayisunx/73/base 2025-09-07T09:36:19.6911115Z * [new branch] gh/jiayisunx/73/head -> origin/gh/jiayisunx/73/head 2025-09-07T09:36:19.6912679Z * [new branch] gh/jiayisunx/73/orig -> origin/gh/jiayisunx/73/orig 2025-09-07T09:36:19.6915192Z * [new branch] gh/jiayisunx/74/base -> origin/gh/jiayisunx/74/base 2025-09-07T09:36:19.6916736Z * [new branch] gh/jiayisunx/74/head -> origin/gh/jiayisunx/74/head 2025-09-07T09:36:19.6918207Z * [new branch] gh/jiayisunx/74/orig -> origin/gh/jiayisunx/74/orig 2025-09-07T09:36:19.6920394Z * [new branch] 
gh/jiayisunx/75/base -> origin/gh/jiayisunx/75/base 2025-09-07T09:36:19.6922107Z * [new branch] gh/jiayisunx/75/head -> origin/gh/jiayisunx/75/head 2025-09-07T09:36:19.6923556Z * [new branch] gh/jiayisunx/75/orig -> origin/gh/jiayisunx/75/orig 2025-09-07T09:36:19.6925987Z * [new branch] gh/jiayisunx/76/base -> origin/gh/jiayisunx/76/base 2025-09-07T09:36:19.6927443Z * [new branch] gh/jiayisunx/76/head -> origin/gh/jiayisunx/76/head 2025-09-07T09:36:19.6929020Z * [new branch] gh/jiayisunx/76/orig -> origin/gh/jiayisunx/76/orig 2025-09-07T09:36:19.6931744Z * [new branch] gh/jjwu@meta.com/1/base -> origin/gh/jjwu@meta.com/1/base 2025-09-07T09:36:19.6933272Z * [new branch] gh/jjwu@meta.com/1/head -> origin/gh/jjwu@meta.com/1/head 2025-09-07T09:36:19.6936260Z * [new branch] gh/justinchuby/111/base -> origin/gh/justinchuby/111/base 2025-09-07T09:36:19.6938159Z * [new branch] gh/justinchuby/111/head -> origin/gh/justinchuby/111/head 2025-09-07T09:36:19.6939675Z * [new branch] gh/justinchuby/111/orig -> origin/gh/justinchuby/111/orig 2025-09-07T09:36:19.6941984Z * [new branch] gh/justinchuby/112/base -> origin/gh/justinchuby/112/base 2025-09-07T09:36:19.6943492Z * [new branch] gh/justinchuby/112/head -> origin/gh/justinchuby/112/head 2025-09-07T09:36:19.6945368Z * [new branch] gh/justinchuby/112/orig -> origin/gh/justinchuby/112/orig 2025-09-07T09:36:19.6947536Z * [new branch] gh/justinchuby/113/base -> origin/gh/justinchuby/113/base 2025-09-07T09:36:19.6949109Z * [new branch] gh/justinchuby/113/head -> origin/gh/justinchuby/113/head 2025-09-07T09:36:19.6950659Z * [new branch] gh/justinchuby/113/orig -> origin/gh/justinchuby/113/orig 2025-09-07T09:36:19.6952831Z * [new branch] gh/justinchuby/114/base -> origin/gh/justinchuby/114/base 2025-09-07T09:36:19.6954459Z * [new branch] gh/justinchuby/114/head -> origin/gh/justinchuby/114/head 2025-09-07T09:36:19.6956214Z * [new branch] gh/justinchuby/114/orig -> origin/gh/justinchuby/114/orig 2025-09-07T09:36:19.6958472Z * [new 
branch] gh/justinchuby/115/base -> origin/gh/justinchuby/115/base 2025-09-07T09:36:19.6959980Z * [new branch] gh/justinchuby/115/head -> origin/gh/justinchuby/115/head 2025-09-07T09:36:19.6961442Z * [new branch] gh/justinchuby/115/orig -> origin/gh/justinchuby/115/orig 2025-09-07T09:36:19.6964189Z * [new branch] gh/karthickai/1/base -> origin/gh/karthickai/1/base 2025-09-07T09:36:19.6966030Z * [new branch] gh/karthickai/1/head -> origin/gh/karthickai/1/head 2025-09-07T09:36:19.6967572Z * [new branch] gh/karthickai/1/orig -> origin/gh/karthickai/1/orig 2025-09-07T09:36:19.6969693Z * [new branch] gh/karthickai/2/base -> origin/gh/karthickai/2/base 2025-09-07T09:36:19.6971206Z * [new branch] gh/karthickai/2/head -> origin/gh/karthickai/2/head 2025-09-07T09:36:19.6972736Z * [new branch] gh/karthickai/2/orig -> origin/gh/karthickai/2/orig 2025-09-07T09:36:19.6975779Z * [new branch] gh/kurtamohler/32/base -> origin/gh/kurtamohler/32/base 2025-09-07T09:36:19.6977379Z * [new branch] gh/kurtamohler/32/head -> origin/gh/kurtamohler/32/head 2025-09-07T09:36:19.6978828Z * [new branch] gh/kurtamohler/32/orig -> origin/gh/kurtamohler/32/orig 2025-09-07T09:36:19.6981086Z * [new branch] gh/kurtamohler/33/base -> origin/gh/kurtamohler/33/base 2025-09-07T09:36:19.6982708Z * [new branch] gh/kurtamohler/33/head -> origin/gh/kurtamohler/33/head 2025-09-07T09:36:19.6984290Z * [new branch] gh/kurtamohler/33/orig -> origin/gh/kurtamohler/33/orig 2025-09-07T09:36:19.6986875Z * [new branch] gh/kurtamohler/34/base -> origin/gh/kurtamohler/34/base 2025-09-07T09:36:19.6988337Z * [new branch] gh/kurtamohler/34/head -> origin/gh/kurtamohler/34/head 2025-09-07T09:36:19.6989955Z * [new branch] gh/kurtamohler/34/orig -> origin/gh/kurtamohler/34/orig 2025-09-07T09:36:19.6992132Z * [new branch] gh/kurtamohler/41/base -> origin/gh/kurtamohler/41/base 2025-09-07T09:36:19.6993759Z * [new branch] gh/kurtamohler/41/head -> origin/gh/kurtamohler/41/head 2025-09-07T09:36:19.6995370Z * [new branch] 
gh/kurtamohler/41/orig -> origin/gh/kurtamohler/41/orig 2025-09-07T09:36:19.6997677Z * [new branch] gh/kurtamohler/46/base -> origin/gh/kurtamohler/46/base 2025-09-07T09:36:19.6999198Z * [new branch] gh/kurtamohler/46/head -> origin/gh/kurtamohler/46/head 2025-09-07T09:36:19.7000722Z * [new branch] gh/kurtamohler/46/orig -> origin/gh/kurtamohler/46/orig 2025-09-07T09:36:19.7002817Z * [new branch] gh/kurtamohler/47/base -> origin/gh/kurtamohler/47/base 2025-09-07T09:36:19.7004430Z * [new branch] gh/kurtamohler/47/head -> origin/gh/kurtamohler/47/head 2025-09-07T09:36:19.7006329Z * [new branch] gh/kurtamohler/47/orig -> origin/gh/kurtamohler/47/orig 2025-09-07T09:36:19.7008510Z * [new branch] gh/kurtamohler/48/base -> origin/gh/kurtamohler/48/base 2025-09-07T09:36:19.7010287Z * [new branch] gh/kurtamohler/48/head -> origin/gh/kurtamohler/48/head 2025-09-07T09:36:19.7011696Z * [new branch] gh/kurtamohler/48/orig -> origin/gh/kurtamohler/48/orig 2025-09-07T09:36:19.7013891Z * [new branch] gh/kurtamohler/49/base -> origin/gh/kurtamohler/49/base 2025-09-07T09:36:19.7015479Z * [new branch] gh/kurtamohler/49/head -> origin/gh/kurtamohler/49/head 2025-09-07T09:36:19.7017247Z * [new branch] gh/kurtamohler/49/orig -> origin/gh/kurtamohler/49/orig 2025-09-07T09:36:19.7019436Z * [new branch] gh/kurtamohler/50/base -> origin/gh/kurtamohler/50/base 2025-09-07T09:36:19.7020960Z * [new branch] gh/kurtamohler/50/head -> origin/gh/kurtamohler/50/head 2025-09-07T09:36:19.7022596Z * [new branch] gh/kurtamohler/50/orig -> origin/gh/kurtamohler/50/orig 2025-09-07T09:36:19.7025896Z * [new branch] gh/kwen2501/130/base -> origin/gh/kwen2501/130/base 2025-09-07T09:36:19.7027620Z * [new branch] gh/kwen2501/130/head -> origin/gh/kwen2501/130/head 2025-09-07T09:36:19.7029203Z * [new branch] gh/kwen2501/130/orig -> origin/gh/kwen2501/130/orig 2025-09-07T09:36:19.7031463Z * [new branch] gh/kwen2501/15/base -> origin/gh/kwen2501/15/base 2025-09-07T09:36:19.7033024Z * [new branch] 
gh/kwen2501/15/head -> origin/gh/kwen2501/15/head 2025-09-07T09:36:19.7035410Z * [new branch] gh/kwen2501/156/base -> origin/gh/kwen2501/156/base 2025-09-07T09:36:19.7036949Z * [new branch] gh/kwen2501/156/head -> origin/gh/kwen2501/156/head 2025-09-07T09:36:19.7038533Z * [new branch] gh/kwen2501/156/orig -> origin/gh/kwen2501/156/orig 2025-09-07T09:36:19.7040731Z * [new branch] gh/kwen2501/170/base -> origin/gh/kwen2501/170/base 2025-09-07T09:36:19.7042261Z * [new branch] gh/kwen2501/170/head -> origin/gh/kwen2501/170/head 2025-09-07T09:36:19.7044521Z * [new branch] gh/kwen2501/186/base -> origin/gh/kwen2501/186/base 2025-09-07T09:36:19.7046502Z * [new branch] gh/kwen2501/186/head -> origin/gh/kwen2501/186/head 2025-09-07T09:36:19.7047990Z * [new branch] gh/kwen2501/186/orig -> origin/gh/kwen2501/186/orig 2025-09-07T09:36:19.7050008Z * [new branch] gh/kwen2501/187/base -> origin/gh/kwen2501/187/base 2025-09-07T09:36:19.7051616Z * [new branch] gh/kwen2501/187/head -> origin/gh/kwen2501/187/head 2025-09-07T09:36:19.7053099Z * [new branch] gh/kwen2501/187/orig -> origin/gh/kwen2501/187/orig 2025-09-07T09:36:19.7055736Z * [new branch] gh/kwen2501/188/base -> origin/gh/kwen2501/188/base 2025-09-07T09:36:19.7057206Z * [new branch] gh/kwen2501/188/head -> origin/gh/kwen2501/188/head 2025-09-07T09:36:19.7058657Z * [new branch] gh/kwen2501/188/orig -> origin/gh/kwen2501/188/orig 2025-09-07T09:36:19.7060837Z * [new branch] gh/kwen2501/194/base -> origin/gh/kwen2501/194/base 2025-09-07T09:36:19.7062622Z * [new branch] gh/kwen2501/194/head -> origin/gh/kwen2501/194/head 2025-09-07T09:36:19.7064026Z * [new branch] gh/kwen2501/194/orig -> origin/gh/kwen2501/194/orig 2025-09-07T09:36:19.7066638Z * [new branch] gh/kwen2501/199/base -> origin/gh/kwen2501/199/base 2025-09-07T09:36:19.7068176Z * [new branch] gh/kwen2501/199/head -> origin/gh/kwen2501/199/head 2025-09-07T09:36:19.7069658Z * [new branch] gh/kwen2501/199/orig -> origin/gh/kwen2501/199/orig 2025-09-07T09:36:19.7071832Z 
* [new branch] gh/kwen2501/200/base -> origin/gh/kwen2501/200/base 2025-09-07T09:36:19.7073631Z * [new branch] gh/kwen2501/200/head -> origin/gh/kwen2501/200/head 2025-09-07T09:36:19.7075156Z * [new branch] gh/kwen2501/200/orig -> origin/gh/kwen2501/200/orig 2025-09-07T09:36:19.7077549Z * [new branch] gh/kwen2501/201/base -> origin/gh/kwen2501/201/base 2025-09-07T09:36:19.7078963Z * [new branch] gh/kwen2501/201/head -> origin/gh/kwen2501/201/head 2025-09-07T09:36:19.7080511Z * [new branch] gh/kwen2501/201/orig -> origin/gh/kwen2501/201/orig 2025-09-07T09:36:19.7082834Z * [new branch] gh/kwen2501/203/base -> origin/gh/kwen2501/203/base 2025-09-07T09:36:19.7084311Z * [new branch] gh/kwen2501/203/head -> origin/gh/kwen2501/203/head 2025-09-07T09:36:19.7086130Z * [new branch] gh/kwen2501/203/orig -> origin/gh/kwen2501/203/orig 2025-09-07T09:36:19.7088328Z * [new branch] gh/kwen2501/204/base -> origin/gh/kwen2501/204/base 2025-09-07T09:36:19.7089956Z * [new branch] gh/kwen2501/204/head -> origin/gh/kwen2501/204/head 2025-09-07T09:36:19.7091391Z * [new branch] gh/kwen2501/204/orig -> origin/gh/kwen2501/204/orig 2025-09-07T09:36:19.7093551Z * [new branch] gh/kwen2501/205/base -> origin/gh/kwen2501/205/base 2025-09-07T09:36:19.7095266Z * [new branch] gh/kwen2501/205/head -> origin/gh/kwen2501/205/head 2025-09-07T09:36:19.7097191Z * [new branch] gh/kwen2501/205/orig -> origin/gh/kwen2501/205/orig 2025-09-07T09:36:19.7099077Z * [new branch] gh/kwen2501/206/base -> origin/gh/kwen2501/206/base 2025-09-07T09:36:19.7100565Z * [new branch] gh/kwen2501/206/head -> origin/gh/kwen2501/206/head 2025-09-07T09:36:19.7102343Z * [new branch] gh/kwen2501/206/orig -> origin/gh/kwen2501/206/orig 2025-09-07T09:36:19.7104648Z * [new branch] gh/kwen2501/207/base -> origin/gh/kwen2501/207/base 2025-09-07T09:36:19.7106450Z * [new branch] gh/kwen2501/207/head -> origin/gh/kwen2501/207/head 2025-09-07T09:36:19.7107952Z * [new branch] gh/kwen2501/207/orig -> origin/gh/kwen2501/207/orig 
2025-09-07T09:36:19.7110103Z * [new branch] gh/kwen2501/208/base -> origin/gh/kwen2501/208/base 2025-09-07T09:36:19.7111648Z * [new branch] gh/kwen2501/208/head -> origin/gh/kwen2501/208/head 2025-09-07T09:36:19.7113109Z * [new branch] gh/kwen2501/208/orig -> origin/gh/kwen2501/208/orig 2025-09-07T09:36:19.7115534Z * [new branch] gh/kwen2501/209/base -> origin/gh/kwen2501/209/base 2025-09-07T09:36:19.7117405Z * [new branch] gh/kwen2501/209/head -> origin/gh/kwen2501/209/head 2025-09-07T09:36:19.7118994Z * [new branch] gh/kwen2501/209/orig -> origin/gh/kwen2501/209/orig 2025-09-07T09:36:19.7121184Z * [new branch] gh/kwen2501/210/base -> origin/gh/kwen2501/210/base 2025-09-07T09:36:19.7122708Z * [new branch] gh/kwen2501/210/head -> origin/gh/kwen2501/210/head 2025-09-07T09:36:19.7124223Z * [new branch] gh/kwen2501/210/orig -> origin/gh/kwen2501/210/orig 2025-09-07T09:36:19.7126756Z * [new branch] gh/kwen2501/211/base -> origin/gh/kwen2501/211/base 2025-09-07T09:36:19.7128244Z * [new branch] gh/kwen2501/211/head -> origin/gh/kwen2501/211/head 2025-09-07T09:36:19.7130591Z * [new branch] gh/kwen2501/212/base -> origin/gh/kwen2501/212/base 2025-09-07T09:36:19.7132107Z * [new branch] gh/kwen2501/212/head -> origin/gh/kwen2501/212/head 2025-09-07T09:36:19.7133588Z * [new branch] gh/kwen2501/212/orig -> origin/gh/kwen2501/212/orig 2025-09-07T09:36:19.7136138Z * [new branch] gh/kwen2501/213/base -> origin/gh/kwen2501/213/base 2025-09-07T09:36:19.7137747Z * [new branch] gh/kwen2501/213/head -> origin/gh/kwen2501/213/head 2025-09-07T09:36:19.7139183Z * [new branch] gh/kwen2501/213/orig -> origin/gh/kwen2501/213/orig 2025-09-07T09:36:19.7141559Z * [new branch] gh/kwen2501/214/base -> origin/gh/kwen2501/214/base 2025-09-07T09:36:19.7143233Z * [new branch] gh/kwen2501/214/head -> origin/gh/kwen2501/214/head 2025-09-07T09:36:19.7144686Z * [new branch] gh/kwen2501/214/orig -> origin/gh/kwen2501/214/orig 2025-09-07T09:36:19.7147244Z * [new branch] gh/kwen2501/215/base -> 
origin/gh/kwen2501/215/base 2025-09-07T09:36:19.7148720Z * [new branch] gh/kwen2501/215/head -> origin/gh/kwen2501/215/head 2025-09-07T09:36:19.7150326Z * [new branch] gh/kwen2501/215/orig -> origin/gh/kwen2501/215/orig 2025-09-07T09:36:19.7152392Z * [new branch] gh/kwen2501/216/base -> origin/gh/kwen2501/216/base 2025-09-07T09:36:19.7154025Z * [new branch] gh/kwen2501/216/head -> origin/gh/kwen2501/216/head 2025-09-07T09:36:19.7155794Z * [new branch] gh/kwen2501/216/orig -> origin/gh/kwen2501/216/orig 2025-09-07T09:36:19.7157984Z * [new branch] gh/kwen2501/217/base -> origin/gh/kwen2501/217/base 2025-09-07T09:36:19.7159579Z * [new branch] gh/kwen2501/217/head -> origin/gh/kwen2501/217/head 2025-09-07T09:36:19.7161171Z * [new branch] gh/kwen2501/217/orig -> origin/gh/kwen2501/217/orig 2025-09-07T09:36:19.7163339Z * [new branch] gh/kwen2501/218/base -> origin/gh/kwen2501/218/base 2025-09-07T09:36:19.7164853Z * [new branch] gh/kwen2501/218/head -> origin/gh/kwen2501/218/head 2025-09-07T09:36:19.7166670Z * [new branch] gh/kwen2501/218/orig -> origin/gh/kwen2501/218/orig 2025-09-07T09:36:19.7168853Z * [new branch] gh/kwen2501/219/base -> origin/gh/kwen2501/219/base 2025-09-07T09:36:19.7170408Z * [new branch] gh/kwen2501/219/head -> origin/gh/kwen2501/219/head 2025-09-07T09:36:19.7172014Z * [new branch] gh/kwen2501/219/orig -> origin/gh/kwen2501/219/orig 2025-09-07T09:36:19.7174194Z * [new branch] gh/kwen2501/220/base -> origin/gh/kwen2501/220/base 2025-09-07T09:36:19.7176151Z * [new branch] gh/kwen2501/220/head -> origin/gh/kwen2501/220/head 2025-09-07T09:36:19.7177621Z * [new branch] gh/kwen2501/220/orig -> origin/gh/kwen2501/220/orig 2025-09-07T09:36:19.7179915Z * [new branch] gh/kwen2501/221/base -> origin/gh/kwen2501/221/base 2025-09-07T09:36:19.7181414Z * [new branch] gh/kwen2501/221/head -> origin/gh/kwen2501/221/head 2025-09-07T09:36:19.7183154Z * [new branch] gh/kwen2501/221/orig -> origin/gh/kwen2501/221/orig 2025-09-07T09:36:19.7185380Z * [new branch] 
gh/kwen2501/222/base -> origin/gh/kwen2501/222/base 2025-09-07T09:36:19.7187035Z * [new branch] gh/kwen2501/222/head -> origin/gh/kwen2501/222/head 2025-09-07T09:36:19.7188484Z * [new branch] gh/kwen2501/222/orig -> origin/gh/kwen2501/222/orig 2025-09-07T09:36:19.7190787Z * [new branch] gh/kwen2501/223/base -> origin/gh/kwen2501/223/base 2025-09-07T09:36:19.7192305Z * [new branch] gh/kwen2501/223/head -> origin/gh/kwen2501/223/head 2025-09-07T09:36:19.7193859Z * [new branch] gh/kwen2501/223/orig -> origin/gh/kwen2501/223/orig 2025-09-07T09:36:19.7196558Z * [new branch] gh/kwen2501/224/base -> origin/gh/kwen2501/224/base 2025-09-07T09:36:19.7198215Z * [new branch] gh/kwen2501/224/head -> origin/gh/kwen2501/224/head 2025-09-07T09:36:19.7199685Z * [new branch] gh/kwen2501/224/orig -> origin/gh/kwen2501/224/orig 2025-09-07T09:36:19.7202254Z * [new branch] gh/kwen2501/225/base -> origin/gh/kwen2501/225/base 2025-09-07T09:36:19.7203661Z * [new branch] gh/kwen2501/225/head -> origin/gh/kwen2501/225/head 2025-09-07T09:36:19.7205255Z * [new branch] gh/kwen2501/225/orig -> origin/gh/kwen2501/225/orig 2025-09-07T09:36:19.7207597Z * [new branch] gh/kwen2501/226/base -> origin/gh/kwen2501/226/base 2025-09-07T09:36:19.7209230Z * [new branch] gh/kwen2501/226/head -> origin/gh/kwen2501/226/head 2025-09-07T09:36:19.7210784Z * [new branch] gh/kwen2501/226/orig -> origin/gh/kwen2501/226/orig 2025-09-07T09:36:19.7213152Z * [new branch] gh/kwen2501/227/base -> origin/gh/kwen2501/227/base 2025-09-07T09:36:19.7214613Z * [new branch] gh/kwen2501/227/head -> origin/gh/kwen2501/227/head 2025-09-07T09:36:19.7216433Z * [new branch] gh/kwen2501/227/orig -> origin/gh/kwen2501/227/orig 2025-09-07T09:36:19.7218628Z * [new branch] gh/kwen2501/228/base -> origin/gh/kwen2501/228/base 2025-09-07T09:36:19.7220204Z * [new branch] gh/kwen2501/228/head -> origin/gh/kwen2501/228/head 2025-09-07T09:36:19.7221780Z * [new branch] gh/kwen2501/228/orig -> origin/gh/kwen2501/228/orig 
2025-09-07T09:36:19.7224082Z * [new branch] gh/kwen2501/229/base -> origin/gh/kwen2501/229/base 2025-09-07T09:36:19.7225859Z * [new branch] gh/kwen2501/229/head -> origin/gh/kwen2501/229/head 2025-09-07T09:36:19.7227420Z * [new branch] gh/kwen2501/229/orig -> origin/gh/kwen2501/229/orig 2025-09-07T09:36:19.7229600Z * [new branch] gh/kwen2501/230/base -> origin/gh/kwen2501/230/base 2025-09-07T09:36:19.7231283Z * [new branch] gh/kwen2501/230/head -> origin/gh/kwen2501/230/head 2025-09-07T09:36:19.7232810Z * [new branch] gh/kwen2501/230/orig -> origin/gh/kwen2501/230/orig 2025-09-07T09:36:19.7235295Z * [new branch] gh/kwen2501/231/base -> origin/gh/kwen2501/231/base 2025-09-07T09:36:19.7236872Z * [new branch] gh/kwen2501/231/head -> origin/gh/kwen2501/231/head 2025-09-07T09:36:19.7238350Z * [new branch] gh/kwen2501/231/orig -> origin/gh/kwen2501/231/orig 2025-09-07T09:36:19.7240622Z * [new branch] gh/kwen2501/232/base -> origin/gh/kwen2501/232/base 2025-09-07T09:36:19.7242060Z * [new branch] gh/kwen2501/232/head -> origin/gh/kwen2501/232/head 2025-09-07T09:36:19.7243636Z * [new branch] gh/kwen2501/232/orig -> origin/gh/kwen2501/232/orig 2025-09-07T09:36:19.7246931Z * [new branch] gh/laithsakka/156/base -> origin/gh/laithsakka/156/base 2025-09-07T09:36:19.7248398Z * [new branch] gh/laithsakka/156/head -> origin/gh/laithsakka/156/head 2025-09-07T09:36:19.7249987Z * [new branch] gh/laithsakka/156/orig -> origin/gh/laithsakka/156/orig 2025-09-07T09:36:19.7252320Z * [new branch] gh/laithsakka/160/base -> origin/gh/laithsakka/160/base 2025-09-07T09:36:19.7253811Z * [new branch] gh/laithsakka/160/head -> origin/gh/laithsakka/160/head 2025-09-07T09:36:19.7255532Z * [new branch] gh/laithsakka/160/orig -> origin/gh/laithsakka/160/orig 2025-09-07T09:36:19.7257844Z * [new branch] gh/laithsakka/178/base -> origin/gh/laithsakka/178/base 2025-09-07T09:36:19.7259389Z * [new branch] gh/laithsakka/178/head -> origin/gh/laithsakka/178/head 2025-09-07T09:36:19.7260965Z * [new branch] 
gh/laithsakka/178/orig -> origin/gh/laithsakka/178/orig 2025-09-07T09:36:19.7263364Z * [new branch] gh/laithsakka/191/base -> origin/gh/laithsakka/191/base 2025-09-07T09:36:19.7264867Z * [new branch] gh/laithsakka/191/head -> origin/gh/laithsakka/191/head 2025-09-07T09:36:19.7267285Z * [new branch] gh/laithsakka/191/orig -> origin/gh/laithsakka/191/orig 2025-09-07T09:36:19.7269388Z * [new branch] gh/laithsakka/237/base -> origin/gh/laithsakka/237/base 2025-09-07T09:36:19.7270930Z * [new branch] gh/laithsakka/237/head -> origin/gh/laithsakka/237/head 2025-09-07T09:36:19.7272428Z * [new branch] gh/laithsakka/237/orig -> origin/gh/laithsakka/237/orig 2025-09-07T09:36:19.7274660Z * [new branch] gh/laithsakka/249/base -> origin/gh/laithsakka/249/base 2025-09-07T09:36:19.7276512Z * [new branch] gh/laithsakka/249/head -> origin/gh/laithsakka/249/head 2025-09-07T09:36:19.7277933Z * [new branch] gh/laithsakka/249/orig -> origin/gh/laithsakka/249/orig 2025-09-07T09:36:19.7280180Z * [new branch] gh/laithsakka/251/base -> origin/gh/laithsakka/251/base 2025-09-07T09:36:19.7281809Z * [new branch] gh/laithsakka/251/head -> origin/gh/laithsakka/251/head 2025-09-07T09:36:19.7283276Z * [new branch] gh/laithsakka/251/orig -> origin/gh/laithsakka/251/orig 2025-09-07T09:36:19.7285924Z * [new branch] gh/laithsakka/254/base -> origin/gh/laithsakka/254/base 2025-09-07T09:36:19.7287421Z * [new branch] gh/laithsakka/254/head -> origin/gh/laithsakka/254/head 2025-09-07T09:36:19.7288908Z * [new branch] gh/laithsakka/254/orig -> origin/gh/laithsakka/254/orig 2025-09-07T09:36:19.7291254Z * [new branch] gh/laithsakka/255/base -> origin/gh/laithsakka/255/base 2025-09-07T09:36:19.7292724Z * [new branch] gh/laithsakka/255/head -> origin/gh/laithsakka/255/head 2025-09-07T09:36:19.7294163Z * [new branch] gh/laithsakka/255/orig -> origin/gh/laithsakka/255/orig 2025-09-07T09:36:19.7296647Z * [new branch] gh/laithsakka/256/base -> origin/gh/laithsakka/256/base 2025-09-07T09:36:19.7298284Z * [new branch] 
gh/laithsakka/256/head -> origin/gh/laithsakka/256/head 2025-09-07T09:36:19.7299725Z * [new branch] gh/laithsakka/256/orig -> origin/gh/laithsakka/256/orig 2025-09-07T09:36:19.7302183Z * [new branch] gh/laithsakka/257/base -> origin/gh/laithsakka/257/base 2025-09-07T09:36:19.7303796Z * [new branch] gh/laithsakka/257/head -> origin/gh/laithsakka/257/head 2025-09-07T09:36:19.7305637Z * [new branch] gh/laithsakka/257/orig -> origin/gh/laithsakka/257/orig 2025-09-07T09:36:19.7307946Z * [new branch] gh/laithsakka/258/base -> origin/gh/laithsakka/258/base 2025-09-07T09:36:19.7309423Z * [new branch] gh/laithsakka/258/head -> origin/gh/laithsakka/258/head 2025-09-07T09:36:19.7311027Z * [new branch] gh/laithsakka/258/orig -> origin/gh/laithsakka/258/orig 2025-09-07T09:36:19.7313301Z * [new branch] gh/laithsakka/259/base -> origin/gh/laithsakka/259/base 2025-09-07T09:36:19.7314844Z * [new branch] gh/laithsakka/259/head -> origin/gh/laithsakka/259/head 2025-09-07T09:36:19.7316682Z * [new branch] gh/laithsakka/259/orig -> origin/gh/laithsakka/259/orig 2025-09-07T09:36:19.7318781Z * [new branch] gh/laithsakka/260/base -> origin/gh/laithsakka/260/base 2025-09-07T09:36:19.7320321Z * [new branch] gh/laithsakka/260/head -> origin/gh/laithsakka/260/head 2025-09-07T09:36:19.7321831Z * [new branch] gh/laithsakka/260/orig -> origin/gh/laithsakka/260/orig 2025-09-07T09:36:19.7324034Z * [new branch] gh/laithsakka/261/base -> origin/gh/laithsakka/261/base 2025-09-07T09:36:19.7325854Z * [new branch] gh/laithsakka/261/head -> origin/gh/laithsakka/261/head 2025-09-07T09:36:19.7327423Z * [new branch] gh/laithsakka/261/orig -> origin/gh/laithsakka/261/orig 2025-09-07T09:36:19.7329937Z * [new branch] gh/laithsakka/262/base -> origin/gh/laithsakka/262/base 2025-09-07T09:36:19.7332129Z * [new branch] gh/laithsakka/262/head -> origin/gh/laithsakka/262/head 2025-09-07T09:36:19.7333512Z * [new branch] gh/laithsakka/262/orig -> origin/gh/laithsakka/262/orig 2025-09-07T09:36:19.7335943Z * [new branch] 
gh/laithsakka/263/base -> origin/gh/laithsakka/263/base 2025-09-07T09:36:19.7337440Z * [new branch] gh/laithsakka/263/head -> origin/gh/laithsakka/263/head 2025-09-07T09:36:19.7338941Z * [new branch] gh/laithsakka/263/orig -> origin/gh/laithsakka/263/orig 2025-09-07T09:36:19.7341248Z * [new branch] gh/laithsakka/264/base -> origin/gh/laithsakka/264/base 2025-09-07T09:36:19.7342973Z * [new branch] gh/laithsakka/264/head -> origin/gh/laithsakka/264/head 2025-09-07T09:36:19.7344424Z * [new branch] gh/laithsakka/264/orig -> origin/gh/laithsakka/264/orig 2025-09-07T09:36:19.7347070Z * [new branch] gh/laithsakka/265/base -> origin/gh/laithsakka/265/base 2025-09-07T09:36:19.7348546Z * [new branch] gh/laithsakka/265/head -> origin/gh/laithsakka/265/head 2025-09-07T09:36:19.7350052Z * [new branch] gh/laithsakka/265/orig -> origin/gh/laithsakka/265/orig 2025-09-07T09:36:19.7352206Z * [new branch] gh/laithsakka/266/base -> origin/gh/laithsakka/266/base 2025-09-07T09:36:19.7353888Z * [new branch] gh/laithsakka/266/head -> origin/gh/laithsakka/266/head 2025-09-07T09:36:19.7355578Z * [new branch] gh/laithsakka/266/orig -> origin/gh/laithsakka/266/orig 2025-09-07T09:36:19.7357815Z * [new branch] gh/laithsakka/267/base -> origin/gh/laithsakka/267/base 2025-09-07T09:36:19.7359299Z * [new branch] gh/laithsakka/267/head -> origin/gh/laithsakka/267/head 2025-09-07T09:36:19.7360837Z * [new branch] gh/laithsakka/267/orig -> origin/gh/laithsakka/267/orig 2025-09-07T09:36:19.7363135Z * [new branch] gh/laithsakka/268/base -> origin/gh/laithsakka/268/base 2025-09-07T09:36:19.7364631Z * [new branch] gh/laithsakka/268/head -> origin/gh/laithsakka/268/head 2025-09-07T09:36:19.7366525Z * [new branch] gh/laithsakka/268/orig -> origin/gh/laithsakka/268/orig 2025-09-07T09:36:19.7368809Z * [new branch] gh/laithsakka/28/base -> origin/gh/laithsakka/28/base 2025-09-07T09:36:19.7370902Z * [new branch] gh/laithsakka/29/base -> origin/gh/laithsakka/29/base 2025-09-07T09:36:19.7372982Z * [new branch] 
gh/laithsakka/30/base -> origin/gh/laithsakka/30/base 2025-09-07T09:36:19.7374495Z * [new branch] gh/laithsakka/30/head -> origin/gh/laithsakka/30/head 2025-09-07T09:36:19.7376866Z * [new branch] gh/laithsakka/31/base -> origin/gh/laithsakka/31/base 2025-09-07T09:36:19.7378320Z * [new branch] gh/laithsakka/31/head -> origin/gh/laithsakka/31/head 2025-09-07T09:36:19.7380482Z * [new branch] gh/laithsakka/32/base -> origin/gh/laithsakka/32/base 2025-09-07T09:36:19.7382119Z * [new branch] gh/laithsakka/32/head -> origin/gh/laithsakka/32/head 2025-09-07T09:36:19.7386498Z * [new branch] gh/lucaskabela/1/base -> origin/gh/lucaskabela/1/base 2025-09-07T09:36:19.7388079Z * [new branch] gh/lucaskabela/1/head -> origin/gh/lucaskabela/1/head 2025-09-07T09:36:19.7390354Z * [new branch] gh/lucaskabela/10/base -> origin/gh/lucaskabela/10/base 2025-09-07T09:36:19.7391862Z * [new branch] gh/lucaskabela/10/head -> origin/gh/lucaskabela/10/head 2025-09-07T09:36:19.7393462Z * [new branch] gh/lucaskabela/10/orig -> origin/gh/lucaskabela/10/orig 2025-09-07T09:36:19.7395672Z * [new branch] gh/lucaskabela/11/base -> origin/gh/lucaskabela/11/base 2025-09-07T09:36:19.7397740Z * [new branch] gh/lucaskabela/11/head -> origin/gh/lucaskabela/11/head 2025-09-07T09:36:19.7399073Z * [new branch] gh/lucaskabela/11/orig -> origin/gh/lucaskabela/11/orig 2025-09-07T09:36:19.7401108Z * [new branch] gh/lucaskabela/12/base -> origin/gh/lucaskabela/12/base 2025-09-07T09:36:19.7402655Z * [new branch] gh/lucaskabela/12/head -> origin/gh/lucaskabela/12/head 2025-09-07T09:36:19.7404140Z * [new branch] gh/lucaskabela/12/orig -> origin/gh/lucaskabela/12/orig 2025-09-07T09:36:19.7406512Z * [new branch] gh/lucaskabela/13/base -> origin/gh/lucaskabela/13/base 2025-09-07T09:36:19.7407998Z * [new branch] gh/lucaskabela/13/head -> origin/gh/lucaskabela/13/head 2025-09-07T09:36:19.7409539Z * [new branch] gh/lucaskabela/13/orig -> origin/gh/lucaskabela/13/orig 2025-09-07T09:36:19.7411547Z * [new branch] 
gh/lucaskabela/14/base -> origin/gh/lucaskabela/14/base 2025-09-07T09:36:19.7413085Z * [new branch] gh/lucaskabela/14/head -> origin/gh/lucaskabela/14/head 2025-09-07T09:36:19.7414757Z * [new branch] gh/lucaskabela/14/orig -> origin/gh/lucaskabela/14/orig 2025-09-07T09:36:19.7417191Z * [new branch] gh/lucaskabela/15/base -> origin/gh/lucaskabela/15/base 2025-09-07T09:36:19.7418648Z * [new branch] gh/lucaskabela/15/head -> origin/gh/lucaskabela/15/head 2025-09-07T09:36:19.7420207Z * [new branch] gh/lucaskabela/15/orig -> origin/gh/lucaskabela/15/orig 2025-09-07T09:36:19.7422380Z * [new branch] gh/lucaskabela/16/base -> origin/gh/lucaskabela/16/base 2025-09-07T09:36:19.7424008Z * [new branch] gh/lucaskabela/16/head -> origin/gh/lucaskabela/16/head 2025-09-07T09:36:19.7425762Z * [new branch] gh/lucaskabela/16/orig -> origin/gh/lucaskabela/16/orig 2025-09-07T09:36:19.7427821Z * [new branch] gh/lucaskabela/17/base -> origin/gh/lucaskabela/17/base 2025-09-07T09:36:19.7429440Z * [new branch] gh/lucaskabela/17/head -> origin/gh/lucaskabela/17/head 2025-09-07T09:36:19.7430905Z * [new branch] gh/lucaskabela/17/orig -> origin/gh/lucaskabela/17/orig 2025-09-07T09:36:19.7433196Z * [new branch] gh/lucaskabela/2/base -> origin/gh/lucaskabela/2/base 2025-09-07T09:36:19.7434792Z * [new branch] gh/lucaskabela/2/head -> origin/gh/lucaskabela/2/head 2025-09-07T09:36:19.7436657Z * [new branch] gh/lucaskabela/2/orig -> origin/gh/lucaskabela/2/orig 2025-09-07T09:36:19.7438861Z * [new branch] gh/lucaskabela/3/base -> origin/gh/lucaskabela/3/base 2025-09-07T09:36:19.7440476Z * [new branch] gh/lucaskabela/3/head -> origin/gh/lucaskabela/3/head 2025-09-07T09:36:19.7442040Z * [new branch] gh/lucaskabela/3/orig -> origin/gh/lucaskabela/3/orig 2025-09-07T09:36:19.7444098Z * [new branch] gh/lucaskabela/4/base -> origin/gh/lucaskabela/4/base 2025-09-07T09:36:19.7445969Z * [new branch] gh/lucaskabela/4/head -> origin/gh/lucaskabela/4/head 2025-09-07T09:36:19.7447460Z * [new branch] 
gh/lucaskabela/4/orig -> origin/gh/lucaskabela/4/orig 2025-09-07T09:36:19.7449669Z * [new branch] gh/lucaskabela/5/base -> origin/gh/lucaskabela/5/base 2025-09-07T09:36:19.7451276Z * [new branch] gh/lucaskabela/5/head -> origin/gh/lucaskabela/5/head 2025-09-07T09:36:19.7452951Z * [new branch] gh/lucaskabela/5/orig -> origin/gh/lucaskabela/5/orig 2025-09-07T09:36:19.7455212Z * [new branch] gh/lucaskabela/6/base -> origin/gh/lucaskabela/6/base 2025-09-07T09:36:19.7456752Z * [new branch] gh/lucaskabela/6/head -> origin/gh/lucaskabela/6/head 2025-09-07T09:36:19.7458267Z * [new branch] gh/lucaskabela/6/orig -> origin/gh/lucaskabela/6/orig 2025-09-07T09:36:19.7460715Z * [new branch] gh/lucaskabela/7/base -> origin/gh/lucaskabela/7/base 2025-09-07T09:36:19.7462161Z * [new branch] gh/lucaskabela/7/head -> origin/gh/lucaskabela/7/head 2025-09-07T09:36:19.7463696Z * [new branch] gh/lucaskabela/7/orig -> origin/gh/lucaskabela/7/orig 2025-09-07T09:36:19.7466304Z * [new branch] gh/lucaskabela/8/base -> origin/gh/lucaskabela/8/base 2025-09-07T09:36:19.7467792Z * [new branch] gh/lucaskabela/8/head -> origin/gh/lucaskabela/8/head 2025-09-07T09:36:19.7469356Z * [new branch] gh/lucaskabela/8/orig -> origin/gh/lucaskabela/8/orig 2025-09-07T09:36:19.7471819Z * [new branch] gh/lucaskabela/9/base -> origin/gh/lucaskabela/9/base 2025-09-07T09:36:19.7473141Z * [new branch] gh/lucaskabela/9/head -> origin/gh/lucaskabela/9/head 2025-09-07T09:36:19.7474733Z * [new branch] gh/lucaskabela/9/orig -> origin/gh/lucaskabela/9/orig 2025-09-07T09:36:19.7477660Z * [new branch] gh/lw/3/base -> origin/gh/lw/3/base 2025-09-07T09:36:19.7479012Z * [new branch] gh/lw/3/head -> origin/gh/lw/3/head 2025-09-07T09:36:19.7480541Z * [new branch] gh/lw/3/orig -> origin/gh/lw/3/orig 2025-09-07T09:36:19.7483271Z * [new branch] gh/malfet/14/base -> origin/gh/malfet/14/base 2025-09-07T09:36:19.7485826Z * [new branch] gh/malfet/330/base -> origin/gh/malfet/330/base 2025-09-07T09:36:19.7487364Z * [new branch] 
gh/malfet/330/head -> origin/gh/malfet/330/head 2025-09-07T09:36:19.7489049Z * [new branch] gh/malfet/330/orig -> origin/gh/malfet/330/orig 2025-09-07T09:36:19.7491204Z * [new branch] gh/malfet/396/base -> origin/gh/malfet/396/base 2025-09-07T09:36:19.7492726Z * [new branch] gh/malfet/396/head -> origin/gh/malfet/396/head 2025-09-07T09:36:19.7494325Z * [new branch] gh/malfet/396/orig -> origin/gh/malfet/396/orig 2025-09-07T09:36:19.7496764Z * [new branch] gh/malfet/397/base -> origin/gh/malfet/397/base 2025-09-07T09:36:19.7500901Z * [new branch] gh/malfet/397/head -> origin/gh/malfet/397/head 2025-09-07T09:36:19.7502596Z * [new branch] gh/malfet/397/orig -> origin/gh/malfet/397/orig 2025-09-07T09:36:19.7504706Z * [new branch] gh/malfet/398/base -> origin/gh/malfet/398/base 2025-09-07T09:36:19.7506447Z * [new branch] gh/malfet/398/head -> origin/gh/malfet/398/head 2025-09-07T09:36:19.7510214Z * [new branch] gh/malfet/398/orig -> origin/gh/malfet/398/orig 2025-09-07T09:36:19.7511088Z * [new branch] gh/malfet/399/base -> origin/gh/malfet/399/base 2025-09-07T09:36:19.7512038Z * [new branch] gh/malfet/399/head -> origin/gh/malfet/399/head 2025-09-07T09:36:19.7513434Z * [new branch] gh/malfet/399/orig -> origin/gh/malfet/399/orig 2025-09-07T09:36:19.7515919Z * [new branch] gh/malfet/414/base -> origin/gh/malfet/414/base 2025-09-07T09:36:19.7517486Z * [new branch] gh/malfet/414/head -> origin/gh/malfet/414/head 2025-09-07T09:36:19.7519040Z * [new branch] gh/malfet/414/orig -> origin/gh/malfet/414/orig 2025-09-07T09:36:19.7521186Z * [new branch] gh/malfet/417/base -> origin/gh/malfet/417/base 2025-09-07T09:36:19.7522709Z * [new branch] gh/malfet/417/head -> origin/gh/malfet/417/head 2025-09-07T09:36:19.7524268Z * [new branch] gh/malfet/417/orig -> origin/gh/malfet/417/orig 2025-09-07T09:36:19.7526683Z * [new branch] gh/malfet/418/base -> origin/gh/malfet/418/base 2025-09-07T09:36:19.7528369Z * [new branch] gh/malfet/418/head -> origin/gh/malfet/418/head 
2025-09-07T09:36:19.7529718Z * [new branch] gh/malfet/418/orig -> origin/gh/malfet/418/orig 2025-09-07T09:36:19.7531858Z * [new branch] gh/malfet/475/base -> origin/gh/malfet/475/base 2025-09-07T09:36:19.7533521Z * [new branch] gh/malfet/475/head -> origin/gh/malfet/475/head 2025-09-07T09:36:19.7535078Z * [new branch] gh/malfet/475/orig -> origin/gh/malfet/475/orig 2025-09-07T09:36:19.7537540Z * [new branch] gh/malfet/476/base -> origin/gh/malfet/476/base 2025-09-07T09:36:19.7539056Z * [new branch] gh/malfet/476/head -> origin/gh/malfet/476/head 2025-09-07T09:36:19.7540552Z * [new branch] gh/malfet/476/orig -> origin/gh/malfet/476/orig 2025-09-07T09:36:19.7542758Z * [new branch] gh/malfet/477/base -> origin/gh/malfet/477/base 2025-09-07T09:36:19.7544369Z * [new branch] gh/malfet/477/head -> origin/gh/malfet/477/head 2025-09-07T09:36:19.7546245Z * [new branch] gh/malfet/477/orig -> origin/gh/malfet/477/orig 2025-09-07T09:36:19.7548315Z * [new branch] gh/malfet/478/base -> origin/gh/malfet/478/base 2025-09-07T09:36:19.7549819Z * [new branch] gh/malfet/478/head -> origin/gh/malfet/478/head 2025-09-07T09:36:19.7551393Z * [new branch] gh/malfet/478/orig -> origin/gh/malfet/478/orig 2025-09-07T09:36:19.7553427Z * [new branch] gh/malfet/479/base -> origin/gh/malfet/479/base 2025-09-07T09:36:19.7555236Z * [new branch] gh/malfet/479/head -> origin/gh/malfet/479/head 2025-09-07T09:36:19.7556931Z * [new branch] gh/malfet/479/orig -> origin/gh/malfet/479/orig 2025-09-07T09:36:19.7559138Z * [new branch] gh/malfet/480/base -> origin/gh/malfet/480/base 2025-09-07T09:36:19.7560672Z * [new branch] gh/malfet/480/head -> origin/gh/malfet/480/head 2025-09-07T09:36:19.7562319Z * [new branch] gh/malfet/480/orig -> origin/gh/malfet/480/orig 2025-09-07T09:36:19.7564504Z * [new branch] gh/malfet/481/base -> origin/gh/malfet/481/base 2025-09-07T09:36:19.7566295Z * [new branch] gh/malfet/481/head -> origin/gh/malfet/481/head 2025-09-07T09:36:19.7567834Z * [new branch] gh/malfet/481/orig -> 
origin/gh/malfet/481/orig 2025-09-07T09:36:19.7570020Z * [new branch] gh/malfet/482/base -> origin/gh/malfet/482/base 2025-09-07T09:36:19.7571591Z * [new branch] gh/malfet/482/head -> origin/gh/malfet/482/head 2025-09-07T09:36:19.7573120Z * [new branch] gh/malfet/482/orig -> origin/gh/malfet/482/orig 2025-09-07T09:36:19.7575364Z * [new branch] gh/malfet/483/base -> origin/gh/malfet/483/base 2025-09-07T09:36:19.7577077Z * [new branch] gh/malfet/483/head -> origin/gh/malfet/483/head 2025-09-07T09:36:19.7578532Z * [new branch] gh/malfet/483/orig -> origin/gh/malfet/483/orig 2025-09-07T09:36:19.7590536Z * [new branch] gh/malfet/484/base -> origin/gh/malfet/484/base 2025-09-07T09:36:19.7590988Z * [new branch] gh/malfet/484/head -> origin/gh/malfet/484/head 2025-09-07T09:36:19.7591400Z * [new branch] gh/malfet/484/orig -> origin/gh/malfet/484/orig 2025-09-07T09:36:19.7591792Z * [new branch] gh/malfet/485/base -> origin/gh/malfet/485/base 2025-09-07T09:36:19.7592178Z * [new branch] gh/malfet/485/head -> origin/gh/malfet/485/head 2025-09-07T09:36:19.7592562Z * [new branch] gh/malfet/485/orig -> origin/gh/malfet/485/orig 2025-09-07T09:36:19.7592959Z * [new branch] gh/malfet/486/base -> origin/gh/malfet/486/base 2025-09-07T09:36:19.7593942Z * [new branch] gh/malfet/486/head -> origin/gh/malfet/486/head 2025-09-07T09:36:19.7595550Z * [new branch] gh/malfet/486/orig -> origin/gh/malfet/486/orig 2025-09-07T09:36:19.7597764Z * [new branch] gh/malfet/487/base -> origin/gh/malfet/487/base 2025-09-07T09:36:19.7599266Z * [new branch] gh/malfet/487/head -> origin/gh/malfet/487/head 2025-09-07T09:36:19.7600752Z * [new branch] gh/malfet/487/orig -> origin/gh/malfet/487/orig 2025-09-07T09:36:19.7602938Z * [new branch] gh/malfet/488/base -> origin/gh/malfet/488/base 2025-09-07T09:36:19.7604558Z * [new branch] gh/malfet/488/head -> origin/gh/malfet/488/head 2025-09-07T09:36:19.7606393Z * [new branch] gh/malfet/488/orig -> origin/gh/malfet/488/orig 2025-09-07T09:36:19.7608634Z * [new 
branch] gh/malfet/489/base -> origin/gh/malfet/489/base 2025-09-07T09:36:19.7610219Z * [new branch] gh/malfet/489/head -> origin/gh/malfet/489/head 2025-09-07T09:36:19.7611903Z * [new branch] gh/malfet/489/orig -> origin/gh/malfet/489/orig 2025-09-07T09:36:19.7613959Z * [new branch] gh/malfet/490/base -> origin/gh/malfet/490/base 2025-09-07T09:36:19.7615762Z * [new branch] gh/malfet/490/head -> origin/gh/malfet/490/head 2025-09-07T09:36:19.7617373Z * [new branch] gh/malfet/490/orig -> origin/gh/malfet/490/orig 2025-09-07T09:36:19.7619721Z * [new branch] gh/malfet/491/base -> origin/gh/malfet/491/base 2025-09-07T09:36:19.7621313Z * [new branch] gh/malfet/491/head -> origin/gh/malfet/491/head 2025-09-07T09:36:19.7623029Z * [new branch] gh/malfet/491/orig -> origin/gh/malfet/491/orig 2025-09-07T09:36:19.7625268Z * [new branch] gh/malfet/492/base -> origin/gh/malfet/492/base 2025-09-07T09:36:19.7627055Z * [new branch] gh/malfet/492/head -> origin/gh/malfet/492/head 2025-09-07T09:36:19.7628566Z * [new branch] gh/malfet/492/orig -> origin/gh/malfet/492/orig 2025-09-07T09:36:19.7630792Z * [new branch] gh/malfet/493/base -> origin/gh/malfet/493/base 2025-09-07T09:36:19.7632240Z * [new branch] gh/malfet/493/head -> origin/gh/malfet/493/head 2025-09-07T09:36:19.7633818Z * [new branch] gh/malfet/493/orig -> origin/gh/malfet/493/orig 2025-09-07T09:36:19.7636282Z * [new branch] gh/malfet/494/base -> origin/gh/malfet/494/base 2025-09-07T09:36:19.7637858Z * [new branch] gh/malfet/494/head -> origin/gh/malfet/494/head 2025-09-07T09:36:19.7639400Z * [new branch] gh/malfet/494/orig -> origin/gh/malfet/494/orig 2025-09-07T09:36:19.7641510Z * [new branch] gh/malfet/495/base -> origin/gh/malfet/495/base 2025-09-07T09:36:19.7643131Z * [new branch] gh/malfet/495/head -> origin/gh/malfet/495/head 2025-09-07T09:36:19.7644590Z * [new branch] gh/malfet/495/orig -> origin/gh/malfet/495/orig 2025-09-07T09:36:19.7647101Z * [new branch] gh/malfet/496/base -> origin/gh/malfet/496/base 
2025-09-07T09:36:19.7648653Z * [new branch] gh/malfet/496/head -> origin/gh/malfet/496/head 2025-09-07T09:36:19.7650145Z * [new branch] gh/malfet/496/orig -> origin/gh/malfet/496/orig 2025-09-07T09:36:19.7652386Z * [new branch] gh/malfet/497/base -> origin/gh/malfet/497/base 2025-09-07T09:36:19.7653939Z * [new branch] gh/malfet/497/head -> origin/gh/malfet/497/head 2025-09-07T09:36:19.7655803Z * [new branch] gh/malfet/497/orig -> origin/gh/malfet/497/orig 2025-09-07T09:36:19.7658042Z * [new branch] gh/malfet/498/base -> origin/gh/malfet/498/base 2025-09-07T09:36:19.7659755Z * [new branch] gh/malfet/498/head -> origin/gh/malfet/498/head 2025-09-07T09:36:19.7661157Z * [new branch] gh/malfet/498/orig -> origin/gh/malfet/498/orig 2025-09-07T09:36:19.7663442Z * [new branch] gh/malfet/499/base -> origin/gh/malfet/499/base 2025-09-07T09:36:19.7664920Z * [new branch] gh/malfet/499/head -> origin/gh/malfet/499/head 2025-09-07T09:36:19.7666774Z * [new branch] gh/malfet/499/orig -> origin/gh/malfet/499/orig 2025-09-07T09:36:19.7668903Z * [new branch] gh/malfet/500/base -> origin/gh/malfet/500/base 2025-09-07T09:36:19.7670426Z * [new branch] gh/malfet/500/head -> origin/gh/malfet/500/head 2025-09-07T09:36:19.7672027Z * [new branch] gh/malfet/500/orig -> origin/gh/malfet/500/orig 2025-09-07T09:36:19.7674543Z * [new branch] gh/malfet/501/base -> origin/gh/malfet/501/base 2025-09-07T09:36:19.7676417Z * [new branch] gh/malfet/501/head -> origin/gh/malfet/501/head 2025-09-07T09:36:19.7677935Z * [new branch] gh/malfet/501/orig -> origin/gh/malfet/501/orig 2025-09-07T09:36:19.7680116Z * [new branch] gh/malfet/502/base -> origin/gh/malfet/502/base 2025-09-07T09:36:19.7681664Z * [new branch] gh/malfet/502/head -> origin/gh/malfet/502/head 2025-09-07T09:36:19.7683161Z * [new branch] gh/malfet/502/orig -> origin/gh/malfet/502/orig 2025-09-07T09:36:19.7685728Z * [new branch] gh/malfet/503/base -> origin/gh/malfet/503/base 2025-09-07T09:36:19.7687164Z * [new branch] gh/malfet/503/head -> 
origin/gh/malfet/503/head 2025-09-07T09:36:19.7688707Z * [new branch] gh/malfet/503/orig -> origin/gh/malfet/503/orig 2025-09-07T09:36:19.7690861Z * [new branch] gh/malfet/504/base -> origin/gh/malfet/504/base 2025-09-07T09:36:19.7692482Z * [new branch] gh/malfet/504/head -> origin/gh/malfet/504/head 2025-09-07T09:36:19.7693988Z * [new branch] gh/malfet/504/orig -> origin/gh/malfet/504/orig 2025-09-07T09:36:19.7696597Z * [new branch] gh/malfet/505/base -> origin/gh/malfet/505/base 2025-09-07T09:36:19.7698149Z * [new branch] gh/malfet/505/head -> origin/gh/malfet/505/head 2025-09-07T09:36:19.7699691Z * [new branch] gh/malfet/505/orig -> origin/gh/malfet/505/orig 2025-09-07T09:36:19.7702078Z * [new branch] gh/malfet/506/base -> origin/gh/malfet/506/base 2025-09-07T09:36:19.7703586Z * [new branch] gh/malfet/506/head -> origin/gh/malfet/506/head 2025-09-07T09:36:19.7705274Z * [new branch] gh/malfet/506/orig -> origin/gh/malfet/506/orig 2025-09-07T09:36:19.7707765Z * [new branch] gh/malfet/507/base -> origin/gh/malfet/507/base 2025-09-07T09:36:19.7709262Z * [new branch] gh/malfet/507/head -> origin/gh/malfet/507/head 2025-09-07T09:36:19.7710882Z * [new branch] gh/malfet/507/orig -> origin/gh/malfet/507/orig 2025-09-07T09:36:19.7713201Z * [new branch] gh/malfet/508/base -> origin/gh/malfet/508/base 2025-09-07T09:36:19.7714778Z * [new branch] gh/malfet/508/head -> origin/gh/malfet/508/head 2025-09-07T09:36:19.7716685Z * [new branch] gh/malfet/508/orig -> origin/gh/malfet/508/orig 2025-09-07T09:36:19.7718687Z * [new branch] gh/malfet/509/base -> origin/gh/malfet/509/base 2025-09-07T09:36:19.7720228Z * [new branch] gh/malfet/509/head -> origin/gh/malfet/509/head 2025-09-07T09:36:19.7721824Z * [new branch] gh/malfet/509/orig -> origin/gh/malfet/509/orig 2025-09-07T09:36:19.7724293Z * [new branch] gh/malfet/510/base -> origin/gh/malfet/510/base 2025-09-07T09:36:19.7725942Z * [new branch] gh/malfet/510/head -> origin/gh/malfet/510/head 2025-09-07T09:36:19.7727538Z * [new 
branch] gh/malfet/510/orig -> origin/gh/malfet/510/orig 2025-09-07T09:36:19.7729774Z * [new branch] gh/malfet/511/base -> origin/gh/malfet/511/base 2025-09-07T09:36:19.7731310Z * [new branch] gh/malfet/511/head -> origin/gh/malfet/511/head 2025-09-07T09:36:19.7732808Z * [new branch] gh/malfet/511/orig -> origin/gh/malfet/511/orig 2025-09-07T09:36:19.7735283Z * [new branch] gh/malfet/512/base -> origin/gh/malfet/512/base 2025-09-07T09:36:19.7736921Z * [new branch] gh/malfet/512/head -> origin/gh/malfet/512/head 2025-09-07T09:36:19.7738395Z * [new branch] gh/malfet/512/orig -> origin/gh/malfet/512/orig 2025-09-07T09:36:19.7740589Z * [new branch] gh/malfet/513/base -> origin/gh/malfet/513/base 2025-09-07T09:36:19.7742232Z * [new branch] gh/malfet/513/head -> origin/gh/malfet/513/head 2025-09-07T09:36:19.7743692Z * [new branch] gh/malfet/513/orig -> origin/gh/malfet/513/orig 2025-09-07T09:36:19.7746184Z * [new branch] gh/malfet/64/base -> origin/gh/malfet/64/base 2025-09-07T09:36:19.7747844Z * [new branch] gh/malfet/64/head -> origin/gh/malfet/64/head 2025-09-07T09:36:19.7750585Z * [new branch] gh/manuelcandales/10/base -> origin/gh/manuelcandales/10/base 2025-09-07T09:36:19.7752160Z * [new branch] gh/manuelcandales/10/head -> origin/gh/manuelcandales/10/head 2025-09-07T09:36:19.7753682Z * [new branch] gh/manuelcandales/10/orig -> origin/gh/manuelcandales/10/orig 2025-09-07T09:36:19.7756215Z * [new branch] gh/manuelcandales/11/base -> origin/gh/manuelcandales/11/base 2025-09-07T09:36:19.7757736Z * [new branch] gh/manuelcandales/11/head -> origin/gh/manuelcandales/11/head 2025-09-07T09:36:19.7759265Z * [new branch] gh/manuelcandales/11/orig -> origin/gh/manuelcandales/11/orig 2025-09-07T09:36:19.7761641Z * [new branch] gh/manuelcandales/9/base -> origin/gh/manuelcandales/9/base 2025-09-07T09:36:19.7763048Z * [new branch] gh/manuelcandales/9/head -> origin/gh/manuelcandales/9/head 2025-09-07T09:36:19.7764650Z * [new branch] gh/manuelcandales/9/orig -> 
origin/gh/manuelcandales/9/orig 2025-09-07T09:36:19.7768484Z * [new branch] gh/markkm/1/base -> origin/gh/markkm/1/base 2025-09-07T09:36:19.7771387Z * [new branch] gh/masnesral/204/base -> origin/gh/masnesral/204/base 2025-09-07T09:36:19.7772952Z * [new branch] gh/masnesral/204/head -> origin/gh/masnesral/204/head 2025-09-07T09:36:19.7774587Z * [new branch] gh/masnesral/204/orig -> origin/gh/masnesral/204/orig 2025-09-07T09:36:19.7777065Z * [new branch] gh/masnesral/235/base -> origin/gh/masnesral/235/base 2025-09-07T09:36:19.7778726Z * [new branch] gh/masnesral/235/head -> origin/gh/masnesral/235/head 2025-09-07T09:36:19.7780808Z * [new branch] gh/masnesral/235/orig -> origin/gh/masnesral/235/orig 2025-09-07T09:36:19.7782669Z * [new branch] gh/masnesral/34/base -> origin/gh/masnesral/34/base 2025-09-07T09:36:19.7785556Z * [new branch] gh/mhorowitz/0/base -> origin/gh/mhorowitz/0/base 2025-09-07T09:36:19.7787247Z * [new branch] gh/mhorowitz/0/head -> origin/gh/mhorowitz/0/head 2025-09-07T09:36:19.7789288Z * [new branch] gh/mhorowitz/1/base -> origin/gh/mhorowitz/1/base 2025-09-07T09:36:19.7790871Z * [new branch] gh/mhorowitz/1/head -> origin/gh/mhorowitz/1/head 2025-09-07T09:36:19.7793014Z * [new branch] gh/mhorowitz/2/base -> origin/gh/mhorowitz/2/base 2025-09-07T09:36:19.7794505Z * [new branch] gh/mhorowitz/2/head -> origin/gh/mhorowitz/2/head 2025-09-07T09:36:19.7796863Z * [new branch] gh/mhorowitz/3/base -> origin/gh/mhorowitz/3/base 2025-09-07T09:36:19.7798267Z * [new branch] gh/mhorowitz/3/head -> origin/gh/mhorowitz/3/head 2025-09-07T09:36:19.7800365Z * [new branch] gh/mhorowitz/4/base -> origin/gh/mhorowitz/4/base 2025-09-07T09:36:19.7801955Z * [new branch] gh/mhorowitz/4/head -> origin/gh/mhorowitz/4/head 2025-09-07T09:36:19.7803911Z * [new branch] gh/mhorowitz/5/base -> origin/gh/mhorowitz/5/base 2025-09-07T09:36:19.7805715Z * [new branch] gh/mhorowitz/5/head -> origin/gh/mhorowitz/5/head 2025-09-07T09:36:19.7807776Z * [new branch] gh/mhorowitz/6/base -> 
origin/gh/mhorowitz/6/base 2025-09-07T09:36:19.7809280Z * [new branch] gh/mhorowitz/6/head -> origin/gh/mhorowitz/6/head 2025-09-07T09:36:19.7812105Z * [new branch] gh/mikaylagawarecki/234/base -> origin/gh/mikaylagawarecki/234/base 2025-09-07T09:36:19.7813607Z * [new branch] gh/mikaylagawarecki/234/head -> origin/gh/mikaylagawarecki/234/head 2025-09-07T09:36:19.7816125Z * [new branch] gh/mikaylagawarecki/235/base -> origin/gh/mikaylagawarecki/235/base 2025-09-07T09:36:19.7817685Z * [new branch] gh/mikaylagawarecki/235/head -> origin/gh/mikaylagawarecki/235/head 2025-09-07T09:36:19.7819785Z * [new branch] gh/mikaylagawarecki/236/base -> origin/gh/mikaylagawarecki/236/base 2025-09-07T09:36:19.7821246Z * [new branch] gh/mikaylagawarecki/236/head -> origin/gh/mikaylagawarecki/236/head 2025-09-07T09:36:19.7823513Z * [new branch] gh/mikaylagawarecki/237/base -> origin/gh/mikaylagawarecki/237/base 2025-09-07T09:36:19.7825108Z * [new branch] gh/mikaylagawarecki/237/head -> origin/gh/mikaylagawarecki/237/head 2025-09-07T09:36:19.7829495Z * [new branch] gh/mikaylagawarecki/238/base -> origin/gh/mikaylagawarecki/238/base 2025-09-07T09:36:19.7830999Z * [new branch] gh/mikaylagawarecki/238/head -> origin/gh/mikaylagawarecki/238/head 2025-09-07T09:36:19.7833175Z * [new branch] gh/mikaylagawarecki/317/base -> origin/gh/mikaylagawarecki/317/base 2025-09-07T09:36:19.7834747Z * [new branch] gh/mikaylagawarecki/317/head -> origin/gh/mikaylagawarecki/317/head 2025-09-07T09:36:19.7836693Z * [new branch] gh/mikaylagawarecki/317/orig -> origin/gh/mikaylagawarecki/317/orig 2025-09-07T09:36:19.7838858Z * [new branch] gh/mikaylagawarecki/320/base -> origin/gh/mikaylagawarecki/320/base 2025-09-07T09:36:19.7840430Z * [new branch] gh/mikaylagawarecki/320/head -> origin/gh/mikaylagawarecki/320/head 2025-09-07T09:36:19.7841999Z * [new branch] gh/mikaylagawarecki/320/orig -> origin/gh/mikaylagawarecki/320/orig 2025-09-07T09:36:19.7844208Z * [new branch] gh/mikaylagawarecki/329/base -> 
origin/gh/mikaylagawarecki/329/base 2025-09-07T09:36:19.7846046Z * [new branch] gh/mikaylagawarecki/329/head -> origin/gh/mikaylagawarecki/329/head 2025-09-07T09:36:19.7847647Z * [new branch] gh/mikaylagawarecki/329/orig -> origin/gh/mikaylagawarecki/329/orig 2025-09-07T09:36:19.7849867Z * [new branch] gh/mikaylagawarecki/330/base -> origin/gh/mikaylagawarecki/330/base 2025-09-07T09:36:19.7851339Z * [new branch] gh/mikaylagawarecki/330/head -> origin/gh/mikaylagawarecki/330/head 2025-09-07T09:36:19.7852962Z * [new branch] gh/mikaylagawarecki/330/orig -> origin/gh/mikaylagawarecki/330/orig 2025-09-07T09:36:19.7855329Z * [new branch] gh/mikaylagawarecki/331/base -> origin/gh/mikaylagawarecki/331/base 2025-09-07T09:36:19.7857225Z * [new branch] gh/mikaylagawarecki/331/head -> origin/gh/mikaylagawarecki/331/head 2025-09-07T09:36:19.7858549Z * [new branch] gh/mikaylagawarecki/331/orig -> origin/gh/mikaylagawarecki/331/orig 2025-09-07T09:36:19.7860878Z * [new branch] gh/mikaylagawarecki/332/base -> origin/gh/mikaylagawarecki/332/base 2025-09-07T09:36:19.7862638Z * [new branch] gh/mikaylagawarecki/332/head -> origin/gh/mikaylagawarecki/332/head 2025-09-07T09:36:19.7864154Z * [new branch] gh/mikaylagawarecki/332/orig -> origin/gh/mikaylagawarecki/332/orig 2025-09-07T09:36:19.7866921Z * [new branch] gh/mikaylagawarecki/334/base -> origin/gh/mikaylagawarecki/334/base 2025-09-07T09:36:19.7868369Z * [new branch] gh/mikaylagawarecki/334/head -> origin/gh/mikaylagawarecki/334/head 2025-09-07T09:36:19.7869927Z * [new branch] gh/mikaylagawarecki/334/orig -> origin/gh/mikaylagawarecki/334/orig 2025-09-07T09:36:19.7872123Z * [new branch] gh/mikaylagawarecki/335/base -> origin/gh/mikaylagawarecki/335/base 2025-09-07T09:36:19.7873772Z * [new branch] gh/mikaylagawarecki/335/head -> origin/gh/mikaylagawarecki/335/head 2025-09-07T09:36:19.7875549Z * [new branch] gh/mikaylagawarecki/335/orig -> origin/gh/mikaylagawarecki/335/orig 2025-09-07T09:36:19.7877795Z * [new branch] 
gh/mikaylagawarecki/336/base -> origin/gh/mikaylagawarecki/336/base 2025-09-07T09:36:19.7879329Z * [new branch] gh/mikaylagawarecki/336/head -> origin/gh/mikaylagawarecki/336/head 2025-09-07T09:36:19.7880902Z * [new branch] gh/mikaylagawarecki/336/orig -> origin/gh/mikaylagawarecki/336/orig 2025-09-07T09:36:19.7882989Z * [new branch] gh/mikaylagawarecki/337/base -> origin/gh/mikaylagawarecki/337/base 2025-09-07T09:36:19.7884517Z * [new branch] gh/mikaylagawarecki/337/head -> origin/gh/mikaylagawarecki/337/head 2025-09-07T09:36:19.7886373Z * [new branch] gh/mikaylagawarecki/337/orig -> origin/gh/mikaylagawarecki/337/orig 2025-09-07T09:36:19.7888548Z * [new branch] gh/mikaylagawarecki/338/base -> origin/gh/mikaylagawarecki/338/base 2025-09-07T09:36:19.7890179Z * [new branch] gh/mikaylagawarecki/338/head -> origin/gh/mikaylagawarecki/338/head 2025-09-07T09:36:19.7891724Z * [new branch] gh/mikaylagawarecki/338/orig -> origin/gh/mikaylagawarecki/338/orig 2025-09-07T09:36:19.7893752Z * [new branch] gh/mikaylagawarecki/339/base -> origin/gh/mikaylagawarecki/339/base 2025-09-07T09:36:19.7895568Z * [new branch] gh/mikaylagawarecki/339/head -> origin/gh/mikaylagawarecki/339/head 2025-09-07T09:36:19.7897220Z * [new branch] gh/mikaylagawarecki/339/orig -> origin/gh/mikaylagawarecki/339/orig 2025-09-07T09:36:19.7899981Z * [new branch] gh/mlazos/1/base -> origin/gh/mlazos/1/base 2025-09-07T09:36:19.7901663Z * [new branch] gh/mlazos/1/head -> origin/gh/mlazos/1/head 2025-09-07T09:36:19.7903328Z * [new branch] gh/mlazos/1/orig -> origin/gh/mlazos/1/orig 2025-09-07T09:36:19.7905733Z * [new branch] gh/mlazos/12/base -> origin/gh/mlazos/12/base 2025-09-07T09:36:19.7907286Z * [new branch] gh/mlazos/12/head -> origin/gh/mlazos/12/head 2025-09-07T09:36:19.7908880Z * [new branch] gh/mlazos/12/orig -> origin/gh/mlazos/12/orig 2025-09-07T09:36:19.7911080Z * [new branch] gh/mlazos/13/base -> origin/gh/mlazos/13/base 2025-09-07T09:36:19.7912674Z * [new branch] gh/mlazos/13/head -> 
origin/gh/mlazos/13/head 2025-09-07T09:36:19.7914177Z * [new branch] gh/mlazos/13/orig -> origin/gh/mlazos/13/orig 2025-09-07T09:36:19.7916711Z * [new branch] gh/mlazos/14/base -> origin/gh/mlazos/14/base 2025-09-07T09:36:19.7918217Z * [new branch] gh/mlazos/14/head -> origin/gh/mlazos/14/head 2025-09-07T09:36:19.7920037Z * [new branch] gh/mlazos/14/orig -> origin/gh/mlazos/14/orig 2025-09-07T09:36:19.7922006Z * [new branch] gh/mlazos/15/base -> origin/gh/mlazos/15/base 2025-09-07T09:36:19.7923542Z * [new branch] gh/mlazos/15/head -> origin/gh/mlazos/15/head 2025-09-07T09:36:19.7925259Z * [new branch] gh/mlazos/15/orig -> origin/gh/mlazos/15/orig 2025-09-07T09:36:19.7928669Z * [new branch] gh/mlazos/16/base -> origin/gh/mlazos/16/base 2025-09-07T09:36:19.7929635Z * [new branch] gh/mlazos/16/head -> origin/gh/mlazos/16/head 2025-09-07T09:36:19.7931157Z * [new branch] gh/mlazos/16/orig -> origin/gh/mlazos/16/orig 2025-09-07T09:36:19.7933233Z * [new branch] gh/mlazos/17/base -> origin/gh/mlazos/17/base 2025-09-07T09:36:19.7934742Z * [new branch] gh/mlazos/17/head -> origin/gh/mlazos/17/head 2025-09-07T09:36:19.7936599Z * [new branch] gh/mlazos/17/orig -> origin/gh/mlazos/17/orig 2025-09-07T09:36:19.7938839Z * [new branch] gh/mlazos/2/base -> origin/gh/mlazos/2/base 2025-09-07T09:36:19.7940330Z * [new branch] gh/mlazos/2/head -> origin/gh/mlazos/2/head 2025-09-07T09:36:19.7941999Z * [new branch] gh/mlazos/2/orig -> origin/gh/mlazos/2/orig 2025-09-07T09:36:19.7944279Z * [new branch] gh/mlazos/3/base -> origin/gh/mlazos/3/base 2025-09-07T09:36:19.7946101Z * [new branch] gh/mlazos/3/head -> origin/gh/mlazos/3/head 2025-09-07T09:36:19.7947708Z * [new branch] gh/mlazos/3/orig -> origin/gh/mlazos/3/orig 2025-09-07T09:36:19.7950438Z * [new branch] gh/mrmiywj/1/base -> origin/gh/mrmiywj/1/base 2025-09-07T09:36:19.7952097Z * [new branch] gh/mrmiywj/1/head -> origin/gh/mrmiywj/1/head 2025-09-07T09:36:19.7955554Z * [new branch] gh/muchulee8/62/base -> origin/gh/muchulee8/62/base 
2025-09-07T09:36:19.7957269Z * [new branch] gh/muchulee8/62/head -> origin/gh/muchulee8/62/head 2025-09-07T09:36:19.7958954Z * [new branch] gh/muchulee8/62/orig -> origin/gh/muchulee8/62/orig 2025-09-07T09:36:19.7961202Z * [new branch] gh/muchulee8/63/base -> origin/gh/muchulee8/63/base 2025-09-07T09:36:19.7962866Z * [new branch] gh/muchulee8/63/head -> origin/gh/muchulee8/63/head 2025-09-07T09:36:19.7964472Z * [new branch] gh/muchulee8/63/orig -> origin/gh/muchulee8/63/orig 2025-09-07T09:36:19.7967276Z * [new branch] gh/muchulee8/64/base -> origin/gh/muchulee8/64/base 2025-09-07T09:36:19.7968776Z * [new branch] gh/muchulee8/64/head -> origin/gh/muchulee8/64/head 2025-09-07T09:36:19.7970285Z * [new branch] gh/muchulee8/64/orig -> origin/gh/muchulee8/64/orig 2025-09-07T09:36:19.7972631Z * [new branch] gh/muchulee8/65/base -> origin/gh/muchulee8/65/base 2025-09-07T09:36:19.7974341Z * [new branch] gh/muchulee8/65/head -> origin/gh/muchulee8/65/head 2025-09-07T09:36:19.7976237Z * [new branch] gh/muchulee8/65/orig -> origin/gh/muchulee8/65/orig 2025-09-07T09:36:19.7979034Z * [new branch] gh/naveenthangudu/1/base -> origin/gh/naveenthangudu/1/base 2025-09-07T09:36:19.7980548Z * [new branch] gh/naveenthangudu/1/head -> origin/gh/naveenthangudu/1/head 2025-09-07T09:36:19.7982342Z * [new branch] gh/naveenthangudu/1/orig -> origin/gh/naveenthangudu/1/orig 2025-09-07T09:36:19.7984577Z * [new branch] gh/naveenthangudu/2/base -> origin/gh/naveenthangudu/2/base 2025-09-07T09:36:19.7986549Z * [new branch] gh/naveenthangudu/2/head -> origin/gh/naveenthangudu/2/head 2025-09-07T09:36:19.7988209Z * [new branch] gh/naveenthangudu/2/orig -> origin/gh/naveenthangudu/2/orig 2025-09-07T09:36:19.7990190Z * [new branch] gh/naveenthangudu/3/base -> origin/gh/naveenthangudu/3/base 2025-09-07T09:36:19.7991761Z * [new branch] gh/naveenthangudu/3/head -> origin/gh/naveenthangudu/3/head 2025-09-07T09:36:19.7993303Z * [new branch] gh/naveenthangudu/3/orig -> origin/gh/naveenthangudu/3/orig 
2025-09-07T09:36:19.7995710Z * [new branch] gh/naveenthangudu/4/base -> origin/gh/naveenthangudu/4/base 2025-09-07T09:36:19.7997352Z * [new branch] gh/naveenthangudu/4/head -> origin/gh/naveenthangudu/4/head 2025-09-07T09:36:19.7998874Z * [new branch] gh/naveenthangudu/4/orig -> origin/gh/naveenthangudu/4/orig 2025-09-07T09:36:19.8001292Z * [new branch] gh/naveenthangudu/5/base -> origin/gh/naveenthangudu/5/base 2025-09-07T09:36:19.8002680Z * [new branch] gh/naveenthangudu/5/head -> origin/gh/naveenthangudu/5/head 2025-09-07T09:36:19.8004345Z * [new branch] gh/naveenthangudu/5/orig -> origin/gh/naveenthangudu/5/orig 2025-09-07T09:36:19.8006795Z * [new branch] gh/naveenthangudu/6/base -> origin/gh/naveenthangudu/6/base 2025-09-07T09:36:19.8008439Z * [new branch] gh/naveenthangudu/6/head -> origin/gh/naveenthangudu/6/head 2025-09-07T09:36:19.8010026Z * [new branch] gh/naveenthangudu/6/orig -> origin/gh/naveenthangudu/6/orig 2025-09-07T09:36:19.8012476Z * [new branch] gh/oulgen/35/base -> origin/gh/oulgen/35/base 2025-09-07T09:36:19.8014030Z * [new branch] gh/oulgen/35/head -> origin/gh/oulgen/35/head 2025-09-07T09:36:19.8015931Z * [new branch] gh/oulgen/35/orig -> origin/gh/oulgen/35/orig 2025-09-07T09:36:19.8018246Z * [new branch] gh/oulgen/48/base -> origin/gh/oulgen/48/base 2025-09-07T09:36:19.8019700Z * [new branch] gh/oulgen/48/head -> origin/gh/oulgen/48/head 2025-09-07T09:36:19.8021223Z * [new branch] gh/oulgen/48/orig -> origin/gh/oulgen/48/orig 2025-09-07T09:36:19.8023447Z * [new branch] gh/oulgen/49/base -> origin/gh/oulgen/49/base 2025-09-07T09:36:19.8025164Z * [new branch] gh/oulgen/49/head -> origin/gh/oulgen/49/head 2025-09-07T09:36:19.8026879Z * [new branch] gh/oulgen/49/orig -> origin/gh/oulgen/49/orig 2025-09-07T09:36:19.8029760Z * [new branch] gh/pearu/108/base -> origin/gh/pearu/108/base 2025-09-07T09:36:19.8031419Z * [new branch] gh/pearu/108/head -> origin/gh/pearu/108/head 2025-09-07T09:36:19.8033180Z * [new branch] gh/pearu/108/orig -> 
origin/gh/pearu/108/orig 2025-09-07T09:36:19.8035615Z * [new branch] gh/pearu/109/base -> origin/gh/pearu/109/base 2025-09-07T09:36:19.8037158Z * [new branch] gh/pearu/109/head -> origin/gh/pearu/109/head 2025-09-07T09:36:19.8038765Z * [new branch] gh/pearu/109/orig -> origin/gh/pearu/109/orig 2025-09-07T09:36:19.8040847Z * [new branch] gh/pearu/110/base -> origin/gh/pearu/110/base 2025-09-07T09:36:19.8042490Z * [new branch] gh/pearu/110/head -> origin/gh/pearu/110/head 2025-09-07T09:36:19.8044065Z * [new branch] gh/pearu/110/orig -> origin/gh/pearu/110/orig 2025-09-07T09:36:19.8046561Z * [new branch] gh/pearu/111/base -> origin/gh/pearu/111/base 2025-09-07T09:36:19.8048119Z * [new branch] gh/pearu/111/head -> origin/gh/pearu/111/head 2025-09-07T09:36:19.8049737Z * [new branch] gh/pearu/111/orig -> origin/gh/pearu/111/orig 2025-09-07T09:36:19.8051767Z * [new branch] gh/pearu/112/base -> origin/gh/pearu/112/base 2025-09-07T09:36:19.8053608Z * [new branch] gh/pearu/112/head -> origin/gh/pearu/112/head 2025-09-07T09:36:19.8055331Z * [new branch] gh/pearu/112/orig -> origin/gh/pearu/112/orig 2025-09-07T09:36:19.8057671Z * [new branch] gh/pearu/113/base -> origin/gh/pearu/113/base 2025-09-07T09:36:19.8059288Z * [new branch] gh/pearu/113/head -> origin/gh/pearu/113/head 2025-09-07T09:36:19.8060798Z * [new branch] gh/pearu/113/orig -> origin/gh/pearu/113/orig 2025-09-07T09:36:19.8063124Z * [new branch] gh/pearu/114/base -> origin/gh/pearu/114/base 2025-09-07T09:36:19.8064669Z * [new branch] gh/pearu/114/head -> origin/gh/pearu/114/head 2025-09-07T09:36:19.8066601Z * [new branch] gh/pearu/114/orig -> origin/gh/pearu/114/orig 2025-09-07T09:36:19.8068759Z * [new branch] gh/pearu/115/base -> origin/gh/pearu/115/base 2025-09-07T09:36:19.8070330Z * [new branch] gh/pearu/115/head -> origin/gh/pearu/115/head 2025-09-07T09:36:19.8071821Z * [new branch] gh/pearu/115/orig -> origin/gh/pearu/115/orig 2025-09-07T09:36:19.8074057Z * [new branch] gh/pearu/116/base -> 
origin/gh/pearu/116/base 2025-09-07T09:36:19.8075779Z * [new branch] gh/pearu/116/head -> origin/gh/pearu/116/head 2025-09-07T09:36:19.8077530Z * [new branch] gh/pearu/116/orig -> origin/gh/pearu/116/orig 2025-09-07T09:36:19.8079676Z * [new branch] gh/pearu/117/base -> origin/gh/pearu/117/base 2025-09-07T09:36:19.8081083Z * [new branch] gh/pearu/117/head -> origin/gh/pearu/117/head 2025-09-07T09:36:19.8082549Z * [new branch] gh/pearu/117/orig -> origin/gh/pearu/117/orig 2025-09-07T09:36:19.8085385Z * [new branch] gh/pearu/56/base -> origin/gh/pearu/56/base 2025-09-07T09:36:19.8087134Z * [new branch] gh/pearu/56/head -> origin/gh/pearu/56/head 2025-09-07T09:36:19.8088626Z * [new branch] gh/pearu/56/orig -> origin/gh/pearu/56/orig 2025-09-07T09:36:19.8091052Z * [new branch] gh/pearu/97/base -> origin/gh/pearu/97/base 2025-09-07T09:36:19.8092684Z * [new branch] gh/pearu/97/head -> origin/gh/pearu/97/head 2025-09-07T09:36:19.8094313Z * [new branch] gh/pearu/97/orig -> origin/gh/pearu/97/orig 2025-09-07T09:36:19.8097284Z * [new branch] gh/qqaatw/29/base -> origin/gh/qqaatw/29/base 2025-09-07T09:36:19.8098827Z * [new branch] gh/qqaatw/29/head -> origin/gh/qqaatw/29/head 2025-09-07T09:36:19.8100306Z * [new branch] gh/qqaatw/29/orig -> origin/gh/qqaatw/29/orig 2025-09-07T09:36:19.8102735Z * [new branch] gh/raymo/refresh-script -> origin/gh/raymo/refresh-script 2025-09-07T09:36:19.8105786Z * [new branch] gh/rec/141/base -> origin/gh/rec/141/base 2025-09-07T09:36:19.8107096Z * [new branch] gh/rec/141/head -> origin/gh/rec/141/head 2025-09-07T09:36:19.8109421Z * [new branch] gh/rec/153/base -> origin/gh/rec/153/base 2025-09-07T09:36:19.8110986Z * [new branch] gh/rec/153/head -> origin/gh/rec/153/head 2025-09-07T09:36:19.8112585Z * [new branch] gh/rec/153/orig -> origin/gh/rec/153/orig 2025-09-07T09:36:19.8114743Z * [new branch] gh/rec/154/base -> origin/gh/rec/154/base 2025-09-07T09:36:19.8116646Z * [new branch] gh/rec/154/head -> origin/gh/rec/154/head 
2025-09-07T09:36:19.8118154Z * [new branch] gh/rec/154/orig -> origin/gh/rec/154/orig 2025-09-07T09:36:19.8120480Z * [new branch] gh/rec/156/base -> origin/gh/rec/156/base 2025-09-07T09:36:19.8121926Z * [new branch] gh/rec/156/head -> origin/gh/rec/156/head 2025-09-07T09:36:19.8123523Z * [new branch] gh/rec/156/orig -> origin/gh/rec/156/orig 2025-09-07T09:36:19.8125847Z * [new branch] gh/rec/160/base -> origin/gh/rec/160/base 2025-09-07T09:36:19.8127368Z * [new branch] gh/rec/160/head -> origin/gh/rec/160/head 2025-09-07T09:36:19.8129000Z * [new branch] gh/rec/160/orig -> origin/gh/rec/160/orig 2025-09-07T09:36:19.8131333Z * [new branch] gh/rec/162/base -> origin/gh/rec/162/base 2025-09-07T09:36:19.8132846Z * [new branch] gh/rec/162/head -> origin/gh/rec/162/head 2025-09-07T09:36:19.8134305Z * [new branch] gh/rec/162/orig -> origin/gh/rec/162/orig 2025-09-07T09:36:19.8136834Z * [new branch] gh/rec/163/base -> origin/gh/rec/163/base 2025-09-07T09:36:19.8138316Z * [new branch] gh/rec/163/head -> origin/gh/rec/163/head 2025-09-07T09:36:19.8139755Z * [new branch] gh/rec/163/orig -> origin/gh/rec/163/orig 2025-09-07T09:36:19.8142059Z * [new branch] gh/rec/164/base -> origin/gh/rec/164/base 2025-09-07T09:36:19.8143566Z * [new branch] gh/rec/164/head -> origin/gh/rec/164/head 2025-09-07T09:36:19.8145476Z * [new branch] gh/rec/164/orig -> origin/gh/rec/164/orig 2025-09-07T09:36:19.8147798Z * [new branch] gh/rec/165/base -> origin/gh/rec/165/base 2025-09-07T09:36:19.8149503Z * [new branch] gh/rec/165/head -> origin/gh/rec/165/head 2025-09-07T09:36:19.8151027Z * [new branch] gh/rec/165/orig -> origin/gh/rec/165/orig 2025-09-07T09:36:19.8153122Z * [new branch] gh/rec/166/base -> origin/gh/rec/166/base 2025-09-07T09:36:19.8154680Z * [new branch] gh/rec/166/head -> origin/gh/rec/166/head 2025-09-07T09:36:19.8156416Z * [new branch] gh/rec/166/orig -> origin/gh/rec/166/orig 2025-09-07T09:36:19.8159218Z * [new branch] gh/robert-hardwick/1/base -> origin/gh/robert-hardwick/1/base 
2025-09-07T09:36:19.8160778Z * [new branch] gh/robert-hardwick/1/head -> origin/gh/robert-hardwick/1/head 2025-09-07T09:36:19.8162229Z * [new branch] gh/robert-hardwick/1/orig -> origin/gh/robert-hardwick/1/orig 2025-09-07T09:36:19.8164612Z * [new branch] gh/robert-hardwick/2/base -> origin/gh/robert-hardwick/2/base 2025-09-07T09:36:19.8166453Z * [new branch] gh/robert-hardwick/2/head -> origin/gh/robert-hardwick/2/head 2025-09-07T09:36:19.8168061Z * [new branch] gh/robert-hardwick/2/orig -> origin/gh/robert-hardwick/2/orig 2025-09-07T09:36:19.8170074Z * [new branch] gh/robert-hardwick/3/base -> origin/gh/robert-hardwick/3/base 2025-09-07T09:36:19.8171767Z * [new branch] gh/robert-hardwick/3/head -> origin/gh/robert-hardwick/3/head 2025-09-07T09:36:19.8173192Z * [new branch] gh/robert-hardwick/3/orig -> origin/gh/robert-hardwick/3/orig 2025-09-07T09:36:19.8175548Z * [new branch] gh/robert-hardwick/4/base -> origin/gh/robert-hardwick/4/base 2025-09-07T09:36:19.8177174Z * [new branch] gh/robert-hardwick/4/head -> origin/gh/robert-hardwick/4/head 2025-09-07T09:36:19.8178769Z * [new branch] gh/robert-hardwick/4/orig -> origin/gh/robert-hardwick/4/orig 2025-09-07T09:36:19.8181645Z * [new branch] gh/rtimpe/1/base -> origin/gh/rtimpe/1/base 2025-09-07T09:36:19.8183333Z * [new branch] gh/rtimpe/1/head -> origin/gh/rtimpe/1/head 2025-09-07T09:36:19.8185681Z * [new branch] gh/rtimpe/10/base -> origin/gh/rtimpe/10/base 2025-09-07T09:36:19.8187257Z * [new branch] gh/rtimpe/10/head -> origin/gh/rtimpe/10/head 2025-09-07T09:36:19.8189048Z * [new branch] gh/rtimpe/10/orig -> origin/gh/rtimpe/10/orig 2025-09-07T09:36:19.8191218Z * [new branch] gh/rtimpe/11/base -> origin/gh/rtimpe/11/base 2025-09-07T09:36:19.8192618Z * [new branch] gh/rtimpe/11/head -> origin/gh/rtimpe/11/head 2025-09-07T09:36:19.8194044Z * [new branch] gh/rtimpe/11/orig -> origin/gh/rtimpe/11/orig 2025-09-07T09:36:19.8196598Z * [new branch] gh/rtimpe/12/base -> origin/gh/rtimpe/12/base 
2025-09-07T09:36:19.8198148Z * [new branch] gh/rtimpe/12/head -> origin/gh/rtimpe/12/head 2025-09-07T09:36:19.8199671Z * [new branch] gh/rtimpe/12/orig -> origin/gh/rtimpe/12/orig 2025-09-07T09:36:19.8201815Z * [new branch] gh/rtimpe/13/base -> origin/gh/rtimpe/13/base 2025-09-07T09:36:19.8203394Z * [new branch] gh/rtimpe/13/head -> origin/gh/rtimpe/13/head 2025-09-07T09:36:19.8204902Z * [new branch] gh/rtimpe/13/orig -> origin/gh/rtimpe/13/orig 2025-09-07T09:36:19.8207456Z * [new branch] gh/rtimpe/14/base -> origin/gh/rtimpe/14/base 2025-09-07T09:36:19.8208889Z * [new branch] gh/rtimpe/14/head -> origin/gh/rtimpe/14/head 2025-09-07T09:36:19.8210560Z * [new branch] gh/rtimpe/14/orig -> origin/gh/rtimpe/14/orig 2025-09-07T09:36:19.8212744Z * [new branch] gh/rtimpe/15/base -> origin/gh/rtimpe/15/base 2025-09-07T09:36:19.8214227Z * [new branch] gh/rtimpe/15/head -> origin/gh/rtimpe/15/head 2025-09-07T09:36:19.8216116Z * [new branch] gh/rtimpe/15/orig -> origin/gh/rtimpe/15/orig 2025-09-07T09:36:19.8218212Z * [new branch] gh/rtimpe/2/base -> origin/gh/rtimpe/2/base 2025-09-07T09:36:19.8219696Z * [new branch] gh/rtimpe/2/head -> origin/gh/rtimpe/2/head 2025-09-07T09:36:19.8221948Z * [new branch] gh/rtimpe/3/base -> origin/gh/rtimpe/3/base 2025-09-07T09:36:19.8223489Z * [new branch] gh/rtimpe/3/head -> origin/gh/rtimpe/3/head 2025-09-07T09:36:19.8225896Z * [new branch] gh/rtimpe/4/base -> origin/gh/rtimpe/4/base 2025-09-07T09:36:19.8227314Z * [new branch] gh/rtimpe/4/head -> origin/gh/rtimpe/4/head 2025-09-07T09:36:19.8229656Z * [new branch] gh/rtimpe/9/base -> origin/gh/rtimpe/9/base 2025-09-07T09:36:19.8231126Z * [new branch] gh/rtimpe/9/head -> origin/gh/rtimpe/9/head 2025-09-07T09:36:19.8232608Z * [new branch] gh/rtimpe/9/orig -> origin/gh/rtimpe/9/orig 2025-09-07T09:36:19.8235860Z * [new branch] gh/ruisizhang123/1/base -> origin/gh/ruisizhang123/1/base 2025-09-07T09:36:19.8237321Z * [new branch] gh/ruisizhang123/1/head -> origin/gh/ruisizhang123/1/head 
2025-09-07T09:36:19.8238837Z * [new branch] gh/ruisizhang123/1/orig -> origin/gh/ruisizhang123/1/orig 2025-09-07T09:36:19.8240989Z * [new branch] gh/ruisizhang123/4/base -> origin/gh/ruisizhang123/4/base 2025-09-07T09:36:19.8242688Z * [new branch] gh/ruisizhang123/4/head -> origin/gh/ruisizhang123/4/head 2025-09-07T09:36:19.8244245Z * [new branch] gh/ruisizhang123/4/orig -> origin/gh/ruisizhang123/4/orig 2025-09-07T09:36:19.8246898Z * [new branch] gh/ruisizhang123/5/base -> origin/gh/ruisizhang123/5/base 2025-09-07T09:36:19.8248273Z * [new branch] gh/ruisizhang123/5/head -> origin/gh/ruisizhang123/5/head 2025-09-07T09:36:19.8249910Z * [new branch] gh/ruisizhang123/5/orig -> origin/gh/ruisizhang123/5/orig 2025-09-07T09:36:19.8252106Z * [new branch] gh/ruisizhang123/6/base -> origin/gh/ruisizhang123/6/base 2025-09-07T09:36:19.8253935Z * [new branch] gh/ruisizhang123/6/head -> origin/gh/ruisizhang123/6/head 2025-09-07T09:36:19.8255327Z * [new branch] gh/ruisizhang123/6/orig -> origin/gh/ruisizhang123/6/orig 2025-09-07T09:36:19.8257672Z * [new branch] gh/ruisizhang123/7/base -> origin/gh/ruisizhang123/7/base 2025-09-07T09:36:19.8259354Z * [new branch] gh/ruisizhang123/7/head -> origin/gh/ruisizhang123/7/head 2025-09-07T09:36:19.8260851Z * [new branch] gh/ruisizhang123/7/orig -> origin/gh/ruisizhang123/7/orig 2025-09-07T09:36:19.8263178Z * [new branch] gh/ruisizhang123/8/base -> origin/gh/ruisizhang123/8/base 2025-09-07T09:36:19.8264659Z * [new branch] gh/ruisizhang123/8/head -> origin/gh/ruisizhang123/8/head 2025-09-07T09:36:19.8266672Z * [new branch] gh/ruisizhang123/8/orig -> origin/gh/ruisizhang123/8/orig 2025-09-07T09:36:19.8268672Z * [new branch] gh/ruisizhang123/9/base -> origin/gh/ruisizhang123/9/base 2025-09-07T09:36:19.8270212Z * [new branch] gh/ruisizhang123/9/head -> origin/gh/ruisizhang123/9/head 2025-09-07T09:36:19.8272002Z * [new branch] gh/ruisizhang123/9/orig -> origin/gh/ruisizhang123/9/orig 2025-09-07T09:36:19.8274677Z * [new branch] gh/sarckk/2/base 
-> origin/gh/sarckk/2/base 2025-09-07T09:36:19.8276595Z * [new branch] gh/sarckk/2/head -> origin/gh/sarckk/2/head 2025-09-07T09:36:19.8278032Z * [new branch] gh/sarckk/2/orig -> origin/gh/sarckk/2/orig 2025-09-07T09:36:19.8280788Z * [new branch] gh/seemethere/35/base -> origin/gh/seemethere/35/base 2025-09-07T09:36:19.8282514Z * [new branch] gh/seemethere/35/head -> origin/gh/seemethere/35/head 2025-09-07T09:36:19.8284057Z * [new branch] gh/seemethere/35/orig -> origin/gh/seemethere/35/orig 2025-09-07T09:36:19.8286552Z * [new branch] gh/seemethere/37/base -> origin/gh/seemethere/37/base 2025-09-07T09:36:19.8288214Z * [new branch] gh/seemethere/37/head -> origin/gh/seemethere/37/head 2025-09-07T09:36:19.8289639Z * [new branch] gh/seemethere/37/orig -> origin/gh/seemethere/37/orig 2025-09-07T09:36:19.8291938Z * [new branch] gh/seemethere/43/base -> origin/gh/seemethere/43/base 2025-09-07T09:36:19.8293471Z * [new branch] gh/seemethere/43/head -> origin/gh/seemethere/43/head 2025-09-07T09:36:19.8295326Z * [new branch] gh/seemethere/43/orig -> origin/gh/seemethere/43/orig 2025-09-07T09:36:19.8297457Z * [new branch] gh/seemethere/44/base -> origin/gh/seemethere/44/base 2025-09-07T09:36:19.8298968Z * [new branch] gh/seemethere/44/head -> origin/gh/seemethere/44/head 2025-09-07T09:36:19.8300567Z * [new branch] gh/seemethere/44/orig -> origin/gh/seemethere/44/orig 2025-09-07T09:36:19.8302846Z * [new branch] gh/seemethere/48/base -> origin/gh/seemethere/48/base 2025-09-07T09:36:19.8304394Z * [new branch] gh/seemethere/48/head -> origin/gh/seemethere/48/head 2025-09-07T09:36:19.8306294Z * [new branch] gh/seemethere/48/orig -> origin/gh/seemethere/48/orig 2025-09-07T09:36:19.8308552Z * [new branch] gh/seemethere/49/base -> origin/gh/seemethere/49/base 2025-09-07T09:36:19.8310132Z * [new branch] gh/seemethere/49/head -> origin/gh/seemethere/49/head 2025-09-07T09:36:19.8311632Z * [new branch] gh/seemethere/49/orig -> origin/gh/seemethere/49/orig 2025-09-07T09:36:19.8313715Z * 
[new branch] gh/seemethere/52/base -> origin/gh/seemethere/52/base 2025-09-07T09:36:19.8315360Z * [new branch] gh/seemethere/52/head -> origin/gh/seemethere/52/head 2025-09-07T09:36:19.8317093Z * [new branch] gh/seemethere/52/orig -> origin/gh/seemethere/52/orig 2025-09-07T09:36:19.8319623Z * [new branch] gh/seemethere/53/base -> origin/gh/seemethere/53/base 2025-09-07T09:36:19.8320964Z * [new branch] gh/seemethere/53/head -> origin/gh/seemethere/53/head 2025-09-07T09:36:19.8322409Z * [new branch] gh/seemethere/53/orig -> origin/gh/seemethere/53/orig 2025-09-07T09:36:19.8324696Z * [new branch] gh/seemethere/54/base -> origin/gh/seemethere/54/base 2025-09-07T09:36:19.8326545Z * [new branch] gh/seemethere/54/head -> origin/gh/seemethere/54/head 2025-09-07T09:36:19.8328124Z * [new branch] gh/seemethere/54/orig -> origin/gh/seemethere/54/orig 2025-09-07T09:36:19.8330442Z * [new branch] gh/seemethere/55/base -> origin/gh/seemethere/55/base 2025-09-07T09:36:19.8331850Z * [new branch] gh/seemethere/55/head -> origin/gh/seemethere/55/head 2025-09-07T09:36:19.8333293Z * [new branch] gh/seemethere/55/orig -> origin/gh/seemethere/55/orig 2025-09-07T09:36:19.8335940Z * [new branch] gh/seemethere/56/base -> origin/gh/seemethere/56/base 2025-09-07T09:36:19.8337704Z * [new branch] gh/seemethere/56/head -> origin/gh/seemethere/56/head 2025-09-07T09:36:19.8339072Z * [new branch] gh/seemethere/56/orig -> origin/gh/seemethere/56/orig 2025-09-07T09:36:19.8341271Z * [new branch] gh/seemethere/57/base -> origin/gh/seemethere/57/base 2025-09-07T09:36:19.8342985Z * [new branch] gh/seemethere/57/head -> origin/gh/seemethere/57/head 2025-09-07T09:36:19.8344522Z * [new branch] gh/seemethere/57/orig -> origin/gh/seemethere/57/orig 2025-09-07T09:36:19.8347012Z * [new branch] gh/seemethere/58/base -> origin/gh/seemethere/58/base 2025-09-07T09:36:19.8348571Z * [new branch] gh/seemethere/58/head -> origin/gh/seemethere/58/head 2025-09-07T09:36:19.8350268Z * [new branch] gh/seemethere/58/orig -> 
origin/gh/seemethere/58/orig 2025-09-07T09:36:19.8352399Z * [new branch] gh/seemethere/59/base -> origin/gh/seemethere/59/base 2025-09-07T09:36:19.8353932Z * [new branch] gh/seemethere/59/head -> origin/gh/seemethere/59/head 2025-09-07T09:36:19.8355539Z * [new branch] gh/seemethere/59/orig -> origin/gh/seemethere/59/orig 2025-09-07T09:36:19.8357761Z * [new branch] gh/seemethere/60/base -> origin/gh/seemethere/60/base 2025-09-07T09:36:19.8359469Z * [new branch] gh/seemethere/60/head -> origin/gh/seemethere/60/head 2025-09-07T09:36:19.8360973Z * [new branch] gh/seemethere/60/orig -> origin/gh/seemethere/60/orig 2025-09-07T09:36:19.8363117Z * [new branch] gh/seemethere/61/base -> origin/gh/seemethere/61/base 2025-09-07T09:36:19.8364680Z * [new branch] gh/seemethere/61/head -> origin/gh/seemethere/61/head 2025-09-07T09:36:19.8366520Z * [new branch] gh/seemethere/61/orig -> origin/gh/seemethere/61/orig 2025-09-07T09:36:19.8368577Z * [new branch] gh/seemethere/62/base -> origin/gh/seemethere/62/base 2025-09-07T09:36:19.8370133Z * [new branch] gh/seemethere/62/head -> origin/gh/seemethere/62/head 2025-09-07T09:36:19.8371685Z * [new branch] gh/seemethere/62/orig -> origin/gh/seemethere/62/orig 2025-09-07T09:36:19.8373863Z * [new branch] gh/seemethere/63/base -> origin/gh/seemethere/63/base 2025-09-07T09:36:19.8376494Z * [new branch] gh/seemethere/63/head -> origin/gh/seemethere/63/head 2025-09-07T09:36:19.8378002Z * [new branch] gh/seemethere/63/orig -> origin/gh/seemethere/63/orig 2025-09-07T09:36:19.8380976Z * [new branch] gh/shunting314/145/base -> origin/gh/shunting314/145/base 2025-09-07T09:36:19.8382967Z * [new branch] gh/shunting314/145/head -> origin/gh/shunting314/145/head 2025-09-07T09:36:19.8384858Z * [new branch] gh/shunting314/145/orig -> origin/gh/shunting314/145/orig 2025-09-07T09:36:19.8387277Z * [new branch] gh/shunting314/176/base -> origin/gh/shunting314/176/base 2025-09-07T09:36:19.8388812Z * [new branch] gh/shunting314/176/head -> 
origin/gh/shunting314/176/head 2025-09-07T09:36:19.8390403Z * [new branch] gh/shunting314/176/orig -> origin/gh/shunting314/176/orig 2025-09-07T09:36:19.8392867Z * [new branch] gh/shunting314/211/base -> origin/gh/shunting314/211/base 2025-09-07T09:36:19.8394226Z * [new branch] gh/shunting314/211/head -> origin/gh/shunting314/211/head 2025-09-07T09:36:19.8396174Z * [new branch] gh/shunting314/211/orig -> origin/gh/shunting314/211/orig 2025-09-07T09:36:19.8398344Z * [new branch] gh/shunting314/212/base -> origin/gh/shunting314/212/base 2025-09-07T09:36:19.8399936Z * [new branch] gh/shunting314/212/head -> origin/gh/shunting314/212/head 2025-09-07T09:36:19.8401354Z * [new branch] gh/shunting314/212/orig -> origin/gh/shunting314/212/orig 2025-09-07T09:36:19.8403730Z * [new branch] gh/shunting314/213/base -> origin/gh/shunting314/213/base 2025-09-07T09:36:19.8405457Z * [new branch] gh/shunting314/213/head -> origin/gh/shunting314/213/head 2025-09-07T09:36:19.8407159Z * [new branch] gh/shunting314/213/orig -> origin/gh/shunting314/213/orig 2025-09-07T09:36:19.8409309Z * [new branch] gh/shunting314/214/base -> origin/gh/shunting314/214/base 2025-09-07T09:36:19.8410902Z * [new branch] gh/shunting314/214/head -> origin/gh/shunting314/214/head 2025-09-07T09:36:19.8412313Z * [new branch] gh/shunting314/214/orig -> origin/gh/shunting314/214/orig 2025-09-07T09:36:19.8414500Z * [new branch] gh/shunting314/215/base -> origin/gh/shunting314/215/base 2025-09-07T09:36:19.8416379Z * [new branch] gh/shunting314/215/head -> origin/gh/shunting314/215/head 2025-09-07T09:36:19.8418127Z * [new branch] gh/shunting314/215/orig -> origin/gh/shunting314/215/orig 2025-09-07T09:36:19.8420309Z * [new branch] gh/shunting314/216/base -> origin/gh/shunting314/216/base 2025-09-07T09:36:19.8421958Z * [new branch] gh/shunting314/216/head -> origin/gh/shunting314/216/head 2025-09-07T09:36:19.8423360Z * [new branch] gh/shunting314/216/orig -> origin/gh/shunting314/216/orig 2025-09-07T09:36:19.8425991Z * 
[new branch] gh/shunting314/217/base -> origin/gh/shunting314/217/base 2025-09-07T09:36:19.8427642Z * [new branch] gh/shunting314/217/head -> origin/gh/shunting314/217/head 2025-09-07T09:36:19.8429376Z * [new branch] gh/shunting314/217/orig -> origin/gh/shunting314/217/orig 2025-09-07T09:36:19.8431681Z * [new branch] gh/shunting314/218/base -> origin/gh/shunting314/218/base 2025-09-07T09:36:19.8432958Z * [new branch] gh/shunting314/218/head -> origin/gh/shunting314/218/head 2025-09-07T09:36:19.8434681Z * [new branch] gh/shunting314/218/orig -> origin/gh/shunting314/218/orig 2025-09-07T09:36:19.8437224Z * [new branch] gh/shunting314/219/base -> origin/gh/shunting314/219/base 2025-09-07T09:36:19.8438744Z * [new branch] gh/shunting314/219/head -> origin/gh/shunting314/219/head 2025-09-07T09:36:19.8440243Z * [new branch] gh/shunting314/219/orig -> origin/gh/shunting314/219/orig 2025-09-07T09:36:19.8442692Z * [new branch] gh/shunting314/220/base -> origin/gh/shunting314/220/base 2025-09-07T09:36:19.8444303Z * [new branch] gh/shunting314/220/head -> origin/gh/shunting314/220/head 2025-09-07T09:36:19.8445997Z * [new branch] gh/shunting314/220/orig -> origin/gh/shunting314/220/orig 2025-09-07T09:36:19.8448510Z * [new branch] gh/shunting314/221/base -> origin/gh/shunting314/221/base 2025-09-07T09:36:19.8449757Z * [new branch] gh/shunting314/221/head -> origin/gh/shunting314/221/head 2025-09-07T09:36:19.8451208Z * [new branch] gh/shunting314/221/orig -> origin/gh/shunting314/221/orig 2025-09-07T09:36:19.8453382Z * [new branch] gh/shunting314/222/base -> origin/gh/shunting314/222/base 2025-09-07T09:36:19.8455102Z * [new branch] gh/shunting314/222/head -> origin/gh/shunting314/222/head 2025-09-07T09:36:19.8457070Z * [new branch] gh/shunting314/222/orig -> origin/gh/shunting314/222/orig 2025-09-07T09:36:19.8458977Z * [new branch] gh/shunting314/223/base -> origin/gh/shunting314/223/base 2025-09-07T09:36:19.8460511Z * [new branch] gh/shunting314/223/head -> 
origin/gh/shunting314/223/head 2025-09-07T09:36:19.8462281Z * [new branch] gh/shunting314/223/orig -> origin/gh/shunting314/223/orig 2025-09-07T09:36:19.8465255Z * [new branch] gh/silverguo/1/base -> origin/gh/silverguo/1/base 2025-09-07T09:36:19.8467031Z * [new branch] gh/silverguo/1/head -> origin/gh/silverguo/1/head 2025-09-07T09:36:19.8468919Z * [new branch] gh/silverguo/2/base -> origin/gh/silverguo/2/base 2025-09-07T09:36:19.8470332Z * [new branch] gh/silverguo/2/head -> origin/gh/silverguo/2/head 2025-09-07T09:36:19.8472438Z * [new branch] gh/silverguo/3/base -> origin/gh/silverguo/3/base 2025-09-07T09:36:19.8474031Z * [new branch] gh/silverguo/3/head -> origin/gh/silverguo/3/head 2025-09-07T09:36:19.8476379Z * [new branch] gh/silverguo/4/base -> origin/gh/silverguo/4/base 2025-09-07T09:36:19.8477843Z * [new branch] gh/silverguo/4/head -> origin/gh/silverguo/4/head 2025-09-07T09:36:19.8480594Z * [new branch] gh/sinhaanhsul/1/base -> origin/gh/sinhaanhsul/1/base 2025-09-07T09:36:19.8482344Z * [new branch] gh/sinhaanhsul/1/head -> origin/gh/sinhaanhsul/1/head 2025-09-07T09:36:19.8484922Z * [new branch] gh/skarjala/17/base -> origin/gh/skarjala/17/base 2025-09-07T09:36:19.8486930Z * [new branch] gh/skarjala/17/head -> origin/gh/skarjala/17/head 2025-09-07T09:36:19.8488324Z * [new branch] gh/skarjala/17/orig -> origin/gh/skarjala/17/orig 2025-09-07T09:36:19.8490613Z * [new branch] gh/skarjala/18/base -> origin/gh/skarjala/18/base 2025-09-07T09:36:19.8492139Z * [new branch] gh/skarjala/18/head -> origin/gh/skarjala/18/head 2025-09-07T09:36:19.8493838Z * [new branch] gh/skarjala/18/orig -> origin/gh/skarjala/18/orig 2025-09-07T09:36:19.8496209Z * [new branch] gh/skarjala/19/base -> origin/gh/skarjala/19/base 2025-09-07T09:36:19.8497675Z * [new branch] gh/skarjala/19/head -> origin/gh/skarjala/19/head 2025-09-07T09:36:19.8499207Z * [new branch] gh/skarjala/19/orig -> origin/gh/skarjala/19/orig 2025-09-07T09:36:19.8502105Z * [new branch] gh/slayton58/1/base -> 
origin/gh/slayton58/1/base 2025-09-07T09:36:19.8503619Z * [new branch] gh/slayton58/1/head -> origin/gh/slayton58/1/head 2025-09-07T09:36:19.8505351Z * [new branch] gh/slayton58/1/orig -> origin/gh/slayton58/1/orig 2025-09-07T09:36:19.8507719Z * [new branch] gh/slayton58/2/base -> origin/gh/slayton58/2/base 2025-09-07T09:36:19.8509159Z * [new branch] gh/slayton58/2/head -> origin/gh/slayton58/2/head 2025-09-07T09:36:19.8510605Z * [new branch] gh/slayton58/2/orig -> origin/gh/slayton58/2/orig 2025-09-07T09:36:19.8512679Z * [new branch] gh/slayton58/3/base -> origin/gh/slayton58/3/base 2025-09-07T09:36:19.8514450Z * [new branch] gh/slayton58/3/head -> origin/gh/slayton58/3/head 2025-09-07T09:36:19.8516222Z * [new branch] gh/slayton58/3/orig -> origin/gh/slayton58/3/orig 2025-09-07T09:36:19.8518398Z * [new branch] gh/slayton58/4/base -> origin/gh/slayton58/4/base 2025-09-07T09:36:19.8519888Z * [new branch] gh/slayton58/4/head -> origin/gh/slayton58/4/head 2025-09-07T09:36:19.8521414Z * [new branch] gh/slayton58/4/orig -> origin/gh/slayton58/4/orig 2025-09-07T09:36:19.8523778Z * [new branch] gh/slayton58/5/base -> origin/gh/slayton58/5/base 2025-09-07T09:36:19.8525682Z * [new branch] gh/slayton58/5/head -> origin/gh/slayton58/5/head 2025-09-07T09:36:19.8527261Z * [new branch] gh/slayton58/5/orig -> origin/gh/slayton58/5/orig 2025-09-07T09:36:19.8530111Z * [new branch] gh/soulitzer/269/base -> origin/gh/soulitzer/269/base 2025-09-07T09:36:19.8531610Z * [new branch] gh/soulitzer/269/head -> origin/gh/soulitzer/269/head 2025-09-07T09:36:19.8533242Z * [new branch] gh/soulitzer/269/orig -> origin/gh/soulitzer/269/orig 2025-09-07T09:36:19.8535744Z * [new branch] gh/soulitzer/276/base -> origin/gh/soulitzer/276/base 2025-09-07T09:36:19.8537336Z * [new branch] gh/soulitzer/276/head -> origin/gh/soulitzer/276/head 2025-09-07T09:36:19.8538940Z * [new branch] gh/soulitzer/276/orig -> origin/gh/soulitzer/276/orig 2025-09-07T09:36:19.8541311Z * [new branch] gh/soulitzer/287/base -> 
origin/gh/soulitzer/287/base 2025-09-07T09:36:19.8543090Z * [new branch] gh/soulitzer/287/head -> origin/gh/soulitzer/287/head 2025-09-07T09:36:19.8544745Z * [new branch] gh/soulitzer/287/orig -> origin/gh/soulitzer/287/orig 2025-09-07T09:36:19.8547376Z * [new branch] gh/soulitzer/296/base -> origin/gh/soulitzer/296/base 2025-09-07T09:36:19.8548734Z * [new branch] gh/soulitzer/296/head -> origin/gh/soulitzer/296/head 2025-09-07T09:36:19.8550335Z * [new branch] gh/soulitzer/296/orig -> origin/gh/soulitzer/296/orig 2025-09-07T09:36:19.8552647Z * [new branch] gh/soulitzer/299/base -> origin/gh/soulitzer/299/base 2025-09-07T09:36:19.8554170Z * [new branch] gh/soulitzer/299/head -> origin/gh/soulitzer/299/head 2025-09-07T09:36:19.8555964Z * [new branch] gh/soulitzer/299/orig -> origin/gh/soulitzer/299/orig 2025-09-07T09:36:19.8558260Z * [new branch] gh/soulitzer/300/base -> origin/gh/soulitzer/300/base 2025-09-07T09:36:19.8559895Z * [new branch] gh/soulitzer/300/head -> origin/gh/soulitzer/300/head 2025-09-07T09:36:19.8561237Z * [new branch] gh/soulitzer/300/orig -> origin/gh/soulitzer/300/orig 2025-09-07T09:36:19.8563796Z * [new branch] gh/soulitzer/301/base -> origin/gh/soulitzer/301/base 2025-09-07T09:36:19.8565433Z * [new branch] gh/soulitzer/301/head -> origin/gh/soulitzer/301/head 2025-09-07T09:36:19.8567027Z * [new branch] gh/soulitzer/301/orig -> origin/gh/soulitzer/301/orig 2025-09-07T09:36:19.8569312Z * [new branch] gh/soulitzer/313/base -> origin/gh/soulitzer/313/base 2025-09-07T09:36:19.8570868Z * [new branch] gh/soulitzer/313/head -> origin/gh/soulitzer/313/head 2025-09-07T09:36:19.8572171Z * [new branch] gh/soulitzer/313/orig -> origin/gh/soulitzer/313/orig 2025-09-07T09:36:19.8574379Z * [new branch] gh/soulitzer/319/base -> origin/gh/soulitzer/319/base 2025-09-07T09:36:19.8576225Z * [new branch] gh/soulitzer/319/head -> origin/gh/soulitzer/319/head 2025-09-07T09:36:19.8577821Z * [new branch] gh/soulitzer/319/orig -> origin/gh/soulitzer/319/orig 
2025-09-07T09:36:19.8580442Z * [new branch] gh/soulitzer/320/base -> origin/gh/soulitzer/320/base 2025-09-07T09:36:19.8581921Z * [new branch] gh/soulitzer/320/head -> origin/gh/soulitzer/320/head 2025-09-07T09:36:19.8583518Z * [new branch] gh/soulitzer/320/orig -> origin/gh/soulitzer/320/orig 2025-09-07T09:36:19.8586080Z * [new branch] gh/soulitzer/336/base -> origin/gh/soulitzer/336/base 2025-09-07T09:36:19.8587443Z * [new branch] gh/soulitzer/336/head -> origin/gh/soulitzer/336/head 2025-09-07T09:36:19.8589059Z * [new branch] gh/soulitzer/336/orig -> origin/gh/soulitzer/336/orig 2025-09-07T09:36:19.8591389Z * [new branch] gh/soulitzer/347/base -> origin/gh/soulitzer/347/base 2025-09-07T09:36:19.8592772Z * [new branch] gh/soulitzer/347/head -> origin/gh/soulitzer/347/head 2025-09-07T09:36:19.8594296Z * [new branch] gh/soulitzer/347/orig -> origin/gh/soulitzer/347/orig 2025-09-07T09:36:19.8596985Z * [new branch] gh/soulitzer/349/base -> origin/gh/soulitzer/349/base 2025-09-07T09:36:19.8598538Z * [new branch] gh/soulitzer/349/head -> origin/gh/soulitzer/349/head 2025-09-07T09:36:19.8600170Z * [new branch] gh/soulitzer/349/orig -> origin/gh/soulitzer/349/orig 2025-09-07T09:36:19.8602296Z * [new branch] gh/soulitzer/350/base -> origin/gh/soulitzer/350/base 2025-09-07T09:36:19.8603631Z * [new branch] gh/soulitzer/350/head -> origin/gh/soulitzer/350/head 2025-09-07T09:36:19.8605330Z * [new branch] gh/soulitzer/350/orig -> origin/gh/soulitzer/350/orig 2025-09-07T09:36:19.8607722Z * [new branch] gh/soulitzer/351/base -> origin/gh/soulitzer/351/base 2025-09-07T09:36:19.8609306Z * [new branch] gh/soulitzer/351/head -> origin/gh/soulitzer/351/head 2025-09-07T09:36:19.8610841Z * [new branch] gh/soulitzer/351/orig -> origin/gh/soulitzer/351/orig 2025-09-07T09:36:19.8613014Z * [new branch] gh/soulitzer/353/base -> origin/gh/soulitzer/353/base 2025-09-07T09:36:19.8614694Z * [new branch] gh/soulitzer/353/head -> origin/gh/soulitzer/353/head 2025-09-07T09:36:19.8616855Z * [new 
branch] gh/soulitzer/353/orig -> origin/gh/soulitzer/353/orig 2025-09-07T09:36:19.8619399Z * [new branch] gh/soulitzer/358/base -> origin/gh/soulitzer/358/base 2025-09-07T09:36:19.8621011Z * [new branch] gh/soulitzer/358/head -> origin/gh/soulitzer/358/head 2025-09-07T09:36:19.8622691Z * [new branch] gh/soulitzer/358/orig -> origin/gh/soulitzer/358/orig 2025-09-07T09:36:19.8625440Z * [new branch] gh/soulitzer/359/base -> origin/gh/soulitzer/359/base 2025-09-07T09:36:19.8627120Z * [new branch] gh/soulitzer/359/head -> origin/gh/soulitzer/359/head 2025-09-07T09:36:19.8628676Z * [new branch] gh/soulitzer/359/orig -> origin/gh/soulitzer/359/orig 2025-09-07T09:36:19.8630832Z * [new branch] gh/soulitzer/362/base -> origin/gh/soulitzer/362/base 2025-09-07T09:36:19.8632299Z * [new branch] gh/soulitzer/362/head -> origin/gh/soulitzer/362/head 2025-09-07T09:36:19.8633942Z * [new branch] gh/soulitzer/362/orig -> origin/gh/soulitzer/362/orig 2025-09-07T09:36:19.8636345Z * [new branch] gh/soulitzer/372/base -> origin/gh/soulitzer/372/base 2025-09-07T09:36:19.8637904Z * [new branch] gh/soulitzer/372/head -> origin/gh/soulitzer/372/head 2025-09-07T09:36:19.8639467Z * [new branch] gh/soulitzer/372/orig -> origin/gh/soulitzer/372/orig 2025-09-07T09:36:19.8641676Z * [new branch] gh/soulitzer/373/base -> origin/gh/soulitzer/373/base 2025-09-07T09:36:19.8643361Z * [new branch] gh/soulitzer/373/head -> origin/gh/soulitzer/373/head 2025-09-07T09:36:19.8644733Z * [new branch] gh/soulitzer/373/orig -> origin/gh/soulitzer/373/orig 2025-09-07T09:36:19.8647258Z * [new branch] gh/soulitzer/374/base -> origin/gh/soulitzer/374/base 2025-09-07T09:36:19.8648811Z * [new branch] gh/soulitzer/374/head -> origin/gh/soulitzer/374/head 2025-09-07T09:36:19.8650292Z * [new branch] gh/soulitzer/374/orig -> origin/gh/soulitzer/374/orig 2025-09-07T09:36:19.8652560Z * [new branch] gh/soulitzer/375/base -> origin/gh/soulitzer/375/base 2025-09-07T09:36:19.8654148Z * [new branch] gh/soulitzer/375/head -> 
origin/gh/soulitzer/375/head 2025-09-07T09:36:19.8656041Z * [new branch] gh/soulitzer/375/orig -> origin/gh/soulitzer/375/orig 2025-09-07T09:36:19.8658197Z * [new branch] gh/soulitzer/376/base -> origin/gh/soulitzer/376/base 2025-09-07T09:36:19.8659791Z * [new branch] gh/soulitzer/376/head -> origin/gh/soulitzer/376/head 2025-09-07T09:36:19.8661299Z * [new branch] gh/soulitzer/376/orig -> origin/gh/soulitzer/376/orig 2025-09-07T09:36:19.8663829Z * [new branch] gh/soulitzer/377/base -> origin/gh/soulitzer/377/base 2025-09-07T09:36:19.8665529Z * [new branch] gh/soulitzer/377/head -> origin/gh/soulitzer/377/head 2025-09-07T09:36:19.8667082Z * [new branch] gh/soulitzer/377/orig -> origin/gh/soulitzer/377/orig 2025-09-07T09:36:19.8669311Z * [new branch] gh/soulitzer/378/base -> origin/gh/soulitzer/378/base 2025-09-07T09:36:19.8670897Z * [new branch] gh/soulitzer/378/head -> origin/gh/soulitzer/378/head 2025-09-07T09:36:19.8672678Z * [new branch] gh/soulitzer/378/orig -> origin/gh/soulitzer/378/orig 2025-09-07T09:36:19.8675240Z * [new branch] gh/soulitzer/379/base -> origin/gh/soulitzer/379/base 2025-09-07T09:36:19.8677157Z * [new branch] gh/soulitzer/379/head -> origin/gh/soulitzer/379/head 2025-09-07T09:36:19.8678543Z * [new branch] gh/soulitzer/379/orig -> origin/gh/soulitzer/379/orig 2025-09-07T09:36:19.8681301Z * [new branch] gh/swolchok/728/next -> origin/gh/swolchok/728/next 2025-09-07T09:36:19.8683770Z * [new branch] gh/swolchok/767/base -> origin/gh/swolchok/767/base 2025-09-07T09:36:19.8685779Z * [new branch] gh/swolchok/767/head -> origin/gh/swolchok/767/head 2025-09-07T09:36:19.8687577Z * [new branch] gh/swolchok/767/orig -> origin/gh/swolchok/767/orig 2025-09-07T09:36:19.8689816Z * [new branch] gh/swolchok/768/base -> origin/gh/swolchok/768/base 2025-09-07T09:36:19.8691436Z * [new branch] gh/swolchok/768/head -> origin/gh/swolchok/768/head 2025-09-07T09:36:19.8693253Z * [new branch] gh/swolchok/768/orig -> origin/gh/swolchok/768/orig 
2025-09-07T09:36:19.8695996Z * [new branch] gh/swolchok/769/base -> origin/gh/swolchok/769/base 2025-09-07T09:36:19.8697486Z * [new branch] gh/swolchok/769/head -> origin/gh/swolchok/769/head 2025-09-07T09:36:19.8699164Z * [new branch] gh/swolchok/769/orig -> origin/gh/swolchok/769/orig 2025-09-07T09:36:19.8701324Z * [new branch] gh/swolchok/771/base -> origin/gh/swolchok/771/base 2025-09-07T09:36:19.8703074Z * [new branch] gh/swolchok/771/head -> origin/gh/swolchok/771/head 2025-09-07T09:36:19.8704515Z * [new branch] gh/swolchok/771/orig -> origin/gh/swolchok/771/orig 2025-09-07T09:36:19.8707020Z * [new branch] gh/swolchok/772/base -> origin/gh/swolchok/772/base 2025-09-07T09:36:19.8708730Z * [new branch] gh/swolchok/772/head -> origin/gh/swolchok/772/head 2025-09-07T09:36:19.8710400Z * [new branch] gh/swolchok/772/orig -> origin/gh/swolchok/772/orig 2025-09-07T09:36:19.8713112Z * [new branch] gh/swolchok/773/base -> origin/gh/swolchok/773/base 2025-09-07T09:36:19.8714621Z * [new branch] gh/swolchok/773/head -> origin/gh/swolchok/773/head 2025-09-07T09:36:19.8716574Z * [new branch] gh/swolchok/773/orig -> origin/gh/swolchok/773/orig 2025-09-07T09:36:19.8719049Z * [new branch] gh/swolchok/786/base -> origin/gh/swolchok/786/base 2025-09-07T09:36:19.8720477Z * [new branch] gh/swolchok/786/head -> origin/gh/swolchok/786/head 2025-09-07T09:36:19.8721953Z * [new branch] gh/swolchok/786/orig -> origin/gh/swolchok/786/orig 2025-09-07T09:36:19.8724057Z * [new branch] gh/swolchok/787/base -> origin/gh/swolchok/787/base 2025-09-07T09:36:19.8725994Z * [new branch] gh/swolchok/787/head -> origin/gh/swolchok/787/head 2025-09-07T09:36:19.8727551Z * [new branch] gh/swolchok/787/orig -> origin/gh/swolchok/787/orig 2025-09-07T09:36:19.8729762Z * [new branch] gh/swolchok/788/base -> origin/gh/swolchok/788/base 2025-09-07T09:36:19.8731499Z * [new branch] gh/swolchok/788/head -> origin/gh/swolchok/788/head 2025-09-07T09:36:19.8733071Z * [new branch] gh/swolchok/788/orig -> 
origin/gh/swolchok/788/orig 2025-09-07T09:36:19.8735316Z * [new branch] gh/swolchok/789/base -> origin/gh/swolchok/789/base 2025-09-07T09:36:19.8736927Z * [new branch] gh/swolchok/789/head -> origin/gh/swolchok/789/head 2025-09-07T09:36:19.8738529Z * [new branch] gh/swolchok/789/orig -> origin/gh/swolchok/789/orig 2025-09-07T09:36:19.8740673Z * [new branch] gh/swolchok/790/base -> origin/gh/swolchok/790/base 2025-09-07T09:36:19.8742368Z * [new branch] gh/swolchok/790/head -> origin/gh/swolchok/790/head 2025-09-07T09:36:19.8743877Z * [new branch] gh/swolchok/790/orig -> origin/gh/swolchok/790/orig 2025-09-07T09:36:19.8746826Z * [new branch] gh/swolchok/791/base -> origin/gh/swolchok/791/base 2025-09-07T09:36:19.8748226Z * [new branch] gh/swolchok/791/head -> origin/gh/swolchok/791/head 2025-09-07T09:36:19.8749852Z * [new branch] gh/swolchok/791/orig -> origin/gh/swolchok/791/orig 2025-09-07T09:36:19.8752151Z * [new branch] gh/swolchok/792/base -> origin/gh/swolchok/792/base 2025-09-07T09:36:19.8753657Z * [new branch] gh/swolchok/792/head -> origin/gh/swolchok/792/head 2025-09-07T09:36:19.8755161Z * [new branch] gh/swolchok/792/orig -> origin/gh/swolchok/792/orig 2025-09-07T09:36:19.8757630Z * [new branch] gh/swolchok/793/base -> origin/gh/swolchok/793/base 2025-09-07T09:36:19.8759086Z * [new branch] gh/swolchok/793/head -> origin/gh/swolchok/793/head 2025-09-07T09:36:19.8760619Z * [new branch] gh/swolchok/793/orig -> origin/gh/swolchok/793/orig 2025-09-07T09:36:19.8762912Z * [new branch] gh/swolchok/794/base -> origin/gh/swolchok/794/base 2025-09-07T09:36:19.8764669Z * [new branch] gh/swolchok/794/head -> origin/gh/swolchok/794/head 2025-09-07T09:36:19.8766300Z * [new branch] gh/swolchok/794/orig -> origin/gh/swolchok/794/orig 2025-09-07T09:36:19.8768606Z * [new branch] gh/swolchok/795/base -> origin/gh/swolchok/795/base 2025-09-07T09:36:19.8770161Z * [new branch] gh/swolchok/795/head -> origin/gh/swolchok/795/head 2025-09-07T09:36:19.8771734Z * [new branch] 
gh/swolchok/795/orig -> origin/gh/swolchok/795/orig 2025-09-07T09:36:19.8773962Z * [new branch] gh/swolchok/796/base -> origin/gh/swolchok/796/base 2025-09-07T09:36:19.8776177Z * [new branch] gh/swolchok/796/head -> origin/gh/swolchok/796/head 2025-09-07T09:36:19.8777453Z * [new branch] gh/swolchok/796/orig -> origin/gh/swolchok/796/orig 2025-09-07T09:36:19.8779819Z * [new branch] gh/swolchok/797/base -> origin/gh/swolchok/797/base 2025-09-07T09:36:19.8781566Z * [new branch] gh/swolchok/797/head -> origin/gh/swolchok/797/head 2025-09-07T09:36:19.8783161Z * [new branch] gh/swolchok/797/orig -> origin/gh/swolchok/797/orig 2025-09-07T09:36:19.8785624Z * [new branch] gh/swolchok/798/base -> origin/gh/swolchok/798/base 2025-09-07T09:36:19.8787164Z * [new branch] gh/swolchok/798/head -> origin/gh/swolchok/798/head 2025-09-07T09:36:19.8788810Z * [new branch] gh/swolchok/798/orig -> origin/gh/swolchok/798/orig 2025-09-07T09:36:19.8791222Z * [new branch] gh/swolchok/799/base -> origin/gh/swolchok/799/base 2025-09-07T09:36:19.8792764Z * [new branch] gh/swolchok/799/head -> origin/gh/swolchok/799/head 2025-09-07T09:36:19.8794443Z * [new branch] gh/swolchok/799/orig -> origin/gh/swolchok/799/orig 2025-09-07T09:36:19.8797033Z * [new branch] gh/swolchok/800/base -> origin/gh/swolchok/800/base 2025-09-07T09:36:19.8798513Z * [new branch] gh/swolchok/800/head -> origin/gh/swolchok/800/head 2025-09-07T09:36:19.8800356Z * [new branch] gh/swolchok/800/orig -> origin/gh/swolchok/800/orig 2025-09-07T09:36:19.8802661Z * [new branch] gh/swolchok/801/base -> origin/gh/swolchok/801/base 2025-09-07T09:36:19.8804095Z * [new branch] gh/swolchok/801/head -> origin/gh/swolchok/801/head 2025-09-07T09:36:19.8806279Z * [new branch] gh/swolchok/801/orig -> origin/gh/swolchok/801/orig 2025-09-07T09:36:19.8808533Z * [new branch] gh/swolchok/802/base -> origin/gh/swolchok/802/base 2025-09-07T09:36:19.8809983Z * [new branch] gh/swolchok/802/head -> origin/gh/swolchok/802/head 
2025-09-07T09:36:19.8811591Z * [new branch] gh/swolchok/802/orig -> origin/gh/swolchok/802/orig 2025-09-07T09:36:19.8814088Z * [new branch] gh/swolchok/803/base -> origin/gh/swolchok/803/base 2025-09-07T09:36:19.8815835Z * [new branch] gh/swolchok/803/head -> origin/gh/swolchok/803/head 2025-09-07T09:36:19.8817391Z * [new branch] gh/swolchok/803/orig -> origin/gh/swolchok/803/orig 2025-09-07T09:36:19.8819792Z * [new branch] gh/swolchok/804/base -> origin/gh/swolchok/804/base 2025-09-07T09:36:19.8821281Z * [new branch] gh/swolchok/804/head -> origin/gh/swolchok/804/head 2025-09-07T09:36:19.8823112Z * [new branch] gh/swolchok/804/orig -> origin/gh/swolchok/804/orig 2025-09-07T09:36:19.8825344Z * [new branch] gh/swolchok/805/base -> origin/gh/swolchok/805/base 2025-09-07T09:36:19.8827243Z * [new branch] gh/swolchok/805/head -> origin/gh/swolchok/805/head 2025-09-07T09:36:19.8828686Z * [new branch] gh/swolchok/805/orig -> origin/gh/swolchok/805/orig 2025-09-07T09:36:19.8830766Z * [new branch] gh/swolchok/806/base -> origin/gh/swolchok/806/base 2025-09-07T09:36:19.8832339Z * [new branch] gh/swolchok/806/head -> origin/gh/swolchok/806/head 2025-09-07T09:36:19.8834021Z * [new branch] gh/swolchok/806/orig -> origin/gh/swolchok/806/orig 2025-09-07T09:36:19.8837146Z * [new branch] gh/swolchok/807/base -> origin/gh/swolchok/807/base 2025-09-07T09:36:19.8838561Z * [new branch] gh/swolchok/807/head -> origin/gh/swolchok/807/head 2025-09-07T09:36:19.8840211Z * [new branch] gh/swolchok/807/orig -> origin/gh/swolchok/807/orig 2025-09-07T09:36:19.8842726Z * [new branch] gh/swolchok/808/base -> origin/gh/swolchok/808/base 2025-09-07T09:36:19.8844055Z * [new branch] gh/swolchok/808/head -> origin/gh/swolchok/808/head 2025-09-07T09:36:19.8845878Z * [new branch] gh/swolchok/808/orig -> origin/gh/swolchok/808/orig 2025-09-07T09:36:19.8848083Z * [new branch] gh/swolchok/809/base -> origin/gh/swolchok/809/base 2025-09-07T09:36:19.8849635Z * [new branch] gh/swolchok/809/head -> 
origin/gh/swolchok/809/head 2025-09-07T09:36:19.8851170Z * [new branch] gh/swolchok/809/orig -> origin/gh/swolchok/809/orig 2025-09-07T09:36:19.8853704Z * [new branch] gh/swolchok/810/base -> origin/gh/swolchok/810/base 2025-09-07T09:36:19.8855423Z * [new branch] gh/swolchok/810/head -> origin/gh/swolchok/810/head 2025-09-07T09:36:19.8856993Z * [new branch] gh/swolchok/810/orig -> origin/gh/swolchok/810/orig 2025-09-07T09:36:19.8859271Z * [new branch] gh/swolchok/811/base -> origin/gh/swolchok/811/base 2025-09-07T09:36:19.8860878Z * [new branch] gh/swolchok/811/head -> origin/gh/swolchok/811/head 2025-09-07T09:36:19.8862622Z * [new branch] gh/swolchok/811/orig -> origin/gh/swolchok/811/orig 2025-09-07T09:36:19.8865218Z * [new branch] gh/swolchok/812/base -> origin/gh/swolchok/812/base 2025-09-07T09:36:19.8866855Z * [new branch] gh/swolchok/812/head -> origin/gh/swolchok/812/head 2025-09-07T09:36:19.8868312Z * [new branch] gh/swolchok/812/orig -> origin/gh/swolchok/812/orig 2025-09-07T09:36:19.8870517Z * [new branch] gh/swolchok/813/base -> origin/gh/swolchok/813/base 2025-09-07T09:36:19.8872004Z * [new branch] gh/swolchok/813/head -> origin/gh/swolchok/813/head 2025-09-07T09:36:19.8873534Z * [new branch] gh/swolchok/813/orig -> origin/gh/swolchok/813/orig 2025-09-07T09:36:19.8876179Z * [new branch] gh/swolchok/814/base -> origin/gh/swolchok/814/base 2025-09-07T09:36:19.8877666Z * [new branch] gh/swolchok/814/head -> origin/gh/swolchok/814/head 2025-09-07T09:36:19.8879255Z * [new branch] gh/swolchok/814/orig -> origin/gh/swolchok/814/orig 2025-09-07T09:36:19.8881506Z * [new branch] gh/swolchok/815/base -> origin/gh/swolchok/815/base 2025-09-07T09:36:19.8883134Z * [new branch] gh/swolchok/815/head -> origin/gh/swolchok/815/head 2025-09-07T09:36:19.8884730Z * [new branch] gh/swolchok/815/orig -> origin/gh/swolchok/815/orig 2025-09-07T09:36:19.8887314Z * [new branch] gh/swolchok/816/base -> origin/gh/swolchok/816/base 2025-09-07T09:36:19.8888871Z * [new branch] 
gh/swolchok/816/head -> origin/gh/swolchok/816/head 2025-09-07T09:36:19.8890555Z * [new branch] gh/swolchok/816/orig -> origin/gh/swolchok/816/orig 2025-09-07T09:36:19.8892822Z * [new branch] gh/swolchok/817/base -> origin/gh/swolchok/817/base 2025-09-07T09:36:19.8894466Z * [new branch] gh/swolchok/817/head -> origin/gh/swolchok/817/head 2025-09-07T09:36:19.8896206Z * [new branch] gh/swolchok/817/orig -> origin/gh/swolchok/817/orig 2025-09-07T09:36:19.8898523Z * [new branch] gh/swolchok/818/base -> origin/gh/swolchok/818/base 2025-09-07T09:36:19.8900137Z * [new branch] gh/swolchok/818/head -> origin/gh/swolchok/818/head 2025-09-07T09:36:19.8901904Z * [new branch] gh/swolchok/818/orig -> origin/gh/swolchok/818/orig 2025-09-07T09:36:19.8904321Z * [new branch] gh/swolchok/819/base -> origin/gh/swolchok/819/base 2025-09-07T09:36:19.8906105Z * [new branch] gh/swolchok/819/head -> origin/gh/swolchok/819/head 2025-09-07T09:36:19.8907837Z * [new branch] gh/swolchok/819/orig -> origin/gh/swolchok/819/orig 2025-09-07T09:36:19.8909979Z * [new branch] gh/swolchok/820/base -> origin/gh/swolchok/820/base 2025-09-07T09:36:19.8911467Z * [new branch] gh/swolchok/820/head -> origin/gh/swolchok/820/head 2025-09-07T09:36:19.8912973Z * [new branch] gh/swolchok/820/orig -> origin/gh/swolchok/820/orig 2025-09-07T09:36:19.8915711Z * [new branch] gh/swolchok/821/base -> origin/gh/swolchok/821/base 2025-09-07T09:36:19.8917038Z * [new branch] gh/swolchok/821/head -> origin/gh/swolchok/821/head 2025-09-07T09:36:19.8918464Z * [new branch] gh/swolchok/821/orig -> origin/gh/swolchok/821/orig 2025-09-07T09:36:19.8921110Z * [new branch] gh/swolchok/822/base -> origin/gh/swolchok/822/base 2025-09-07T09:36:19.8922534Z * [new branch] gh/swolchok/822/head -> origin/gh/swolchok/822/head 2025-09-07T09:36:19.8923983Z * [new branch] gh/swolchok/822/orig -> origin/gh/swolchok/822/orig 2025-09-07T09:36:19.8926908Z * [new branch] gh/swolchok/823/base -> origin/gh/swolchok/823/base 
2025-09-07T09:36:19.8928270Z * [new branch] gh/swolchok/823/head -> origin/gh/swolchok/823/head 2025-09-07T09:36:19.8929784Z * [new branch] gh/swolchok/823/orig -> origin/gh/swolchok/823/orig 2025-09-07T09:36:19.8932050Z * [new branch] gh/swolchok/824/base -> origin/gh/swolchok/824/base 2025-09-07T09:36:19.8933564Z * [new branch] gh/swolchok/824/head -> origin/gh/swolchok/824/head 2025-09-07T09:36:19.8935224Z * [new branch] gh/swolchok/824/orig -> origin/gh/swolchok/824/orig 2025-09-07T09:36:19.8937767Z * [new branch] gh/swolchok/825/base -> origin/gh/swolchok/825/base 2025-09-07T09:36:19.8939201Z * [new branch] gh/swolchok/825/head -> origin/gh/swolchok/825/head 2025-09-07T09:36:19.8940816Z * [new branch] gh/swolchok/825/orig -> origin/gh/swolchok/825/orig 2025-09-07T09:36:19.8943330Z * [new branch] gh/swolchok/826/base -> origin/gh/swolchok/826/base 2025-09-07T09:36:19.8944823Z * [new branch] gh/swolchok/826/head -> origin/gh/swolchok/826/head 2025-09-07T09:36:19.8946561Z * [new branch] gh/swolchok/826/orig -> origin/gh/swolchok/826/orig 2025-09-07T09:36:19.8948842Z * [new branch] gh/swolchok/827/base -> origin/gh/swolchok/827/base 2025-09-07T09:36:19.8950335Z * [new branch] gh/swolchok/827/head -> origin/gh/swolchok/827/head 2025-09-07T09:36:19.8951784Z * [new branch] gh/swolchok/827/orig -> origin/gh/swolchok/827/orig 2025-09-07T09:36:19.8954207Z * [new branch] gh/swolchok/828/base -> origin/gh/swolchok/828/base 2025-09-07T09:36:19.8955995Z * [new branch] gh/swolchok/828/head -> origin/gh/swolchok/828/head 2025-09-07T09:36:19.8957548Z * [new branch] gh/swolchok/828/orig -> origin/gh/swolchok/828/orig 2025-09-07T09:36:19.8959767Z * [new branch] gh/swolchok/829/base -> origin/gh/swolchok/829/base 2025-09-07T09:36:19.8961279Z * [new branch] gh/swolchok/829/head -> origin/gh/swolchok/829/head 2025-09-07T09:36:19.8962788Z * [new branch] gh/swolchok/829/orig -> origin/gh/swolchok/829/orig 2025-09-07T09:36:19.8965533Z * [new branch] gh/swolchok/830/base -> 
origin/gh/swolchok/830/base 2025-09-07T09:36:19.8966895Z * [new branch] gh/swolchok/830/head -> origin/gh/swolchok/830/head 2025-09-07T09:36:19.8968307Z * [new branch] gh/swolchok/830/orig -> origin/gh/swolchok/830/orig 2025-09-07T09:36:19.8970394Z * [new branch] gh/swolchok/831/base -> origin/gh/swolchok/831/base 2025-09-07T09:36:19.8972324Z * [new branch] gh/swolchok/831/head -> origin/gh/swolchok/831/head 2025-09-07T09:36:19.8973513Z * [new branch] gh/swolchok/831/orig -> origin/gh/swolchok/831/orig 2025-09-07T09:36:19.8975962Z * [new branch] gh/swolchok/832/base -> origin/gh/swolchok/832/base 2025-09-07T09:36:19.8977647Z * [new branch] gh/swolchok/832/head -> origin/gh/swolchok/832/head 2025-09-07T09:36:19.8979247Z * [new branch] gh/swolchok/832/orig -> origin/gh/swolchok/832/orig 2025-09-07T09:36:19.8982045Z * [new branch] gh/syed-ahmed/3/base -> origin/gh/syed-ahmed/3/base 2025-09-07T09:36:19.8983605Z * [new branch] gh/syed-ahmed/3/head -> origin/gh/syed-ahmed/3/head 2025-09-07T09:36:19.8985346Z * [new branch] gh/syed-ahmed/3/orig -> origin/gh/syed-ahmed/3/orig 2025-09-07T09:36:19.8987632Z * [new branch] gh/syed-ahmed/4/base -> origin/gh/syed-ahmed/4/base 2025-09-07T09:36:19.8989186Z * [new branch] gh/syed-ahmed/4/head -> origin/gh/syed-ahmed/4/head 2025-09-07T09:36:19.8990594Z * [new branch] gh/syed-ahmed/4/orig -> origin/gh/syed-ahmed/4/orig 2025-09-07T09:36:19.8992902Z * [new branch] gh/syed-ahmed/5/base -> origin/gh/syed-ahmed/5/base 2025-09-07T09:36:19.8994566Z * [new branch] gh/syed-ahmed/5/head -> origin/gh/syed-ahmed/5/head 2025-09-07T09:36:19.8996506Z * [new branch] gh/syed-ahmed/5/orig -> origin/gh/syed-ahmed/5/orig 2025-09-07T09:36:19.8999393Z * [new branch] gh/teja-rao/4/base -> origin/gh/teja-rao/4/base 2025-09-07T09:36:19.9000991Z * [new branch] gh/teja-rao/4/head -> origin/gh/teja-rao/4/head 2025-09-07T09:36:19.9002504Z * [new branch] gh/teja-rao/4/orig -> origin/gh/teja-rao/4/orig 2025-09-07T09:36:19.9005390Z * [new branch] gh/tianyu-l/2/base 
-> origin/gh/tianyu-l/2/base 2025-09-07T09:36:19.9007070Z * [new branch] gh/tianyu-l/2/head -> origin/gh/tianyu-l/2/head 2025-09-07T09:36:19.9008542Z * [new branch] gh/tianyu-l/2/orig -> origin/gh/tianyu-l/2/orig 2025-09-07T09:36:19.9010729Z * [new branch] gh/tianyu-l/3/base -> origin/gh/tianyu-l/3/base 2025-09-07T09:36:19.9012348Z * [new branch] gh/tianyu-l/3/head -> origin/gh/tianyu-l/3/head 2025-09-07T09:36:19.9013723Z * [new branch] gh/tianyu-l/3/orig -> origin/gh/tianyu-l/3/orig 2025-09-07T09:36:19.9016200Z * [new branch] gh/tianyu-l/4/base -> origin/gh/tianyu-l/4/base 2025-09-07T09:36:19.9017728Z * [new branch] gh/tianyu-l/4/head -> origin/gh/tianyu-l/4/head 2025-09-07T09:36:19.9019200Z * [new branch] gh/tianyu-l/4/orig -> origin/gh/tianyu-l/4/orig 2025-09-07T09:36:19.9022519Z * [new branch] gh/tugsbayasgalan/1/base -> origin/gh/tugsbayasgalan/1/base 2025-09-07T09:36:19.9023924Z * [new branch] gh/tugsbayasgalan/1/head -> origin/gh/tugsbayasgalan/1/head 2025-09-07T09:36:19.9025946Z * [new branch] gh/tugsbayasgalan/1/orig -> origin/gh/tugsbayasgalan/1/orig 2025-09-07T09:36:19.9028247Z * [new branch] gh/tugsbayasgalan/10/base -> origin/gh/tugsbayasgalan/10/base 2025-09-07T09:36:19.9029866Z * [new branch] gh/tugsbayasgalan/10/head -> origin/gh/tugsbayasgalan/10/head 2025-09-07T09:36:19.9031397Z * [new branch] gh/tugsbayasgalan/10/orig -> origin/gh/tugsbayasgalan/10/orig 2025-09-07T09:36:19.9033440Z * [new branch] gh/tugsbayasgalan/11/base -> origin/gh/tugsbayasgalan/11/base 2025-09-07T09:36:19.9035215Z * [new branch] gh/tugsbayasgalan/11/head -> origin/gh/tugsbayasgalan/11/head 2025-09-07T09:36:19.9037046Z * [new branch] gh/tugsbayasgalan/11/orig -> origin/gh/tugsbayasgalan/11/orig 2025-09-07T09:36:19.9039557Z * [new branch] gh/tugsbayasgalan/12/base -> origin/gh/tugsbayasgalan/12/base 2025-09-07T09:36:19.9040742Z * [new branch] gh/tugsbayasgalan/12/head -> origin/gh/tugsbayasgalan/12/head 2025-09-07T09:36:19.9042181Z * [new branch] gh/tugsbayasgalan/12/orig -> 
origin/gh/tugsbayasgalan/12/orig 2025-09-07T09:36:19.9044557Z * [new branch] gh/tugsbayasgalan/13/base -> origin/gh/tugsbayasgalan/13/base 2025-09-07T09:36:19.9046408Z * [new branch] gh/tugsbayasgalan/13/head -> origin/gh/tugsbayasgalan/13/head 2025-09-07T09:36:19.9047934Z * [new branch] gh/tugsbayasgalan/13/orig -> origin/gh/tugsbayasgalan/13/orig 2025-09-07T09:36:19.9050102Z * [new branch] gh/tugsbayasgalan/14/base -> origin/gh/tugsbayasgalan/14/base 2025-09-07T09:36:19.9051592Z * [new branch] gh/tugsbayasgalan/14/head -> origin/gh/tugsbayasgalan/14/head 2025-09-07T09:36:19.9053142Z * [new branch] gh/tugsbayasgalan/14/orig -> origin/gh/tugsbayasgalan/14/orig 2025-09-07T09:36:19.9055746Z * [new branch] gh/tugsbayasgalan/15/base -> origin/gh/tugsbayasgalan/15/base 2025-09-07T09:36:19.9057239Z * [new branch] gh/tugsbayasgalan/15/head -> origin/gh/tugsbayasgalan/15/head 2025-09-07T09:36:19.9058899Z * [new branch] gh/tugsbayasgalan/15/orig -> origin/gh/tugsbayasgalan/15/orig 2025-09-07T09:36:19.9061113Z * [new branch] gh/tugsbayasgalan/2/base -> origin/gh/tugsbayasgalan/2/base 2025-09-07T09:36:19.9062784Z * [new branch] gh/tugsbayasgalan/2/head -> origin/gh/tugsbayasgalan/2/head 2025-09-07T09:36:19.9064268Z * [new branch] gh/tugsbayasgalan/2/orig -> origin/gh/tugsbayasgalan/2/orig 2025-09-07T09:36:19.9067010Z * [new branch] gh/tugsbayasgalan/3/base -> origin/gh/tugsbayasgalan/3/base 2025-09-07T09:36:19.9068602Z * [new branch] gh/tugsbayasgalan/3/head -> origin/gh/tugsbayasgalan/3/head 2025-09-07T09:36:19.9070088Z * [new branch] gh/tugsbayasgalan/3/orig -> origin/gh/tugsbayasgalan/3/orig 2025-09-07T09:36:19.9072385Z * [new branch] gh/tugsbayasgalan/4/base -> origin/gh/tugsbayasgalan/4/base 2025-09-07T09:36:19.9074115Z * [new branch] gh/tugsbayasgalan/4/head -> origin/gh/tugsbayasgalan/4/head 2025-09-07T09:36:19.9075858Z * [new branch] gh/tugsbayasgalan/4/orig -> origin/gh/tugsbayasgalan/4/orig 2025-09-07T09:36:19.9078132Z * [new branch] gh/tugsbayasgalan/5/base -> 
origin/gh/tugsbayasgalan/5/base 2025-09-07T09:36:19.9079696Z * [new branch] gh/tugsbayasgalan/5/head -> origin/gh/tugsbayasgalan/5/head 2025-09-07T09:36:19.9081194Z * [new branch] gh/tugsbayasgalan/5/orig -> origin/gh/tugsbayasgalan/5/orig 2025-09-07T09:36:19.9083332Z * [new branch] gh/tugsbayasgalan/6/base -> origin/gh/tugsbayasgalan/6/base 2025-09-07T09:36:19.9085124Z * [new branch] gh/tugsbayasgalan/6/head -> origin/gh/tugsbayasgalan/6/head 2025-09-07T09:36:19.9087076Z * [new branch] gh/tugsbayasgalan/6/orig -> origin/gh/tugsbayasgalan/6/orig 2025-09-07T09:36:19.9089285Z * [new branch] gh/tugsbayasgalan/7/base -> origin/gh/tugsbayasgalan/7/base 2025-09-07T09:36:19.9090775Z * [new branch] gh/tugsbayasgalan/7/head -> origin/gh/tugsbayasgalan/7/head 2025-09-07T09:36:19.9092439Z * [new branch] gh/tugsbayasgalan/7/orig -> origin/gh/tugsbayasgalan/7/orig 2025-09-07T09:36:19.9094756Z * [new branch] gh/tugsbayasgalan/8/base -> origin/gh/tugsbayasgalan/8/base 2025-09-07T09:36:19.9096607Z * [new branch] gh/tugsbayasgalan/8/head -> origin/gh/tugsbayasgalan/8/head 2025-09-07T09:36:19.9098234Z * [new branch] gh/tugsbayasgalan/8/orig -> origin/gh/tugsbayasgalan/8/orig 2025-09-07T09:36:19.9100354Z * [new branch] gh/tugsbayasgalan/9/base -> origin/gh/tugsbayasgalan/9/base 2025-09-07T09:36:19.9102062Z * [new branch] gh/tugsbayasgalan/9/head -> origin/gh/tugsbayasgalan/9/head 2025-09-07T09:36:19.9103474Z * [new branch] gh/tugsbayasgalan/9/orig -> origin/gh/tugsbayasgalan/9/orig 2025-09-07T09:36:19.9106656Z * [new branch] gh/v0i0/1/base -> origin/gh/v0i0/1/base 2025-09-07T09:36:19.9108098Z * [new branch] gh/v0i0/1/head -> origin/gh/v0i0/1/head 2025-09-07T09:36:19.9109657Z * [new branch] gh/v0i0/1/orig -> origin/gh/v0i0/1/orig 2025-09-07T09:36:19.9112019Z * [new branch] gh/v0i0/4/base -> origin/gh/v0i0/4/base 2025-09-07T09:36:19.9113509Z * [new branch] gh/v0i0/4/head -> origin/gh/v0i0/4/head 2025-09-07T09:36:19.9115220Z * [new branch] gh/v0i0/4/orig -> origin/gh/v0i0/4/orig 
2025-09-07T09:36:19.9117628Z * [new branch] gh/v0i0/6/base -> origin/gh/v0i0/6/base 2025-09-07T09:36:19.9119126Z * [new branch] gh/v0i0/6/head -> origin/gh/v0i0/6/head 2025-09-07T09:36:19.9120671Z * [new branch] gh/v0i0/6/orig -> origin/gh/v0i0/6/orig 2025-09-07T09:36:19.9122875Z * [new branch] gh/v0i0/7/base -> origin/gh/v0i0/7/base 2025-09-07T09:36:19.9124452Z * [new branch] gh/v0i0/7/head -> origin/gh/v0i0/7/head 2025-09-07T09:36:19.9126525Z * [new branch] gh/v0i0/7/orig -> origin/gh/v0i0/7/orig 2025-09-07T09:36:19.9128814Z * [new branch] gh/v0i0/8/base -> origin/gh/v0i0/8/base 2025-09-07T09:36:19.9130275Z * [new branch] gh/v0i0/8/head -> origin/gh/v0i0/8/head 2025-09-07T09:36:19.9131757Z * [new branch] gh/v0i0/8/orig -> origin/gh/v0i0/8/orig 2025-09-07T09:36:19.9133985Z * [new branch] gh/v0i0/9/base -> origin/gh/v0i0/9/base 2025-09-07T09:36:19.9135827Z * [new branch] gh/v0i0/9/head -> origin/gh/v0i0/9/head 2025-09-07T09:36:19.9137351Z * [new branch] gh/v0i0/9/orig -> origin/gh/v0i0/9/orig 2025-09-07T09:36:19.9140071Z * [new branch] gh/vkuzo/1/next -> origin/gh/vkuzo/1/next 2025-09-07T09:36:19.9142372Z * [new branch] gh/vkuzo/2/next -> origin/gh/vkuzo/2/next 2025-09-07T09:36:19.9144523Z * [new branch] gh/vkuzo/3/next -> origin/gh/vkuzo/3/next 2025-09-07T09:36:19.9147056Z * [new branch] gh/vkuzo/4/base -> origin/gh/vkuzo/4/base 2025-09-07T09:36:19.9148670Z * [new branch] gh/vkuzo/4/head -> origin/gh/vkuzo/4/head 2025-09-07T09:36:19.9150405Z * [new branch] gh/vkuzo/4/orig -> origin/gh/vkuzo/4/orig 2025-09-07T09:36:19.9152783Z * [new branch] gh/vkuzo/5/base -> origin/gh/vkuzo/5/base 2025-09-07T09:36:19.9154402Z * [new branch] gh/vkuzo/5/head -> origin/gh/vkuzo/5/head 2025-09-07T09:36:19.9156263Z * [new branch] gh/vkuzo/5/orig -> origin/gh/vkuzo/5/orig 2025-09-07T09:36:19.9158566Z * [new branch] gh/vkuzo/6/base -> origin/gh/vkuzo/6/base 2025-09-07T09:36:19.9160151Z * [new branch] gh/vkuzo/6/head -> origin/gh/vkuzo/6/head 2025-09-07T09:36:19.9161726Z * [new branch] 
gh/vkuzo/6/orig -> origin/gh/vkuzo/6/orig 2025-09-07T09:36:19.9163817Z * [new branch] gh/vkuzo/7/base -> origin/gh/vkuzo/7/base 2025-09-07T09:36:19.9165698Z * [new branch] gh/vkuzo/7/head -> origin/gh/vkuzo/7/head 2025-09-07T09:36:19.9167273Z * [new branch] gh/vkuzo/7/orig -> origin/gh/vkuzo/7/orig 2025-09-07T09:36:19.9170117Z * [new branch] gh/wconstab/419/base -> origin/gh/wconstab/419/base 2025-09-07T09:36:19.9171831Z * [new branch] gh/wconstab/419/head -> origin/gh/wconstab/419/head 2025-09-07T09:36:19.9173341Z * [new branch] gh/wconstab/419/orig -> origin/gh/wconstab/419/orig 2025-09-07T09:36:19.9176032Z * [new branch] gh/wconstab/424/base -> origin/gh/wconstab/424/base 2025-09-07T09:36:19.9177470Z * [new branch] gh/wconstab/424/head -> origin/gh/wconstab/424/head 2025-09-07T09:36:19.9178917Z * [new branch] gh/wconstab/424/orig -> origin/gh/wconstab/424/orig 2025-09-07T09:36:19.9181275Z * [new branch] gh/wconstab/435/base -> origin/gh/wconstab/435/base 2025-09-07T09:36:19.9186144Z * [new branch] gh/wconstab/435/head -> origin/gh/wconstab/435/head 2025-09-07T09:36:19.9187610Z * [new branch] gh/wconstab/435/orig -> origin/gh/wconstab/435/orig 2025-09-07T09:36:19.9189924Z * [new branch] gh/wconstab/438/base -> origin/gh/wconstab/438/base 2025-09-07T09:36:19.9191422Z * [new branch] gh/wconstab/438/head -> origin/gh/wconstab/438/head 2025-09-07T09:36:19.9192986Z * [new branch] gh/wconstab/438/orig -> origin/gh/wconstab/438/orig 2025-09-07T09:36:19.9195344Z * [new branch] gh/wconstab/440/base -> origin/gh/wconstab/440/base 2025-09-07T09:36:19.9197284Z * [new branch] gh/wconstab/440/head -> origin/gh/wconstab/440/head 2025-09-07T09:36:19.9198936Z * [new branch] gh/wconstab/440/orig -> origin/gh/wconstab/440/orig 2025-09-07T09:36:19.9201467Z * [new branch] gh/wconstab/441/base -> origin/gh/wconstab/441/base 2025-09-07T09:36:19.9202986Z * [new branch] gh/wconstab/441/head -> origin/gh/wconstab/441/head 2025-09-07T09:36:19.9204570Z * [new branch] gh/wconstab/441/orig -> 
origin/gh/wconstab/441/orig 2025-09-07T09:36:19.9207227Z * [new branch] gh/wconstab/442/base -> origin/gh/wconstab/442/base 2025-09-07T09:36:19.9208891Z * [new branch] gh/wconstab/442/head -> origin/gh/wconstab/442/head 2025-09-07T09:36:19.9210659Z * [new branch] gh/wconstab/442/orig -> origin/gh/wconstab/442/orig 2025-09-07T09:36:19.9212841Z * [new branch] gh/wconstab/443/base -> origin/gh/wconstab/443/base 2025-09-07T09:36:19.9214358Z * [new branch] gh/wconstab/443/head -> origin/gh/wconstab/443/head 2025-09-07T09:36:19.9216216Z * [new branch] gh/wconstab/443/orig -> origin/gh/wconstab/443/orig 2025-09-07T09:36:19.9219755Z * [new branch] gh/wconstab/444/base -> origin/gh/wconstab/444/base 2025-09-07T09:36:19.9220376Z * [new branch] gh/wconstab/444/head -> origin/gh/wconstab/444/head 2025-09-07T09:36:19.9222236Z * [new branch] gh/wconstab/444/orig -> origin/gh/wconstab/444/orig 2025-09-07T09:36:19.9224892Z * [new branch] gh/wconstab/445/base -> origin/gh/wconstab/445/base 2025-09-07T09:36:19.9226492Z * [new branch] gh/wconstab/445/head -> origin/gh/wconstab/445/head 2025-09-07T09:36:19.9228005Z * [new branch] gh/wconstab/445/orig -> origin/gh/wconstab/445/orig 2025-09-07T09:36:19.9230760Z * [new branch] gh/wconstab/446/base -> origin/gh/wconstab/446/base 2025-09-07T09:36:19.9232415Z * [new branch] gh/wconstab/446/head -> origin/gh/wconstab/446/head 2025-09-07T09:36:19.9234393Z * [new branch] gh/wconstab/446/orig -> origin/gh/wconstab/446/orig 2025-09-07T09:36:19.9236928Z * [new branch] gh/wconstab/447/base -> origin/gh/wconstab/447/base 2025-09-07T09:36:19.9238519Z * [new branch] gh/wconstab/447/head -> origin/gh/wconstab/447/head 2025-09-07T09:36:19.9240087Z * [new branch] gh/wconstab/447/orig -> origin/gh/wconstab/447/orig 2025-09-07T09:36:19.9243207Z * [new branch] gh/weifengpy/27/base -> origin/gh/weifengpy/27/base 2025-09-07T09:36:19.9244492Z * [new branch] gh/weifengpy/27/head -> origin/gh/weifengpy/27/head 2025-09-07T09:36:19.9246272Z * [new branch] 
gh/weifengpy/27/orig -> origin/gh/weifengpy/27/orig 2025-09-07T09:36:19.9248540Z * [new branch] gh/weifengpy/30/base -> origin/gh/weifengpy/30/base 2025-09-07T09:36:19.9250043Z * [new branch] gh/weifengpy/30/head -> origin/gh/weifengpy/30/head 2025-09-07T09:36:19.9251703Z * [new branch] gh/weifengpy/30/orig -> origin/gh/weifengpy/30/orig 2025-09-07T09:36:19.9254596Z * [new branch] gh/williamwen42/196/base -> origin/gh/williamwen42/196/base 2025-09-07T09:36:19.9256411Z * [new branch] gh/williamwen42/196/head -> origin/gh/williamwen42/196/head 2025-09-07T09:36:19.9258021Z * [new branch] gh/williamwen42/196/orig -> origin/gh/williamwen42/196/orig 2025-09-07T09:36:19.9260300Z * [new branch] gh/williamwen42/250/base -> origin/gh/williamwen42/250/base 2025-09-07T09:36:19.9262161Z * [new branch] gh/williamwen42/250/head -> origin/gh/williamwen42/250/head 2025-09-07T09:36:19.9263621Z * [new branch] gh/williamwen42/250/orig -> origin/gh/williamwen42/250/orig 2025-09-07T09:36:19.9266117Z * [new branch] gh/williamwen42/258/base -> origin/gh/williamwen42/258/base 2025-09-07T09:36:19.9267941Z * [new branch] gh/williamwen42/258/head -> origin/gh/williamwen42/258/head 2025-09-07T09:36:19.9269300Z * [new branch] gh/williamwen42/258/orig -> origin/gh/williamwen42/258/orig 2025-09-07T09:36:19.9271497Z * [new branch] gh/williamwen42/266/base -> origin/gh/williamwen42/266/base 2025-09-07T09:36:19.9273119Z * [new branch] gh/williamwen42/266/head -> origin/gh/williamwen42/266/head 2025-09-07T09:36:19.9274754Z * [new branch] gh/williamwen42/266/orig -> origin/gh/williamwen42/266/orig 2025-09-07T09:36:19.9277259Z * [new branch] gh/williamwen42/267/base -> origin/gh/williamwen42/267/base 2025-09-07T09:36:19.9278837Z * [new branch] gh/williamwen42/267/head -> origin/gh/williamwen42/267/head 2025-09-07T09:36:19.9280427Z * [new branch] gh/williamwen42/267/orig -> origin/gh/williamwen42/267/orig 2025-09-07T09:36:19.9282895Z * [new branch] gh/williamwen42/270/base -> 
origin/gh/williamwen42/270/base 2025-09-07T09:36:19.9284634Z * [new branch] gh/williamwen42/270/head -> origin/gh/williamwen42/270/head 2025-09-07T09:36:19.9286485Z * [new branch] gh/williamwen42/270/orig -> origin/gh/williamwen42/270/orig 2025-09-07T09:36:19.9288764Z * [new branch] gh/williamwen42/271/base -> origin/gh/williamwen42/271/base 2025-09-07T09:36:19.9290240Z * [new branch] gh/williamwen42/271/head -> origin/gh/williamwen42/271/head 2025-09-07T09:36:19.9291918Z * [new branch] gh/williamwen42/271/orig -> origin/gh/williamwen42/271/orig 2025-09-07T09:36:19.9294176Z * [new branch] gh/williamwen42/272/base -> origin/gh/williamwen42/272/base 2025-09-07T09:36:19.9295801Z * [new branch] gh/williamwen42/272/head -> origin/gh/williamwen42/272/head 2025-09-07T09:36:19.9297599Z * [new branch] gh/williamwen42/272/orig -> origin/gh/williamwen42/272/orig 2025-09-07T09:36:19.9302438Z * [new branch] gh/williamwen42/274/base -> origin/gh/williamwen42/274/base 2025-09-07T09:36:19.9302681Z * [new branch] gh/williamwen42/274/head -> origin/gh/williamwen42/274/head 2025-09-07T09:36:19.9302975Z * [new branch] gh/williamwen42/274/orig -> origin/gh/williamwen42/274/orig 2025-09-07T09:36:19.9305687Z * [new branch] gh/williamwen42/275/base -> origin/gh/williamwen42/275/base 2025-09-07T09:36:19.9309298Z * [new branch] gh/williamwen42/275/head -> origin/gh/williamwen42/275/head 2025-09-07T09:36:19.9310592Z * [new branch] gh/williamwen42/276/base -> origin/gh/williamwen42/276/base 2025-09-07T09:36:19.9311103Z * [new branch] gh/williamwen42/276/head -> origin/gh/williamwen42/276/head 2025-09-07T09:36:19.9312756Z * [new branch] gh/williamwen42/276/orig -> origin/gh/williamwen42/276/orig 2025-09-07T09:36:19.9315222Z * [new branch] gh/williamwen42/277/base -> origin/gh/williamwen42/277/base 2025-09-07T09:36:19.9316831Z * [new branch] gh/williamwen42/277/head -> origin/gh/williamwen42/277/head 2025-09-07T09:36:19.9318364Z * [new branch] gh/williamwen42/277/orig -> 
origin/gh/williamwen42/277/orig 2025-09-07T09:36:19.9320622Z * [new branch] gh/williamwen42/278/base -> origin/gh/williamwen42/278/base 2025-09-07T09:36:19.9322264Z * [new branch] gh/williamwen42/278/head -> origin/gh/williamwen42/278/head 2025-09-07T09:36:19.9323808Z * [new branch] gh/williamwen42/278/orig -> origin/gh/williamwen42/278/orig 2025-09-07T09:36:19.9326302Z * [new branch] gh/williamwen42/279/base -> origin/gh/williamwen42/279/base 2025-09-07T09:36:19.9327772Z * [new branch] gh/williamwen42/279/head -> origin/gh/williamwen42/279/head 2025-09-07T09:36:19.9329325Z * [new branch] gh/williamwen42/279/orig -> origin/gh/williamwen42/279/orig 2025-09-07T09:36:19.9331680Z * [new branch] gh/williamwen42/280/base -> origin/gh/williamwen42/280/base 2025-09-07T09:36:19.9333429Z * [new branch] gh/williamwen42/280/head -> origin/gh/williamwen42/280/head 2025-09-07T09:36:19.9335110Z * [new branch] gh/williamwen42/280/orig -> origin/gh/williamwen42/280/orig 2025-09-07T09:36:19.9337433Z * [new branch] gh/williamwen42/281/base -> origin/gh/williamwen42/281/base 2025-09-07T09:36:19.9338831Z * [new branch] gh/williamwen42/281/head -> origin/gh/williamwen42/281/head 2025-09-07T09:36:19.9340319Z * [new branch] gh/williamwen42/281/orig -> origin/gh/williamwen42/281/orig 2025-09-07T09:36:19.9342541Z * [new branch] gh/williamwen42/282/base -> origin/gh/williamwen42/282/base 2025-09-07T09:36:19.9344168Z * [new branch] gh/williamwen42/282/head -> origin/gh/williamwen42/282/head 2025-09-07T09:36:19.9345864Z * [new branch] gh/williamwen42/282/orig -> origin/gh/williamwen42/282/orig 2025-09-07T09:36:19.9348288Z * [new branch] gh/williamwen42/283/base -> origin/gh/williamwen42/283/base 2025-09-07T09:36:19.9349949Z * [new branch] gh/williamwen42/283/head -> origin/gh/williamwen42/283/head 2025-09-07T09:36:19.9351466Z * [new branch] gh/williamwen42/283/orig -> origin/gh/williamwen42/283/orig 2025-09-07T09:36:19.9354031Z * [new branch] gh/williamwen42/284/base -> 
origin/gh/williamwen42/284/base 2025-09-07T09:36:19.9355774Z * [new branch] gh/williamwen42/284/head -> origin/gh/williamwen42/284/head 2025-09-07T09:36:19.9357309Z * [new branch] gh/williamwen42/284/orig -> origin/gh/williamwen42/284/orig 2025-09-07T09:36:19.9359435Z * [new branch] gh/williamwen42/285/base -> origin/gh/williamwen42/285/base 2025-09-07T09:36:19.9361013Z * [new branch] gh/williamwen42/285/head -> origin/gh/williamwen42/285/head 2025-09-07T09:36:19.9362507Z * [new branch] gh/williamwen42/285/orig -> origin/gh/williamwen42/285/orig 2025-09-07T09:36:19.9364532Z * [new branch] gh/williamwen42/286/base -> origin/gh/williamwen42/286/base 2025-09-07T09:36:19.9366360Z * [new branch] gh/williamwen42/286/head -> origin/gh/williamwen42/286/head 2025-09-07T09:36:19.9367956Z * [new branch] gh/williamwen42/286/orig -> origin/gh/williamwen42/286/orig 2025-09-07T09:36:19.9370446Z * [new branch] gh/williamwen42/287/base -> origin/gh/williamwen42/287/base 2025-09-07T09:36:19.9371923Z * [new branch] gh/williamwen42/287/head -> origin/gh/williamwen42/287/head 2025-09-07T09:36:19.9373489Z * [new branch] gh/williamwen42/287/orig -> origin/gh/williamwen42/287/orig 2025-09-07T09:36:19.9376299Z * [new branch] gh/williamwen42/288/base -> origin/gh/williamwen42/288/base 2025-09-07T09:36:19.9377781Z * [new branch] gh/williamwen42/288/head -> origin/gh/williamwen42/288/head 2025-09-07T09:36:19.9379272Z * [new branch] gh/williamwen42/288/orig -> origin/gh/williamwen42/288/orig 2025-09-07T09:36:19.9381699Z * [new branch] gh/williamwen42/289/base -> origin/gh/williamwen42/289/base 2025-09-07T09:36:19.9383408Z * [new branch] gh/williamwen42/289/head -> origin/gh/williamwen42/289/head 2025-09-07T09:36:19.9385177Z * [new branch] gh/williamwen42/289/orig -> origin/gh/williamwen42/289/orig 2025-09-07T09:36:19.9388372Z * [new branch] gh/wychi/1/base -> origin/gh/wychi/1/base 2025-09-07T09:36:19.9389855Z * [new branch] gh/wychi/1/head -> origin/gh/wychi/1/head 
2025-09-07T09:36:19.9391516Z * [new branch] gh/wychi/1/orig -> origin/gh/wychi/1/orig 2025-09-07T09:36:19.9394229Z * [new branch] gh/xmfan/169/base -> origin/gh/xmfan/169/base 2025-09-07T09:36:19.9396060Z * [new branch] gh/xmfan/169/head -> origin/gh/xmfan/169/head 2025-09-07T09:36:19.9398228Z * [new branch] gh/xmfan/170/base -> origin/gh/xmfan/170/base 2025-09-07T09:36:19.9399750Z * [new branch] gh/xmfan/170/head -> origin/gh/xmfan/170/head 2025-09-07T09:36:19.9402055Z * [new branch] gh/xmfan/18/base -> origin/gh/xmfan/18/base 2025-09-07T09:36:19.9403678Z * [new branch] gh/xmfan/18/head -> origin/gh/xmfan/18/head 2025-09-07T09:36:19.9406025Z * [new branch] gh/xmfan/229/base -> origin/gh/xmfan/229/base 2025-09-07T09:36:19.9407505Z * [new branch] gh/xmfan/229/head -> origin/gh/xmfan/229/head 2025-09-07T09:36:19.9409031Z * [new branch] gh/xmfan/229/orig -> origin/gh/xmfan/229/orig 2025-09-07T09:36:19.9411287Z * [new branch] gh/xmfan/237/base -> origin/gh/xmfan/237/base 2025-09-07T09:36:19.9412749Z * [new branch] gh/xmfan/237/head -> origin/gh/xmfan/237/head 2025-09-07T09:36:19.9414170Z * [new branch] gh/xmfan/237/orig -> origin/gh/xmfan/237/orig 2025-09-07T09:36:19.9416851Z * [new branch] gh/xmfan/244/base -> origin/gh/xmfan/244/base 2025-09-07T09:36:19.9418236Z * [new branch] gh/xmfan/244/head -> origin/gh/xmfan/244/head 2025-09-07T09:36:19.9419816Z * [new branch] gh/xmfan/244/orig -> origin/gh/xmfan/244/orig 2025-09-07T09:36:19.9422188Z * [new branch] gh/xmfan/246/base -> origin/gh/xmfan/246/base 2025-09-07T09:36:19.9423839Z * [new branch] gh/xmfan/246/head -> origin/gh/xmfan/246/head 2025-09-07T09:36:19.9425379Z * [new branch] gh/xmfan/246/orig -> origin/gh/xmfan/246/orig 2025-09-07T09:36:19.9427651Z * [new branch] gh/xmfan/253/base -> origin/gh/xmfan/253/base 2025-09-07T09:36:19.9429195Z * [new branch] gh/xmfan/253/head -> origin/gh/xmfan/253/head 2025-09-07T09:36:19.9430820Z * [new branch] gh/xmfan/253/orig -> origin/gh/xmfan/253/orig 
2025-09-07T09:36:19.9432913Z * [new branch] gh/xmfan/254/base -> origin/gh/xmfan/254/base 2025-09-07T09:36:19.9434439Z * [new branch] gh/xmfan/254/head -> origin/gh/xmfan/254/head 2025-09-07T09:36:19.9436294Z * [new branch] gh/xmfan/254/orig -> origin/gh/xmfan/254/orig 2025-09-07T09:36:19.9438666Z * [new branch] gh/xmfan/260/base -> origin/gh/xmfan/260/base 2025-09-07T09:36:19.9440078Z * [new branch] gh/xmfan/260/head -> origin/gh/xmfan/260/head 2025-09-07T09:36:19.9441586Z * [new branch] gh/xmfan/260/orig -> origin/gh/xmfan/260/orig 2025-09-07T09:36:19.9443846Z * [new branch] gh/xmfan/262/base -> origin/gh/xmfan/262/base 2025-09-07T09:36:19.9445415Z * [new branch] gh/xmfan/262/head -> origin/gh/xmfan/262/head 2025-09-07T09:36:19.9447231Z * [new branch] gh/xmfan/262/orig -> origin/gh/xmfan/262/orig 2025-09-07T09:36:19.9449439Z * [new branch] gh/xmfan/263/base -> origin/gh/xmfan/263/base 2025-09-07T09:36:19.9451084Z * [new branch] gh/xmfan/263/head -> origin/gh/xmfan/263/head 2025-09-07T09:36:19.9452514Z * [new branch] gh/xmfan/263/orig -> origin/gh/xmfan/263/orig 2025-09-07T09:36:19.9454671Z * [new branch] gh/xmfan/264/base -> origin/gh/xmfan/264/base 2025-09-07T09:36:19.9456405Z * [new branch] gh/xmfan/264/head -> origin/gh/xmfan/264/head 2025-09-07T09:36:19.9458026Z * [new branch] gh/xmfan/264/orig -> origin/gh/xmfan/264/orig 2025-09-07T09:36:19.9460289Z * [new branch] gh/xmfan/274/base -> origin/gh/xmfan/274/base 2025-09-07T09:36:19.9461944Z * [new branch] gh/xmfan/274/head -> origin/gh/xmfan/274/head 2025-09-07T09:36:19.9463505Z * [new branch] gh/xmfan/274/orig -> origin/gh/xmfan/274/orig 2025-09-07T09:36:19.9466034Z * [new branch] gh/xmfan/276/base -> origin/gh/xmfan/276/base 2025-09-07T09:36:19.9467626Z * [new branch] gh/xmfan/276/head -> origin/gh/xmfan/276/head 2025-09-07T09:36:19.9469199Z * [new branch] gh/xmfan/276/orig -> origin/gh/xmfan/276/orig 2025-09-07T09:36:19.9471378Z * [new branch] gh/xmfan/277/base -> origin/gh/xmfan/277/base 
2025-09-07T09:36:19.9472922Z * [new branch] gh/xmfan/277/head -> origin/gh/xmfan/277/head 2025-09-07T09:36:19.9474330Z * [new branch] gh/xmfan/277/orig -> origin/gh/xmfan/277/orig 2025-09-07T09:36:19.9476936Z * [new branch] gh/xmfan/278/base -> origin/gh/xmfan/278/base 2025-09-07T09:36:19.9478400Z * [new branch] gh/xmfan/278/head -> origin/gh/xmfan/278/head 2025-09-07T09:36:19.9479948Z * [new branch] gh/xmfan/278/orig -> origin/gh/xmfan/278/orig 2025-09-07T09:36:19.9482131Z * [new branch] gh/xmfan/279/base -> origin/gh/xmfan/279/base 2025-09-07T09:36:19.9483654Z * [new branch] gh/xmfan/279/head -> origin/gh/xmfan/279/head 2025-09-07T09:36:19.9485314Z * [new branch] gh/xmfan/279/orig -> origin/gh/xmfan/279/orig 2025-09-07T09:36:19.9487575Z * [new branch] gh/xmfan/280/base -> origin/gh/xmfan/280/base 2025-09-07T09:36:19.9489189Z * [new branch] gh/xmfan/280/head -> origin/gh/xmfan/280/head 2025-09-07T09:36:19.9490699Z * [new branch] gh/xmfan/280/orig -> origin/gh/xmfan/280/orig 2025-09-07T09:36:19.9492859Z * [new branch] gh/xmfan/281/base -> origin/gh/xmfan/281/base 2025-09-07T09:36:19.9494552Z * [new branch] gh/xmfan/281/head -> origin/gh/xmfan/281/head 2025-09-07T09:36:19.9496415Z * [new branch] gh/xmfan/281/orig -> origin/gh/xmfan/281/orig 2025-09-07T09:36:19.9498644Z * [new branch] gh/xmfan/282/base -> origin/gh/xmfan/282/base 2025-09-07T09:36:19.9500212Z * [new branch] gh/xmfan/282/head -> origin/gh/xmfan/282/head 2025-09-07T09:36:19.9502560Z * [new branch] gh/xmfan/283/base -> origin/gh/xmfan/283/base 2025-09-07T09:36:19.9504336Z * [new branch] gh/xmfan/283/head -> origin/gh/xmfan/283/head 2025-09-07T09:36:19.9505980Z * [new branch] gh/xmfan/283/orig -> origin/gh/xmfan/283/orig 2025-09-07T09:36:19.9508714Z * [new branch] gh/xuanzhang816/14/base -> origin/gh/xuanzhang816/14/base 2025-09-07T09:36:19.9513635Z * [new branch] gh/xuanzhang816/14/head -> origin/gh/xuanzhang816/14/head 2025-09-07T09:36:19.9515302Z * [new branch] gh/xuanzhang816/14/orig -> 
origin/gh/xuanzhang816/14/orig 2025-09-07T09:36:19.9517748Z * [new branch] gh/xuanzhang816/19/base -> origin/gh/xuanzhang816/19/base 2025-09-07T09:36:19.9519362Z * [new branch] gh/xuanzhang816/19/head -> origin/gh/xuanzhang816/19/head 2025-09-07T09:36:19.9520845Z * [new branch] gh/xuanzhang816/19/orig -> origin/gh/xuanzhang816/19/orig 2025-09-07T09:36:19.9523039Z * [new branch] gh/xuanzhang816/22/base -> origin/gh/xuanzhang816/22/base 2025-09-07T09:36:19.9524553Z * [new branch] gh/xuanzhang816/22/head -> origin/gh/xuanzhang816/22/head 2025-09-07T09:36:19.9526350Z * [new branch] gh/xuanzhang816/22/orig -> origin/gh/xuanzhang816/22/orig 2025-09-07T09:36:19.9528680Z * [new branch] gh/xuanzhang816/23/base -> origin/gh/xuanzhang816/23/base 2025-09-07T09:36:19.9530099Z * [new branch] gh/xuanzhang816/23/head -> origin/gh/xuanzhang816/23/head 2025-09-07T09:36:19.9531626Z * [new branch] gh/xuanzhang816/23/orig -> origin/gh/xuanzhang816/23/orig 2025-09-07T09:36:19.9533708Z * [new branch] gh/xuanzhang816/24/base -> origin/gh/xuanzhang816/24/base 2025-09-07T09:36:19.9535545Z * [new branch] gh/xuanzhang816/24/head -> origin/gh/xuanzhang816/24/head 2025-09-07T09:36:19.9537218Z * [new branch] gh/xuanzhang816/24/orig -> origin/gh/xuanzhang816/24/orig 2025-09-07T09:36:19.9539342Z * [new branch] gh/xuanzhang816/25/base -> origin/gh/xuanzhang816/25/base 2025-09-07T09:36:19.9540869Z * [new branch] gh/xuanzhang816/25/head -> origin/gh/xuanzhang816/25/head 2025-09-07T09:36:19.9542495Z * [new branch] gh/xuanzhang816/25/orig -> origin/gh/xuanzhang816/25/orig 2025-09-07T09:36:19.9544659Z * [new branch] gh/xuanzhang816/26/base -> origin/gh/xuanzhang816/26/base 2025-09-07T09:36:19.9546590Z * [new branch] gh/xuanzhang816/26/head -> origin/gh/xuanzhang816/26/head 2025-09-07T09:36:19.9548170Z * [new branch] gh/xuanzhang816/26/orig -> origin/gh/xuanzhang816/26/orig 2025-09-07T09:36:19.9551084Z * [new branch] gh/yanbing-j/11/base -> origin/gh/yanbing-j/11/base 2025-09-07T09:36:19.9552544Z * [new 
branch] gh/yanbing-j/11/head -> origin/gh/yanbing-j/11/head 2025-09-07T09:36:19.9554109Z * [new branch] gh/yanbing-j/11/orig -> origin/gh/yanbing-j/11/orig 2025-09-07T09:36:19.9556697Z * [new branch] gh/yanbing-j/12/base -> origin/gh/yanbing-j/12/base 2025-09-07T09:36:19.9558105Z * [new branch] gh/yanbing-j/12/head -> origin/gh/yanbing-j/12/head 2025-09-07T09:36:19.9559634Z * [new branch] gh/yanbing-j/12/orig -> origin/gh/yanbing-j/12/orig 2025-09-07T09:36:19.9561978Z * [new branch] gh/yanbing-j/13/base -> origin/gh/yanbing-j/13/base 2025-09-07T09:36:19.9563527Z * [new branch] gh/yanbing-j/13/head -> origin/gh/yanbing-j/13/head 2025-09-07T09:36:19.9565268Z * [new branch] gh/yanbing-j/13/orig -> origin/gh/yanbing-j/13/orig 2025-09-07T09:36:19.9567573Z * [new branch] gh/yanbing-j/14/base -> origin/gh/yanbing-j/14/base 2025-09-07T09:36:19.9569092Z * [new branch] gh/yanbing-j/14/head -> origin/gh/yanbing-j/14/head 2025-09-07T09:36:19.9570842Z * [new branch] gh/yanbing-j/14/orig -> origin/gh/yanbing-j/14/orig 2025-09-07T09:36:19.9572959Z * [new branch] gh/yanbing-j/15/base -> origin/gh/yanbing-j/15/base 2025-09-07T09:36:19.9574454Z * [new branch] gh/yanbing-j/15/head -> origin/gh/yanbing-j/15/head 2025-09-07T09:36:19.9576185Z * [new branch] gh/yanbing-j/15/orig -> origin/gh/yanbing-j/15/orig 2025-09-07T09:36:19.9578339Z * [new branch] gh/yanbing-j/18/base -> origin/gh/yanbing-j/18/base 2025-09-07T09:36:19.9579835Z * [new branch] gh/yanbing-j/18/head -> origin/gh/yanbing-j/18/head 2025-09-07T09:36:19.9581397Z * [new branch] gh/yanbing-j/18/orig -> origin/gh/yanbing-j/18/orig 2025-09-07T09:36:19.9583722Z * [new branch] gh/yanbing-j/19/base -> origin/gh/yanbing-j/19/base 2025-09-07T09:36:19.9585450Z * [new branch] gh/yanbing-j/19/head -> origin/gh/yanbing-j/19/head 2025-09-07T09:36:19.9587130Z * [new branch] gh/yanbing-j/19/orig -> origin/gh/yanbing-j/19/orig 2025-09-07T09:36:19.9589463Z * [new branch] gh/yanbing-j/20/base -> origin/gh/yanbing-j/20/base 
2025-09-07T09:36:19.9590925Z * [new branch] gh/yanbing-j/20/head -> origin/gh/yanbing-j/20/head 2025-09-07T09:36:19.9592437Z * [new branch] gh/yanbing-j/20/orig -> origin/gh/yanbing-j/20/orig 2025-09-07T09:36:19.9594661Z * [new branch] gh/yanbing-j/21/base -> origin/gh/yanbing-j/21/base 2025-09-07T09:36:19.9596583Z * [new branch] gh/yanbing-j/21/head -> origin/gh/yanbing-j/21/head 2025-09-07T09:36:19.9598710Z * [new branch] gh/yanbing-j/22/base -> origin/gh/yanbing-j/22/base 2025-09-07T09:36:19.9600238Z * [new branch] gh/yanbing-j/22/head -> origin/gh/yanbing-j/22/head 2025-09-07T09:36:19.9601866Z * [new branch] gh/yanbing-j/22/orig -> origin/gh/yanbing-j/22/orig 2025-09-07T09:36:19.9603956Z * [new branch] gh/yanbing-j/23/base -> origin/gh/yanbing-j/23/base 2025-09-07T09:36:19.9605750Z * [new branch] gh/yanbing-j/23/head -> origin/gh/yanbing-j/23/head 2025-09-07T09:36:19.9607282Z * [new branch] gh/yanbing-j/23/orig -> origin/gh/yanbing-j/23/orig 2025-09-07T09:36:19.9609602Z * [new branch] gh/yanbing-j/24/base -> origin/gh/yanbing-j/24/base 2025-09-07T09:36:19.9611157Z * [new branch] gh/yanbing-j/24/head -> origin/gh/yanbing-j/24/head 2025-09-07T09:36:19.9612607Z * [new branch] gh/yanbing-j/24/orig -> origin/gh/yanbing-j/24/orig 2025-09-07T09:36:19.9614794Z * [new branch] gh/yanbing-j/25/base -> origin/gh/yanbing-j/25/base 2025-09-07T09:36:19.9616694Z * [new branch] gh/yanbing-j/25/head -> origin/gh/yanbing-j/25/head 2025-09-07T09:36:19.9618043Z * [new branch] gh/yanbing-j/25/orig -> origin/gh/yanbing-j/25/orig 2025-09-07T09:36:19.9620230Z * [new branch] gh/yanbing-j/26/base -> origin/gh/yanbing-j/26/base 2025-09-07T09:36:19.9621908Z * [new branch] gh/yanbing-j/26/head -> origin/gh/yanbing-j/26/head 2025-09-07T09:36:19.9623611Z * [new branch] gh/yanbing-j/26/orig -> origin/gh/yanbing-j/26/orig 2025-09-07T09:36:19.9626138Z * [new branch] gh/yanbing-j/36/base -> origin/gh/yanbing-j/36/base 2025-09-07T09:36:19.9627678Z * [new branch] gh/yanbing-j/36/head -> 
origin/gh/yanbing-j/36/head 2025-09-07T09:36:19.9629193Z * [new branch] gh/yanbing-j/36/orig -> origin/gh/yanbing-j/36/orig 2025-09-07T09:36:19.9631412Z * [new branch] gh/yanbing-j/37/base -> origin/gh/yanbing-j/37/base 2025-09-07T09:36:19.9632956Z * [new branch] gh/yanbing-j/37/head -> origin/gh/yanbing-j/37/head 2025-09-07T09:36:19.9634714Z * [new branch] gh/yanbing-j/37/orig -> origin/gh/yanbing-j/37/orig 2025-09-07T09:36:19.9637705Z * [new branch] gh/yangw-dev/12/base -> origin/gh/yangw-dev/12/base 2025-09-07T09:36:19.9639003Z * [new branch] gh/yangw-dev/12/head -> origin/gh/yangw-dev/12/head 2025-09-07T09:36:19.9640480Z * [new branch] gh/yangw-dev/12/orig -> origin/gh/yangw-dev/12/orig 2025-09-07T09:36:19.9652718Z * [new branch] gh/yangw-dev/13/base -> origin/gh/yangw-dev/13/base 2025-09-07T09:36:19.9653029Z * [new branch] gh/yangw-dev/13/head -> origin/gh/yangw-dev/13/head 2025-09-07T09:36:19.9653217Z * [new branch] gh/yangw-dev/13/orig -> origin/gh/yangw-dev/13/orig 2025-09-07T09:36:19.9653403Z * [new branch] gh/yangw-dev/14/base -> origin/gh/yangw-dev/14/base 2025-09-07T09:36:19.9653569Z * [new branch] gh/yangw-dev/14/head -> origin/gh/yangw-dev/14/head 2025-09-07T09:36:19.9653745Z * [new branch] gh/yangw-dev/14/orig -> origin/gh/yangw-dev/14/orig 2025-09-07T09:36:19.9653912Z * [new branch] gh/yangw-dev/15/base -> origin/gh/yangw-dev/15/base 2025-09-07T09:36:19.9655520Z * [new branch] gh/yangw-dev/15/head -> origin/gh/yangw-dev/15/head 2025-09-07T09:36:19.9657578Z * [new branch] gh/yangw-dev/15/orig -> origin/gh/yangw-dev/15/orig 2025-09-07T09:36:19.9659653Z * [new branch] gh/yangw-dev/16/base -> origin/gh/yangw-dev/16/base 2025-09-07T09:36:19.9661151Z * [new branch] gh/yangw-dev/16/head -> origin/gh/yangw-dev/16/head 2025-09-07T09:36:19.9662917Z * [new branch] gh/yangw-dev/16/orig -> origin/gh/yangw-dev/16/orig 2025-09-07T09:36:19.9665551Z * [new branch] gh/yangw-dev/17/base -> origin/gh/yangw-dev/17/base 2025-09-07T09:36:19.9667134Z * [new branch] 
gh/yangw-dev/17/head -> origin/gh/yangw-dev/17/head 2025-09-07T09:36:19.9668634Z * [new branch] gh/yangw-dev/17/orig -> origin/gh/yangw-dev/17/orig 2025-09-07T09:36:19.9670614Z * [new branch] gh/yangw-dev/18/base -> origin/gh/yangw-dev/18/base 2025-09-07T09:36:19.9672286Z * [new branch] gh/yangw-dev/18/head -> origin/gh/yangw-dev/18/head 2025-09-07T09:36:19.9673646Z * [new branch] gh/yangw-dev/18/orig -> origin/gh/yangw-dev/18/orig 2025-09-07T09:36:19.9676123Z * [new branch] gh/yangw-dev/19/base -> origin/gh/yangw-dev/19/base 2025-09-07T09:36:19.9677828Z * [new branch] gh/yangw-dev/19/head -> origin/gh/yangw-dev/19/head 2025-09-07T09:36:19.9679426Z * [new branch] gh/yangw-dev/19/orig -> origin/gh/yangw-dev/19/orig 2025-09-07T09:36:19.9681627Z * [new branch] gh/yangw-dev/20/base -> origin/gh/yangw-dev/20/base 2025-09-07T09:36:19.9683092Z * [new branch] gh/yangw-dev/20/head -> origin/gh/yangw-dev/20/head 2025-09-07T09:36:19.9684652Z * [new branch] gh/yangw-dev/20/orig -> origin/gh/yangw-dev/20/orig 2025-09-07T09:36:19.9687273Z * [new branch] gh/yangw-dev/21/base -> origin/gh/yangw-dev/21/base 2025-09-07T09:36:19.9688594Z * [new branch] gh/yangw-dev/21/head -> origin/gh/yangw-dev/21/head 2025-09-07T09:36:19.9690188Z * [new branch] gh/yangw-dev/21/orig -> origin/gh/yangw-dev/21/orig 2025-09-07T09:36:19.9692340Z * [new branch] gh/yangw-dev/22/base -> origin/gh/yangw-dev/22/base 2025-09-07T09:36:19.9693840Z * [new branch] gh/yangw-dev/22/head -> origin/gh/yangw-dev/22/head 2025-09-07T09:36:19.9695531Z * [new branch] gh/yangw-dev/22/orig -> origin/gh/yangw-dev/22/orig 2025-09-07T09:36:19.9697874Z * [new branch] gh/yangw-dev/23/base -> origin/gh/yangw-dev/23/base 2025-09-07T09:36:19.9699580Z * [new branch] gh/yangw-dev/23/head -> origin/gh/yangw-dev/23/head 2025-09-07T09:36:19.9700990Z * [new branch] gh/yangw-dev/23/orig -> origin/gh/yangw-dev/23/orig 2025-09-07T09:36:19.9703147Z * [new branch] gh/yangw-dev/24/base -> origin/gh/yangw-dev/24/base 
2025-09-07T09:36:19.9704801Z * [new branch] gh/yangw-dev/24/head -> origin/gh/yangw-dev/24/head 2025-09-07T09:36:19.9706504Z * [new branch] gh/yangw-dev/24/orig -> origin/gh/yangw-dev/24/orig 2025-09-07T09:36:19.9708756Z * [new branch] gh/yangw-dev/25/base -> origin/gh/yangw-dev/25/base 2025-09-07T09:36:19.9710284Z * [new branch] gh/yangw-dev/25/head -> origin/gh/yangw-dev/25/head 2025-09-07T09:36:19.9711829Z * [new branch] gh/yangw-dev/25/orig -> origin/gh/yangw-dev/25/orig 2025-09-07T09:36:19.9714136Z * [new branch] gh/yangw-dev/26/base -> origin/gh/yangw-dev/26/base 2025-09-07T09:36:19.9715988Z * [new branch] gh/yangw-dev/26/head -> origin/gh/yangw-dev/26/head 2025-09-07T09:36:19.9717549Z * [new branch] gh/yangw-dev/26/orig -> origin/gh/yangw-dev/26/orig 2025-09-07T09:36:19.9719847Z * [new branch] gh/yangw-dev/27/base -> origin/gh/yangw-dev/27/base 2025-09-07T09:36:19.9721381Z * [new branch] gh/yangw-dev/27/head -> origin/gh/yangw-dev/27/head 2025-09-07T09:36:19.9722955Z * [new branch] gh/yangw-dev/27/orig -> origin/gh/yangw-dev/27/orig 2025-09-07T09:36:19.9726079Z * [new branch] gh/ydwu4/233/base -> origin/gh/ydwu4/233/base 2025-09-07T09:36:19.9727615Z * [new branch] gh/ydwu4/233/head -> origin/gh/ydwu4/233/head 2025-09-07T09:36:19.9729121Z * [new branch] gh/ydwu4/233/orig -> origin/gh/ydwu4/233/orig 2025-09-07T09:36:19.9731533Z * [new branch] gh/ydwu4/246/base -> origin/gh/ydwu4/246/base 2025-09-07T09:36:19.9733026Z * [new branch] gh/ydwu4/246/head -> origin/gh/ydwu4/246/head 2025-09-07T09:36:19.9734599Z * [new branch] gh/ydwu4/246/orig -> origin/gh/ydwu4/246/orig 2025-09-07T09:36:19.9737231Z * [new branch] gh/ydwu4/253/base -> origin/gh/ydwu4/253/base 2025-09-07T09:36:19.9738846Z * [new branch] gh/ydwu4/253/head -> origin/gh/ydwu4/253/head 2025-09-07T09:36:19.9740379Z * [new branch] gh/ydwu4/253/orig -> origin/gh/ydwu4/253/orig 2025-09-07T09:36:19.9742749Z * [new branch] gh/ydwu4/255/base -> origin/gh/ydwu4/255/base 2025-09-07T09:36:19.9744335Z * [new branch] 
gh/ydwu4/255/head -> origin/gh/ydwu4/255/head 2025-09-07T09:36:19.9746095Z * [new branch] gh/ydwu4/255/orig -> origin/gh/ydwu4/255/orig 2025-09-07T09:36:19.9748377Z * [new branch] gh/ydwu4/259/base -> origin/gh/ydwu4/259/base 2025-09-07T09:36:19.9750147Z * [new branch] gh/ydwu4/259/head -> origin/gh/ydwu4/259/head 2025-09-07T09:36:19.9751692Z * [new branch] gh/ydwu4/259/orig -> origin/gh/ydwu4/259/orig 2025-09-07T09:36:19.9753892Z * [new branch] gh/ydwu4/262/base -> origin/gh/ydwu4/262/base 2025-09-07T09:36:19.9755792Z * [new branch] gh/ydwu4/262/head -> origin/gh/ydwu4/262/head 2025-09-07T09:36:19.9757351Z * [new branch] gh/ydwu4/262/orig -> origin/gh/ydwu4/262/orig 2025-09-07T09:36:19.9759573Z * [new branch] gh/ydwu4/263/base -> origin/gh/ydwu4/263/base 2025-09-07T09:36:19.9761058Z * [new branch] gh/ydwu4/263/head -> origin/gh/ydwu4/263/head 2025-09-07T09:36:19.9762566Z * [new branch] gh/ydwu4/263/orig -> origin/gh/ydwu4/263/orig 2025-09-07T09:36:19.9764911Z * [new branch] gh/ydwu4/269/base -> origin/gh/ydwu4/269/base 2025-09-07T09:36:19.9766901Z * [new branch] gh/ydwu4/269/head -> origin/gh/ydwu4/269/head 2025-09-07T09:36:19.9768244Z * [new branch] gh/ydwu4/269/orig -> origin/gh/ydwu4/269/orig 2025-09-07T09:36:19.9770412Z * [new branch] gh/ydwu4/270/base -> origin/gh/ydwu4/270/base 2025-09-07T09:36:19.9772039Z * [new branch] gh/ydwu4/270/head -> origin/gh/ydwu4/270/head 2025-09-07T09:36:19.9773692Z * [new branch] gh/ydwu4/270/orig -> origin/gh/ydwu4/270/orig 2025-09-07T09:36:19.9777567Z * [new branch] gh/ydwu4/272/base -> origin/gh/ydwu4/272/base 2025-09-07T09:36:19.9779167Z * [new branch] gh/ydwu4/272/head -> origin/gh/ydwu4/272/head 2025-09-07T09:36:19.9780688Z * [new branch] gh/ydwu4/272/orig -> origin/gh/ydwu4/272/orig 2025-09-07T09:36:19.9782960Z * [new branch] gh/ydwu4/275/base -> origin/gh/ydwu4/275/base 2025-09-07T09:36:19.9784405Z * [new branch] gh/ydwu4/275/head -> origin/gh/ydwu4/275/head 2025-09-07T09:36:19.9786271Z * [new branch] gh/ydwu4/275/orig 
-> origin/gh/ydwu4/275/orig 2025-09-07T09:36:19.9788330Z * [new branch] gh/ydwu4/276/base -> origin/gh/ydwu4/276/base 2025-09-07T09:36:19.9789858Z * [new branch] gh/ydwu4/276/head -> origin/gh/ydwu4/276/head 2025-09-07T09:36:19.9791339Z * [new branch] gh/ydwu4/276/orig -> origin/gh/ydwu4/276/orig 2025-09-07T09:36:19.9793752Z * [new branch] gh/ydwu4/279/base -> origin/gh/ydwu4/279/base 2025-09-07T09:36:19.9795653Z * [new branch] gh/ydwu4/279/head -> origin/gh/ydwu4/279/head 2025-09-07T09:36:19.9797273Z * [new branch] gh/ydwu4/279/orig -> origin/gh/ydwu4/279/orig 2025-09-07T09:36:19.9799726Z * [new branch] gh/ydwu4/283/base -> origin/gh/ydwu4/283/base 2025-09-07T09:36:19.9801308Z * [new branch] gh/ydwu4/283/head -> origin/gh/ydwu4/283/head 2025-09-07T09:36:19.9802782Z * [new branch] gh/ydwu4/283/orig -> origin/gh/ydwu4/283/orig 2025-09-07T09:36:19.9805134Z * [new branch] gh/ydwu4/289/base -> origin/gh/ydwu4/289/base 2025-09-07T09:36:19.9806883Z * [new branch] gh/ydwu4/289/head -> origin/gh/ydwu4/289/head 2025-09-07T09:36:19.9808333Z * [new branch] gh/ydwu4/289/orig -> origin/gh/ydwu4/289/orig 2025-09-07T09:36:19.9810576Z * [new branch] gh/ydwu4/290/base -> origin/gh/ydwu4/290/base 2025-09-07T09:36:19.9812230Z * [new branch] gh/ydwu4/290/head -> origin/gh/ydwu4/290/head 2025-09-07T09:36:19.9813923Z * [new branch] gh/ydwu4/290/orig -> origin/gh/ydwu4/290/orig 2025-09-07T09:36:19.9816439Z * [new branch] gh/ydwu4/291/base -> origin/gh/ydwu4/291/base 2025-09-07T09:36:19.9817989Z * [new branch] gh/ydwu4/291/head -> origin/gh/ydwu4/291/head 2025-09-07T09:36:19.9819578Z * [new branch] gh/ydwu4/291/orig -> origin/gh/ydwu4/291/orig 2025-09-07T09:36:19.9822033Z * [new branch] gh/ydwu4/292/base -> origin/gh/ydwu4/292/base 2025-09-07T09:36:19.9823489Z * [new branch] gh/ydwu4/292/head -> origin/gh/ydwu4/292/head 2025-09-07T09:36:19.9825169Z * [new branch] gh/ydwu4/292/orig -> origin/gh/ydwu4/292/orig 2025-09-07T09:36:19.9827422Z * [new branch] gh/ydwu4/293/base -> 
origin/gh/ydwu4/293/base 2025-09-07T09:36:19.9828890Z * [new branch] gh/ydwu4/293/head -> origin/gh/ydwu4/293/head 2025-09-07T09:36:19.9830480Z * [new branch] gh/ydwu4/293/orig -> origin/gh/ydwu4/293/orig 2025-09-07T09:36:19.9832785Z * [new branch] gh/ydwu4/294/base -> origin/gh/ydwu4/294/base 2025-09-07T09:36:19.9834511Z * [new branch] gh/ydwu4/294/head -> origin/gh/ydwu4/294/head 2025-09-07T09:36:19.9836190Z * [new branch] gh/ydwu4/294/orig -> origin/gh/ydwu4/294/orig 2025-09-07T09:36:19.9838381Z * [new branch] gh/ydwu4/295/base -> origin/gh/ydwu4/295/base 2025-09-07T09:36:19.9839979Z * [new branch] gh/ydwu4/295/head -> origin/gh/ydwu4/295/head 2025-09-07T09:36:19.9841481Z * [new branch] gh/ydwu4/295/orig -> origin/gh/ydwu4/295/orig 2025-09-07T09:36:19.9843841Z * [new branch] gh/ydwu4/296/base -> origin/gh/ydwu4/296/base 2025-09-07T09:36:19.9845652Z * [new branch] gh/ydwu4/296/head -> origin/gh/ydwu4/296/head 2025-09-07T09:36:19.9847306Z * [new branch] gh/ydwu4/296/orig -> origin/gh/ydwu4/296/orig 2025-09-07T09:36:19.9850291Z * [new branch] gh/ydwu4/300/base -> origin/gh/ydwu4/300/base 2025-09-07T09:36:19.9852293Z * [new branch] gh/ydwu4/300/head -> origin/gh/ydwu4/300/head 2025-09-07T09:36:19.9853965Z * [new branch] gh/ydwu4/300/orig -> origin/gh/ydwu4/300/orig 2025-09-07T09:36:19.9857075Z * [new branch] gh/ydwu4/301/base -> origin/gh/ydwu4/301/base 2025-09-07T09:36:19.9858543Z * [new branch] gh/ydwu4/301/head -> origin/gh/ydwu4/301/head 2025-09-07T09:36:19.9860580Z * [new branch] gh/ydwu4/301/orig -> origin/gh/ydwu4/301/orig 2025-09-07T09:36:19.9863005Z * [new branch] gh/ydwu4/302/base -> origin/gh/ydwu4/302/base 2025-09-07T09:36:19.9864638Z * [new branch] gh/ydwu4/302/head -> origin/gh/ydwu4/302/head 2025-09-07T09:36:19.9866412Z * [new branch] gh/ydwu4/302/orig -> origin/gh/ydwu4/302/orig 2025-09-07T09:36:19.9868422Z * [new branch] gh/ydwu4/303/base -> origin/gh/ydwu4/303/base 2025-09-07T09:36:19.9869955Z * [new branch] gh/ydwu4/303/head -> 
origin/gh/ydwu4/303/head 2025-09-07T09:36:19.9871746Z * [new branch] gh/ydwu4/303/orig -> origin/gh/ydwu4/303/orig 2025-09-07T09:36:19.9873758Z * [new branch] gh/ydwu4/304/base -> origin/gh/ydwu4/304/base 2025-09-07T09:36:19.9875573Z * [new branch] gh/ydwu4/304/head -> origin/gh/ydwu4/304/head 2025-09-07T09:36:19.9877189Z * [new branch] gh/ydwu4/304/orig -> origin/gh/ydwu4/304/orig 2025-09-07T09:36:19.9879536Z * [new branch] gh/ydwu4/305/base -> origin/gh/ydwu4/305/base 2025-09-07T09:36:19.9881112Z * [new branch] gh/ydwu4/305/head -> origin/gh/ydwu4/305/head 2025-09-07T09:36:19.9882666Z * [new branch] gh/ydwu4/305/orig -> origin/gh/ydwu4/305/orig 2025-09-07T09:36:19.9884922Z * [new branch] gh/ydwu4/306/base -> origin/gh/ydwu4/306/base 2025-09-07T09:36:19.9886904Z * [new branch] gh/ydwu4/306/head -> origin/gh/ydwu4/306/head 2025-09-07T09:36:19.9888485Z * [new branch] gh/ydwu4/306/orig -> origin/gh/ydwu4/306/orig 2025-09-07T09:36:19.9890657Z * [new branch] gh/ydwu4/307/base -> origin/gh/ydwu4/307/base 2025-09-07T09:36:19.9892151Z * [new branch] gh/ydwu4/307/head -> origin/gh/ydwu4/307/head 2025-09-07T09:36:19.9893749Z * [new branch] gh/ydwu4/307/orig -> origin/gh/ydwu4/307/orig 2025-09-07T09:36:19.9896309Z * [new branch] gh/ydwu4/308/base -> origin/gh/ydwu4/308/base 2025-09-07T09:36:19.9897917Z * [new branch] gh/ydwu4/308/head -> origin/gh/ydwu4/308/head 2025-09-07T09:36:19.9899356Z * [new branch] gh/ydwu4/308/orig -> origin/gh/ydwu4/308/orig 2025-09-07T09:36:19.9901617Z * [new branch] gh/ydwu4/309/base -> origin/gh/ydwu4/309/base 2025-09-07T09:36:19.9903368Z * [new branch] gh/ydwu4/309/head -> origin/gh/ydwu4/309/head 2025-09-07T09:36:19.9904795Z * [new branch] gh/ydwu4/309/orig -> origin/gh/ydwu4/309/orig 2025-09-07T09:36:19.9907399Z * [new branch] gh/ydwu4/310/base -> origin/gh/ydwu4/310/base 2025-09-07T09:36:19.9909202Z * [new branch] gh/ydwu4/310/head -> origin/gh/ydwu4/310/head 2025-09-07T09:36:19.9910686Z * [new branch] gh/ydwu4/310/orig -> 
origin/gh/ydwu4/310/orig 2025-09-07T09:36:19.9912849Z * [new branch] gh/ydwu4/311/base -> origin/gh/ydwu4/311/base 2025-09-07T09:36:19.9914427Z * [new branch] gh/ydwu4/311/head -> origin/gh/ydwu4/311/head 2025-09-07T09:36:19.9916314Z * [new branch] gh/ydwu4/311/orig -> origin/gh/ydwu4/311/orig 2025-09-07T09:36:19.9918516Z * [new branch] gh/ydwu4/312/base -> origin/gh/ydwu4/312/base 2025-09-07T09:36:19.9920058Z * [new branch] gh/ydwu4/312/head -> origin/gh/ydwu4/312/head 2025-09-07T09:36:19.9921579Z * [new branch] gh/ydwu4/312/orig -> origin/gh/ydwu4/312/orig 2025-09-07T09:36:19.9923996Z * [new branch] gh/ydwu4/313/base -> origin/gh/ydwu4/313/base 2025-09-07T09:36:19.9925997Z * [new branch] gh/ydwu4/313/head -> origin/gh/ydwu4/313/head 2025-09-07T09:36:19.9927745Z * [new branch] gh/ydwu4/313/orig -> origin/gh/ydwu4/313/orig 2025-09-07T09:36:19.9930035Z * [new branch] gh/ydwu4/314/base -> origin/gh/ydwu4/314/base 2025-09-07T09:36:19.9931713Z * [new branch] gh/ydwu4/314/head -> origin/gh/ydwu4/314/head 2025-09-07T09:36:19.9933284Z * [new branch] gh/ydwu4/314/orig -> origin/gh/ydwu4/314/orig 2025-09-07T09:36:19.9938367Z * [new branch] gh/ydwu4/315/base -> origin/gh/ydwu4/315/base 2025-09-07T09:36:19.9939850Z * [new branch] gh/ydwu4/315/head -> origin/gh/ydwu4/315/head 2025-09-07T09:36:19.9941696Z * [new branch] gh/ydwu4/315/orig -> origin/gh/ydwu4/315/orig 2025-09-07T09:36:19.9944128Z * [new branch] gh/ydwu4/316/base -> origin/gh/ydwu4/316/base 2025-09-07T09:36:19.9946070Z * [new branch] gh/ydwu4/316/head -> origin/gh/ydwu4/316/head 2025-09-07T09:36:19.9947674Z * [new branch] gh/ydwu4/316/orig -> origin/gh/ydwu4/316/orig 2025-09-07T09:36:19.9950103Z * [new branch] gh/ydwu4/317/base -> origin/gh/ydwu4/317/base 2025-09-07T09:36:19.9951563Z * [new branch] gh/ydwu4/317/head -> origin/gh/ydwu4/317/head 2025-09-07T09:36:19.9953123Z * [new branch] gh/ydwu4/317/orig -> origin/gh/ydwu4/317/orig 2025-09-07T09:36:19.9955580Z * [new branch] gh/ydwu4/318/base -> 
origin/gh/ydwu4/318/base 2025-09-07T09:36:19.9957194Z * [new branch] gh/ydwu4/318/head -> origin/gh/ydwu4/318/head 2025-09-07T09:36:19.9958743Z * [new branch] gh/ydwu4/318/orig -> origin/gh/ydwu4/318/orig 2025-09-07T09:36:19.9960861Z * [new branch] gh/ydwu4/319/base -> origin/gh/ydwu4/319/base 2025-09-07T09:36:19.9962396Z * [new branch] gh/ydwu4/319/head -> origin/gh/ydwu4/319/head 2025-09-07T09:36:19.9964000Z * [new branch] gh/ydwu4/319/orig -> origin/gh/ydwu4/319/orig 2025-09-07T09:36:19.9966626Z * [new branch] gh/ydwu4/320/base -> origin/gh/ydwu4/320/base 2025-09-07T09:36:19.9968147Z * [new branch] gh/ydwu4/320/head -> origin/gh/ydwu4/320/head 2025-09-07T09:36:19.9969624Z * [new branch] gh/ydwu4/320/orig -> origin/gh/ydwu4/320/orig 2025-09-07T09:36:19.9971682Z * [new branch] gh/ydwu4/321/base -> origin/gh/ydwu4/321/base 2025-09-07T09:36:19.9973435Z * [new branch] gh/ydwu4/321/head -> origin/gh/ydwu4/321/head 2025-09-07T09:36:19.9974753Z * [new branch] gh/ydwu4/321/orig -> origin/gh/ydwu4/321/orig 2025-09-07T09:36:19.9977324Z * [new branch] gh/ydwu4/322/base -> origin/gh/ydwu4/322/base 2025-09-07T09:36:19.9978847Z * [new branch] gh/ydwu4/322/head -> origin/gh/ydwu4/322/head 2025-09-07T09:36:19.9980378Z * [new branch] gh/ydwu4/322/orig -> origin/gh/ydwu4/322/orig 2025-09-07T09:36:19.9982734Z * [new branch] gh/ydwu4/323/base -> origin/gh/ydwu4/323/base 2025-09-07T09:36:19.9984352Z * [new branch] gh/ydwu4/323/head -> origin/gh/ydwu4/323/head 2025-09-07T09:36:19.9986306Z * [new branch] gh/ydwu4/323/orig -> origin/gh/ydwu4/323/orig 2025-09-07T09:36:19.9988534Z * [new branch] gh/ydwu4/324/base -> origin/gh/ydwu4/324/base 2025-09-07T09:36:19.9990155Z * [new branch] gh/ydwu4/324/head -> origin/gh/ydwu4/324/head 2025-09-07T09:36:19.9991721Z * [new branch] gh/ydwu4/324/orig -> origin/gh/ydwu4/324/orig 2025-09-07T09:36:19.9994481Z * [new branch] gh/yf225/133/base -> origin/gh/yf225/133/base 2025-09-07T09:36:19.9996341Z * [new branch] gh/yf225/133/head -> 
origin/gh/yf225/133/head 2025-09-07T09:36:19.9998732Z * [new branch] gh/yf225/171/base -> origin/gh/yf225/171/base 2025-09-07T09:36:20.0000351Z * [new branch] gh/yf225/171/head -> origin/gh/yf225/171/head 2025-09-07T09:36:20.0001875Z * [new branch] gh/yf225/171/orig -> origin/gh/yf225/171/orig 2025-09-07T09:36:20.0004248Z * [new branch] gh/yf225/172/base -> origin/gh/yf225/172/base 2025-09-07T09:36:20.0005999Z * [new branch] gh/yf225/172/head -> origin/gh/yf225/172/head 2025-09-07T09:36:20.0007555Z * [new branch] gh/yf225/172/orig -> origin/gh/yf225/172/orig 2025-09-07T09:36:20.0009646Z * [new branch] gh/yf225/93/base -> origin/gh/yf225/93/base 2025-09-07T09:36:20.0011097Z * [new branch] gh/yf225/93/head -> origin/gh/yf225/93/head 2025-09-07T09:36:20.0014322Z * [new branch] gh/yifuwang/152/base -> origin/gh/yifuwang/152/base 2025-09-07T09:36:20.0016456Z * [new branch] gh/yifuwang/152/head -> origin/gh/yifuwang/152/head 2025-09-07T09:36:20.0017993Z * [new branch] gh/yifuwang/152/orig -> origin/gh/yifuwang/152/orig 2025-09-07T09:36:20.0020260Z * [new branch] gh/yifuwang/195/base -> origin/gh/yifuwang/195/base 2025-09-07T09:36:20.0022020Z * [new branch] gh/yifuwang/195/head -> origin/gh/yifuwang/195/head 2025-09-07T09:36:20.0023660Z * [new branch] gh/yifuwang/195/orig -> origin/gh/yifuwang/195/orig 2025-09-07T09:36:20.0026827Z * [new branch] gh/yiming0416/1/base -> origin/gh/yiming0416/1/base 2025-09-07T09:36:20.0028366Z * [new branch] gh/yiming0416/1/head -> origin/gh/yiming0416/1/head 2025-09-07T09:36:20.0030442Z * [new branch] gh/yiming0416/2/base -> origin/gh/yiming0416/2/base 2025-09-07T09:36:20.0031960Z * [new branch] gh/yiming0416/2/head -> origin/gh/yiming0416/2/head 2025-09-07T09:36:20.0034847Z * [new branch] gh/ysiraichi/79/base -> origin/gh/ysiraichi/79/base 2025-09-07T09:36:20.0036745Z * [new branch] gh/ysiraichi/79/head -> origin/gh/ysiraichi/79/head 2025-09-07T09:36:20.0038542Z * [new branch] gh/ysiraichi/79/orig -> origin/gh/ysiraichi/79/orig 
2025-09-07T09:36:20.0040918Z * [new branch] gh/ysiraichi/88/base -> origin/gh/ysiraichi/88/base 2025-09-07T09:36:20.0042560Z * [new branch] gh/ysiraichi/88/head -> origin/gh/ysiraichi/88/head 2025-09-07T09:36:20.0043947Z * [new branch] gh/ysiraichi/88/orig -> origin/gh/ysiraichi/88/orig 2025-09-07T09:36:20.0047222Z * [new branch] gh/zhxchen17/25/base -> origin/gh/zhxchen17/25/base 2025-09-07T09:36:20.0048739Z * [new branch] gh/zhxchen17/25/head -> origin/gh/zhxchen17/25/head 2025-09-07T09:36:20.0050259Z * [new branch] gh/zhxchen17/25/orig -> origin/gh/zhxchen17/25/orig 2025-09-07T09:36:20.0052687Z * [new branch] gh/zhxchen17/31/base -> origin/gh/zhxchen17/31/base 2025-09-07T09:36:20.0054181Z * [new branch] gh/zhxchen17/31/head -> origin/gh/zhxchen17/31/head 2025-09-07T09:36:20.0056025Z * [new branch] gh/zhxchen17/31/orig -> origin/gh/zhxchen17/31/orig 2025-09-07T09:36:20.0058205Z * [new branch] gh/zhxchen17/34/base -> origin/gh/zhxchen17/34/base 2025-09-07T09:36:20.0059793Z * [new branch] gh/zhxchen17/34/head -> origin/gh/zhxchen17/34/head 2025-09-07T09:36:20.0061909Z * [new branch] gh/zhxchen17/35/base -> origin/gh/zhxchen17/35/base 2025-09-07T09:36:20.0063561Z * [new branch] gh/zhxchen17/35/head -> origin/gh/zhxchen17/35/head 2025-09-07T09:36:20.0066410Z * [new branch] gh/zhxchen17/37/base -> origin/gh/zhxchen17/37/base 2025-09-07T09:36:20.0067844Z * [new branch] gh/zhxchen17/37/head -> origin/gh/zhxchen17/37/head 2025-09-07T09:36:20.0069433Z * [new branch] gh/zhxchen17/37/orig -> origin/gh/zhxchen17/37/orig 2025-09-07T09:36:20.0071870Z * [new branch] gh/zhxchen17/38/base -> origin/gh/zhxchen17/38/base 2025-09-07T09:36:20.0073396Z * [new branch] gh/zhxchen17/38/head -> origin/gh/zhxchen17/38/head 2025-09-07T09:36:20.0075108Z * [new branch] gh/zhxchen17/38/orig -> origin/gh/zhxchen17/38/orig 2025-09-07T09:36:20.0077428Z * [new branch] gh/zhxchen17/39/base -> origin/gh/zhxchen17/39/base 2025-09-07T09:36:20.0079015Z * [new branch] gh/zhxchen17/39/head -> 
origin/gh/zhxchen17/39/head 2025-09-07T09:36:20.0080609Z * [new branch] gh/zhxchen17/39/orig -> origin/gh/zhxchen17/39/orig 2025-09-07T09:36:20.0082870Z * [new branch] gh/zhxchen17/40/base -> origin/gh/zhxchen17/40/base 2025-09-07T09:36:20.0084382Z * [new branch] gh/zhxchen17/40/head -> origin/gh/zhxchen17/40/head 2025-09-07T09:36:20.0086404Z * [new branch] gh/zhxchen17/40/orig -> origin/gh/zhxchen17/40/orig 2025-09-07T09:36:20.0088676Z * [new branch] gh/zhxchen17/41/base -> origin/gh/zhxchen17/41/base 2025-09-07T09:36:20.0090289Z * [new branch] gh/zhxchen17/41/head -> origin/gh/zhxchen17/41/head 2025-09-07T09:36:20.0092107Z * [new branch] gh/zhxchen17/41/orig -> origin/gh/zhxchen17/41/orig 2025-09-07T09:36:20.0094604Z * [new branch] gh/zhxchen17/42/base -> origin/gh/zhxchen17/42/base 2025-09-07T09:36:20.0096662Z * [new branch] gh/zhxchen17/42/head -> origin/gh/zhxchen17/42/head 2025-09-07T09:36:20.0098338Z * [new branch] gh/zhxchen17/42/orig -> origin/gh/zhxchen17/42/orig 2025-09-07T09:36:20.0100803Z * [new branch] gh/zhxchen17/43/base -> origin/gh/zhxchen17/43/base 2025-09-07T09:36:20.0102857Z * [new branch] gh/zhxchen17/43/head -> origin/gh/zhxchen17/43/head 2025-09-07T09:36:20.0104451Z * [new branch] gh/zhxchen17/43/orig -> origin/gh/zhxchen17/43/orig 2025-09-07T09:36:20.0107044Z * [new branch] gh/zhxchen17/44/base -> origin/gh/zhxchen17/44/base 2025-09-07T09:36:20.0108688Z * [new branch] gh/zhxchen17/44/head -> origin/gh/zhxchen17/44/head 2025-09-07T09:36:20.0110267Z * [new branch] gh/zhxchen17/44/orig -> origin/gh/zhxchen17/44/orig 2025-09-07T09:36:20.0112406Z * [new branch] gh/zhxchen17/45/base -> origin/gh/zhxchen17/45/base 2025-09-07T09:36:20.0113911Z * [new branch] gh/zhxchen17/45/head -> origin/gh/zhxchen17/45/head 2025-09-07T09:36:20.0115804Z * [new branch] gh/zhxchen17/45/orig -> origin/gh/zhxchen17/45/orig 2025-09-07T09:36:20.0118741Z * [new branch] gh/zklaus/10/base -> origin/gh/zklaus/10/base 2025-09-07T09:36:20.0120201Z * [new branch] 
gh/zklaus/10/head -> origin/gh/zklaus/10/head 2025-09-07T09:36:20.0121774Z * [new branch] gh/zklaus/10/orig -> origin/gh/zklaus/10/orig 2025-09-07T09:36:20.0124061Z * [new branch] gh/zklaus/11/base -> origin/gh/zklaus/11/base 2025-09-07T09:36:20.0126019Z * [new branch] gh/zklaus/11/head -> origin/gh/zklaus/11/head 2025-09-07T09:36:20.0127566Z * [new branch] gh/zklaus/11/orig -> origin/gh/zklaus/11/orig 2025-09-07T09:36:20.0129601Z * [new branch] gh/zklaus/12/base -> origin/gh/zklaus/12/base 2025-09-07T09:36:20.0131296Z * [new branch] gh/zklaus/12/head -> origin/gh/zklaus/12/head 2025-09-07T09:36:20.0132760Z * [new branch] gh/zklaus/12/orig -> origin/gh/zklaus/12/orig 2025-09-07T09:36:20.0135608Z * [new branch] gh/zklaus/14/base -> origin/gh/zklaus/14/base 2025-09-07T09:36:20.0137082Z * [new branch] gh/zklaus/14/head -> origin/gh/zklaus/14/head 2025-09-07T09:36:20.0138695Z * [new branch] gh/zklaus/14/orig -> origin/gh/zklaus/14/orig 2025-09-07T09:36:20.0140852Z * [new branch] gh/zklaus/15/base -> origin/gh/zklaus/15/base 2025-09-07T09:36:20.0142584Z * [new branch] gh/zklaus/15/head -> origin/gh/zklaus/15/head 2025-09-07T09:36:20.0144054Z * [new branch] gh/zklaus/15/orig -> origin/gh/zklaus/15/orig 2025-09-07T09:36:20.0146776Z * [new branch] gh/zklaus/16/base -> origin/gh/zklaus/16/base 2025-09-07T09:36:20.0148398Z * [new branch] gh/zklaus/16/head -> origin/gh/zklaus/16/head 2025-09-07T09:36:20.0149908Z * [new branch] gh/zklaus/16/orig -> origin/gh/zklaus/16/orig 2025-09-07T09:36:20.0152118Z * [new branch] gh/zklaus/17/base -> origin/gh/zklaus/17/base 2025-09-07T09:36:20.0153600Z * [new branch] gh/zklaus/17/head -> origin/gh/zklaus/17/head 2025-09-07T09:36:20.0155338Z * [new branch] gh/zklaus/17/orig -> origin/gh/zklaus/17/orig 2025-09-07T09:36:20.0157721Z * [new branch] gh/zklaus/18/base -> origin/gh/zklaus/18/base 2025-09-07T09:36:20.0159035Z * [new branch] gh/zklaus/18/head -> origin/gh/zklaus/18/head 2025-09-07T09:36:20.0160611Z * [new branch] gh/zklaus/18/orig 
-> origin/gh/zklaus/18/orig 2025-09-07T09:36:20.0162865Z * [new branch] gh/zklaus/19/base -> origin/gh/zklaus/19/base 2025-09-07T09:36:20.0164574Z * [new branch] gh/zklaus/19/head -> origin/gh/zklaus/19/head 2025-09-07T09:36:20.0166281Z * [new branch] gh/zklaus/19/orig -> origin/gh/zklaus/19/orig 2025-09-07T09:36:20.0168461Z * [new branch] gh/zklaus/20/base -> origin/gh/zklaus/20/base 2025-09-07T09:36:20.0169900Z * [new branch] gh/zklaus/20/head -> origin/gh/zklaus/20/head 2025-09-07T09:36:20.0171605Z * [new branch] gh/zklaus/20/orig -> origin/gh/zklaus/20/orig 2025-09-07T09:36:20.0173773Z * [new branch] gh/zklaus/7/base -> origin/gh/zklaus/7/base 2025-09-07T09:36:20.0175451Z * [new branch] gh/zklaus/7/head -> origin/gh/zklaus/7/head 2025-09-07T09:36:20.0177226Z * [new branch] gh/zklaus/7/orig -> origin/gh/zklaus/7/orig 2025-09-07T09:36:20.0179234Z * [new branch] gh/zklaus/9/base -> origin/gh/zklaus/9/base 2025-09-07T09:36:20.0180863Z * [new branch] gh/zklaus/9/head -> origin/gh/zklaus/9/head 2025-09-07T09:36:20.0182564Z * [new branch] gh/zklaus/9/orig -> origin/gh/zklaus/9/orig 2025-09-07T09:36:20.0185595Z * [new branch] gh/zou3519/1175/base -> origin/gh/zou3519/1175/base 2025-09-07T09:36:20.0187292Z * [new branch] gh/zou3519/1175/head -> origin/gh/zou3519/1175/head 2025-09-07T09:36:20.0188811Z * [new branch] gh/zou3519/1175/orig -> origin/gh/zou3519/1175/orig 2025-09-07T09:36:20.0190994Z * [new branch] gh/zou3519/1177/base -> origin/gh/zou3519/1177/base 2025-09-07T09:36:20.0192598Z * [new branch] gh/zou3519/1177/head -> origin/gh/zou3519/1177/head 2025-09-07T09:36:20.0194280Z * [new branch] gh/zou3519/1177/orig -> origin/gh/zou3519/1177/orig 2025-09-07T09:36:20.0196865Z * [new branch] gh/zou3519/1191/base -> origin/gh/zou3519/1191/base 2025-09-07T09:36:20.0198376Z * [new branch] gh/zou3519/1191/head -> origin/gh/zou3519/1191/head 2025-09-07T09:36:20.0199904Z * [new branch] gh/zou3519/1191/orig -> origin/gh/zou3519/1191/orig 2025-09-07T09:36:20.0202321Z * [new 
branch] gh/zou3519/1192/base -> origin/gh/zou3519/1192/base 2025-09-07T09:36:20.0204004Z * [new branch] gh/zou3519/1192/head -> origin/gh/zou3519/1192/head 2025-09-07T09:36:20.0205899Z * [new branch] gh/zou3519/1192/orig -> origin/gh/zou3519/1192/orig 2025-09-07T09:36:20.0207903Z * [new branch] gh/zou3519/1193/base -> origin/gh/zou3519/1193/base 2025-09-07T09:36:20.0209445Z * [new branch] gh/zou3519/1193/head -> origin/gh/zou3519/1193/head 2025-09-07T09:36:20.0211237Z * [new branch] gh/zou3519/1193/orig -> origin/gh/zou3519/1193/orig 2025-09-07T09:36:20.0213185Z * [new branch] gh/zou3519/1194/base -> origin/gh/zou3519/1194/base 2025-09-07T09:36:20.0214705Z * [new branch] gh/zou3519/1194/head -> origin/gh/zou3519/1194/head 2025-09-07T09:36:20.0216577Z * [new branch] gh/zou3519/1194/orig -> origin/gh/zou3519/1194/orig 2025-09-07T09:36:20.0218791Z * [new branch] gh/zou3519/1195/base -> origin/gh/zou3519/1195/base 2025-09-07T09:36:20.0220386Z * [new branch] gh/zou3519/1195/head -> origin/gh/zou3519/1195/head 2025-09-07T09:36:20.0222269Z * [new branch] gh/zou3519/1195/orig -> origin/gh/zou3519/1195/orig 2025-09-07T09:36:20.0224350Z * [new branch] gh/zou3519/1196/base -> origin/gh/zou3519/1196/base 2025-09-07T09:36:20.0226367Z * [new branch] gh/zou3519/1196/head -> origin/gh/zou3519/1196/head 2025-09-07T09:36:20.0228030Z * [new branch] gh/zou3519/1196/orig -> origin/gh/zou3519/1196/orig 2025-09-07T09:36:20.0230129Z * [new branch] gh/zou3519/1197/base -> origin/gh/zou3519/1197/base 2025-09-07T09:36:20.0231695Z * [new branch] gh/zou3519/1197/head -> origin/gh/zou3519/1197/head 2025-09-07T09:36:20.0233384Z * [new branch] gh/zou3519/1197/orig -> origin/gh/zou3519/1197/orig 2025-09-07T09:36:20.0236714Z * [new branch] gh/zpcore/1/base -> origin/gh/zpcore/1/base 2025-09-07T09:36:20.0238160Z * [new branch] gh/zpcore/1/head -> origin/gh/zpcore/1/head 2025-09-07T09:36:20.0240366Z * [new branch] gh/zpcore/10/base -> origin/gh/zpcore/10/base 2025-09-07T09:36:20.0241880Z * [new 
branch] gh/zpcore/10/head -> origin/gh/zpcore/10/head 2025-09-07T09:36:20.0243613Z * [new branch] gh/zpcore/10/orig -> origin/gh/zpcore/10/orig 2025-09-07T09:36:20.0246109Z * [new branch] gh/zpcore/11/base -> origin/gh/zpcore/11/base 2025-09-07T09:36:20.0247686Z * [new branch] gh/zpcore/11/head -> origin/gh/zpcore/11/head 2025-09-07T09:36:20.0249223Z * [new branch] gh/zpcore/11/orig -> origin/gh/zpcore/11/orig 2025-09-07T09:36:20.0251641Z * [new branch] gh/zpcore/12/base -> origin/gh/zpcore/12/base 2025-09-07T09:36:20.0253328Z * [new branch] gh/zpcore/12/head -> origin/gh/zpcore/12/head 2025-09-07T09:36:20.0255189Z * [new branch] gh/zpcore/12/orig -> origin/gh/zpcore/12/orig 2025-09-07T09:36:20.0257533Z * [new branch] gh/zpcore/13/base -> origin/gh/zpcore/13/base 2025-09-07T09:36:20.0259083Z * [new branch] gh/zpcore/13/head -> origin/gh/zpcore/13/head 2025-09-07T09:36:20.0260694Z * [new branch] gh/zpcore/13/orig -> origin/gh/zpcore/13/orig 2025-09-07T09:36:20.0263484Z * [new branch] gh/zpcore/14/base -> origin/gh/zpcore/14/base 2025-09-07T09:36:20.0264921Z * [new branch] gh/zpcore/14/head -> origin/gh/zpcore/14/head 2025-09-07T09:36:20.0267360Z * [new branch] gh/zpcore/2/base -> origin/gh/zpcore/2/base 2025-09-07T09:36:20.0268959Z * [new branch] gh/zpcore/2/head -> origin/gh/zpcore/2/head 2025-09-07T09:36:20.0271006Z * [new branch] gh/zpcore/3/base -> origin/gh/zpcore/3/base 2025-09-07T09:36:20.0272638Z * [new branch] gh/zpcore/3/head -> origin/gh/zpcore/3/head 2025-09-07T09:36:20.0274594Z * [new branch] gh/zpcore/4/base -> origin/gh/zpcore/4/base 2025-09-07T09:36:20.0276428Z * [new branch] gh/zpcore/4/head -> origin/gh/zpcore/4/head 2025-09-07T09:36:20.0278497Z * [new branch] gh/zpcore/5/base -> origin/gh/zpcore/5/base 2025-09-07T09:36:20.0279910Z * [new branch] gh/zpcore/5/head -> origin/gh/zpcore/5/head 2025-09-07T09:36:20.0282100Z * [new branch] gh/zpcore/6/base -> origin/gh/zpcore/6/base 2025-09-07T09:36:20.0283519Z * [new branch] gh/zpcore/6/head -> 
origin/gh/zpcore/6/head 2025-09-07T09:36:20.0285732Z * [new branch] gh/zpcore/7/base -> origin/gh/zpcore/7/base 2025-09-07T09:36:20.0287286Z * [new branch] gh/zpcore/7/head -> origin/gh/zpcore/7/head 2025-09-07T09:36:20.0289449Z * [new branch] gh/zpcore/8/base -> origin/gh/zpcore/8/base 2025-09-07T09:36:20.0291160Z * [new branch] gh/zpcore/8/head -> origin/gh/zpcore/8/head 2025-09-07T09:36:20.0293100Z * [new branch] google-main -> origin/google-main 2025-09-07T09:36:20.0295506Z * [new branch] guangyey/external_stream -> origin/guangyey/external_stream 2025-09-07T09:36:20.0297259Z * [new branch] guangyey/host_alloc -> origin/guangyey/host_alloc 2025-09-07T09:36:20.0298541Z * [new branch] guangyey/reimport -> origin/guangyey/reimport 2025-09-07T09:36:20.0300137Z * [new branch] guangyey/test_2025 -> origin/guangyey/test_2025 2025-09-07T09:36:20.0302788Z * [new branch] guilhermeleobas/cherry-pick-55d87d9dfd9 -> origin/guilhermeleobas/cherry-pick-55d87d9dfd9 2025-09-07T09:36:20.0304892Z * [new branch] haozhe/bf16-dynamic-shape -> origin/haozhe/bf16-dynamic-shape 2025-09-07T09:36:20.0307001Z * [new branch] hc_baseline -> origin/hc_baseline 2025-09-07T09:36:20.0308932Z * [new branch] hf_update -> origin/hf_update 2025-09-07T09:36:20.0310683Z * [new branch] hhh_decomp_mul -> origin/hhh_decomp_mul 2025-09-07T09:36:20.0312655Z * [new branch] hhh_rand -> origin/hhh_rand 2025-09-07T09:36:20.0314847Z * [new branch] hoy/mmsplitk -> origin/hoy/mmsplitk 2025-09-07T09:36:20.0316660Z * [new branch] hoy/triton-PR3973 -> origin/hoy/triton-PR3973 2025-09-07T09:36:20.0318079Z * [new branch] hoy/triton-coalescing-baseline -> origin/hoy/triton-coalescing-baseline 2025-09-07T09:36:20.0319549Z * [new branch] hoy/triton-coalescing-new -> origin/hoy/triton-coalescing-new 2025-09-07T09:36:20.0320965Z * [new branch] hoy/triton-coalescing-vec -> origin/hoy/triton-coalescing-vec 2025-09-07T09:36:20.0323021Z * [new branch] inductordecompfix -> origin/inductordecompfix 2025-09-07T09:36:20.0324891Z 
* [new branch] inline -> origin/inline 2025-09-07T09:36:20.0327025Z * [new branch] inlining -> origin/inlining 2025-09-07T09:36:20.0328927Z * [new branch] inlining-ezyang -> origin/inlining-ezyang 2025-09-07T09:36:20.0330710Z * [new branch] install-torchao-0.13.0 -> origin/install-torchao-0.13.0 2025-09-07T09:36:20.0332402Z * [new branch] int8_sdpa -> origin/int8_sdpa 2025-09-07T09:36:20.0334241Z * [new branch] invoke-subgraph -> origin/invoke-subgraph 2025-09-07T09:36:20.0336382Z * [new branch] issue#58739 -> origin/issue#58739 2025-09-07T09:36:20.0338759Z * [new branch] jcaip/test-cusparselt-version-0.6.2 -> origin/jcaip/test-cusparselt-version-0.6.2 2025-09-07T09:36:20.0340274Z * [new branch] jcaip/update-cusparselt-0.6.2 -> origin/jcaip/update-cusparselt-0.6.2 2025-09-07T09:36:20.0342716Z * [new branch] jeanschmidt/disable_rocm_build_tests -> origin/jeanschmidt/disable_rocm_build_tests 2025-09-07T09:36:20.0344555Z * [new branch] jithunnair-amd-patch-1 -> origin/jithunnair-amd-patch-1 2025-09-07T09:36:20.0346732Z * [new branch] jithunnair-amd-patch-2 -> origin/jithunnair-amd-patch-2 2025-09-07T09:36:20.0349014Z * [new branch] justinchu/attention-tests -> origin/justinchu/attention-tests 2025-09-07T09:36:20.0350431Z * [new branch] justinchu/native-qdq -> origin/justinchu/native-qdq 2025-09-07T09:36:20.0352080Z * [new branch] justinchu/ort-122 -> origin/justinchu/ort-122 2025-09-07T09:36:20.0354444Z * [new branch] justinchuby/dynamo-true -> origin/justinchuby/dynamo-true 2025-09-07T09:36:20.0356907Z * [new branch] kainan666/xlf_debug -> origin/kainan666/xlf_debug 2025-09-07T09:36:20.0358752Z * [new branch] kainan_test -> origin/kainan_test 2025-09-07T09:36:20.0360414Z * [new branch] learnablebias -> origin/learnablebias 2025-09-07T09:36:20.0362950Z * [new branch] leslie/test_group_gemm_epilogues -> origin/leslie/test_group_gemm_epilogues 2025-09-07T09:36:20.0365444Z * [new branch] lessw2020/fix_cutlass_cache_error -> origin/lessw2020/fix_cutlass_cache_error 
2025-09-07T09:36:20.0367812Z * [new branch] liaoxuan/shm_all_reduce -> origin/liaoxuan/shm_all_reduce 2025-09-07T09:36:20.0369274Z * [new branch] liaoxuan/test_fa_disable_softmax -> origin/liaoxuan/test_fa_disable_softmax 2025-09-07T09:36:20.0370687Z * [new branch] liaoxuan/test_int8_sdpa -> origin/liaoxuan/test_int8_sdpa 2025-09-07T09:36:20.0372506Z * [new branch] lintbuilddocker -> origin/lintbuilddocker 2025-09-07T09:36:20.0374218Z * [new branch] llama4-stable -> origin/llama4-stable 2025-09-07T09:36:20.0376391Z * [new branch] logdetfix -> origin/logdetfix 2025-09-07T09:36:20.0379320Z * [new branch] lts/release/1.8 -> origin/lts/release/1.8 2025-09-07T09:36:20.0381902Z * [new branch] lucaskabela/#94773 -> origin/lucaskabela/#94773 2025-09-07T09:36:20.0383205Z * [new branch] lucaskabela/flop_counter -> origin/lucaskabela/flop_counter 2025-09-07T09:36:20.0384819Z * [new branch] lucaskabela/func_under_decomp -> origin/lucaskabela/func_under_decomp 2025-09-07T09:36:20.0386635Z * [new branch] lucaskabela/functional_in_dynamo -> origin/lucaskabela/functional_in_dynamo 2025-09-07T09:36:20.0387964Z * [new branch] lucaskabela/install_params_as_graph_attr -> origin/lucaskabela/install_params_as_graph_attr 2025-09-07T09:36:20.0389486Z * [new branch] lucaskabela/issue_120648 -> origin/lucaskabela/issue_120648 2025-09-07T09:36:20.0390977Z * [new branch] lucaskabela/misc_typing_dynamo -> origin/lucaskabela/misc_typing_dynamo 2025-09-07T09:36:20.0392605Z * [new branch] lucaskabela/parameters_as_graph_attr -> origin/lucaskabela/parameters_as_graph_attr 2025-09-07T09:36:20.0393879Z * [new branch] lucaskabela/remove_aot_dispatcher_metadata -> origin/lucaskabela/remove_aot_dispatcher_metadata 2025-09-07T09:36:20.0395704Z * [new branch] lucaskabela/rnn_decomp -> origin/lucaskabela/rnn_decomp 2025-09-07T09:36:20.0397297Z * [new branch] lucaskabela/typing_backends -> origin/lucaskabela/typing_backends 2025-09-07T09:36:20.0398619Z * [new branch] lucaskabela/typing_symbolic_convert -> 
origin/lucaskabela/typing_symbolic_convert 2025-09-07T09:36:20.0400294Z * [new branch] lucaskabela/typing_utils_improvements -> origin/lucaskabela/typing_utils_improvements 2025-09-07T09:36:20.0402041Z * [new branch] main -> origin/main 2025-09-07T09:36:20.0404222Z * [new branch] main-enable-b200-distributed-tests -> origin/main-enable-b200-distributed-tests 2025-09-07T09:36:20.0406080Z * [new branch] malfet-patch-1 -> origin/malfet-patch-1 2025-09-07T09:36:20.0408083Z * [new branch] malfet-patch-12 -> origin/malfet-patch-12 2025-09-07T09:36:20.0409914Z * [new branch] malfet-patch-14 -> origin/malfet-patch-14 2025-09-07T09:36:20.0411897Z * [new branch] malfet-patch-6 -> origin/malfet-patch-6 2025-09-07T09:36:20.0413666Z * [new branch] malfet-patch-8 -> origin/malfet-patch-8 2025-09-07T09:36:20.0416587Z * [new branch] malfet/be-move-more-settings-to-checkout-pytorch -> origin/malfet/be-move-more-settings-to-checkout-pytorch 2025-09-07T09:36:20.0418019Z * [new branch] malfet/delete-upsteam-cuda -> origin/malfet/delete-upsteam-cuda 2025-09-07T09:36:20.0419729Z * [new branch] malfet/mps-implement-col2im -> origin/malfet/mps-implement-col2im 2025-09-07T09:36:20.0422060Z * [new branch] manuel/test-ops-common-allow-mps -> origin/manuel/test-ops-common-allow-mps 2025-09-07T09:36:20.0423797Z * [new branch] metascroy-patch-1 -> origin/metascroy-patch-1 2025-09-07T09:36:20.0426423Z * [new branch] mlazos/S429861-debug -> origin/mlazos/S429861-debug 2025-09-07T09:36:20.0428019Z * [new branch] mlazos/aa -> origin/mlazos/aa 2025-09-07T09:36:20.0429483Z * [new branch] mlazos/arg-renames -> origin/mlazos/arg-renames 2025-09-07T09:36:20.0430955Z * [new branch] mlazos/backup-test-branch -> origin/mlazos/backup-test-branch 2025-09-07T09:36:20.0432353Z * [new branch] mlazos/bad-cudagraphs -> origin/mlazos/bad-cudagraphs 2025-09-07T09:36:20.0433903Z * [new branch] mlazos/baseline -> origin/mlazos/baseline 2025-09-07T09:36:20.0435750Z * [new branch] mlazos/baseline-graph-breaks -> 
origin/mlazos/baseline-graph-breaks 2025-09-07T09:36:20.0437226Z * [new branch] mlazos/beta-tensor -> origin/mlazos/beta-tensor 2025-09-07T09:36:20.0438878Z * [new branch] mlazos/better-msg -> origin/mlazos/better-msg 2025-09-07T09:36:20.0440068Z * [new branch] mlazos/buffers -> origin/mlazos/buffers 2025-09-07T09:36:20.0441502Z * [new branch] mlazos/buffers2 -> origin/mlazos/buffers2 2025-09-07T09:36:20.0443114Z * [new branch] mlazos/buffers3 -> origin/mlazos/buffers3 2025-09-07T09:36:20.0445205Z * [new branch] mlazos/ck2 -> origin/mlazos/ck2 2025-09-07T09:36:20.0446869Z * [new branch] mlazos/combokernels -> origin/mlazos/combokernels 2025-09-07T09:36:20.0448351Z * [new branch] mlazos/ctx-cleanup -> origin/mlazos/ctx-cleanup 2025-09-07T09:36:20.0449740Z * [new branch] mlazos/cuda-cmd-log -> origin/mlazos/cuda-cmd-log 2025-09-07T09:36:20.0451649Z * [new branch] mlazos/cudagraph-tests -> origin/mlazos/cudagraph-tests 2025-09-07T09:36:20.0452992Z * [new branch] mlazos/cudagraphs-measurement -> origin/mlazos/cudagraphs-measurement 2025-09-07T09:36:20.0454492Z * [new branch] mlazos/cutlass-test -> origin/mlazos/cutlass-test 2025-09-07T09:36:20.0456538Z * [new branch] mlazos/cutlass-topo-bug -> origin/mlazos/cutlass-topo-bug 2025-09-07T09:36:20.0457879Z * [new branch] mlazos/data-gather -> origin/mlazos/data-gather 2025-09-07T09:36:20.0459387Z * [new branch] mlazos/data-ptrs2 -> origin/mlazos/data-ptrs2 2025-09-07T09:36:20.0460984Z * [new branch] mlazos/data-ptrs3 -> origin/mlazos/data-ptrs3 2025-09-07T09:36:20.0462752Z * [new branch] mlazos/dataclass-proxy -> origin/mlazos/dataclass-proxy 2025-09-07T09:36:20.0464352Z * [new branch] mlazos/dc-attrs -> origin/mlazos/dc-attrs 2025-09-07T09:36:20.0466221Z * [new branch] mlazos/dc-helion -> origin/mlazos/dc-helion 2025-09-07T09:36:20.0467615Z * [new branch] mlazos/dict-fix -> origin/mlazos/dict-fix 2025-09-07T09:36:20.0469199Z * [new branch] mlazos/disable-closures -> origin/mlazos/disable-closures 
2025-09-07T09:36:20.0470672Z * [new branch] mlazos/disable-tf -> origin/mlazos/disable-tf 2025-09-07T09:36:20.0472178Z * [new branch] mlazos/dupe-fix -> origin/mlazos/dupe-fix 2025-09-07T09:36:20.0473675Z * [new branch] mlazos/dyn-batch -> origin/mlazos/dyn-batch 2025-09-07T09:36:20.0475528Z * [new branch] mlazos/evt -> origin/mlazos/evt 2025-09-07T09:36:20.0477226Z * [new branch] mlazos/exp_disable -> origin/mlazos/exp_disable 2025-09-07T09:36:20.0478671Z * [new branch] mlazos/extract-examples -> origin/mlazos/extract-examples 2025-09-07T09:36:20.0480329Z * [new branch] mlazos/foreach-op -> origin/mlazos/foreach-op 2025-09-07T09:36:20.0481838Z * [new branch] mlazos/fp8 -> origin/mlazos/fp8 2025-09-07T09:36:20.0483384Z * [new branch] mlazos/fp8-bias -> origin/mlazos/fp8-bias 2025-09-07T09:36:20.0485118Z * [new branch] mlazos/fp8-bias-fusion -> origin/mlazos/fp8-bias-fusion 2025-09-07T09:36:20.0486798Z * [new branch] mlazos/fp8-fixes -> origin/mlazos/fp8-fixes 2025-09-07T09:36:20.0488302Z * [new branch] mlazos/freezing -> origin/mlazos/freezing 2025-09-07T09:36:20.0489884Z * [new branch] mlazos/h-comp -> origin/mlazos/h-comp 2025-09-07T09:36:20.0491473Z * [new branch] mlazos/h-comp2 -> origin/mlazos/h-comp2 2025-09-07T09:36:20.0493062Z * [new branch] mlazos/hash-hop -> origin/mlazos/hash-hop 2025-09-07T09:36:20.0494724Z * [new branch] mlazos/hc -> origin/mlazos/hc 2025-09-07T09:36:20.0496792Z * [new branch] mlazos/hc-cycles -> origin/mlazos/hc-cycles 2025-09-07T09:36:20.0498201Z * [new branch] mlazos/hc-fixes -> origin/mlazos/hc-fixes 2025-09-07T09:36:20.0499928Z * [new branch] mlazos/hc-fixes3 -> origin/mlazos/hc-fixes3 2025-09-07T09:36:20.0501805Z * [new branch] mlazos/hc-fixes4 -> origin/mlazos/hc-fixes4 2025-09-07T09:36:20.0503374Z * [new branch] mlazos/hc-hf -> origin/mlazos/hc-hf 2025-09-07T09:36:20.0504900Z * [new branch] mlazos/hc-mut -> origin/mlazos/hc-mut 2025-09-07T09:36:20.0506783Z * [new branch] mlazos/hc10 -> origin/mlazos/hc10 
2025-09-07T09:36:20.0508562Z * [new branch] mlazos/hc11 -> origin/mlazos/hc11 2025-09-07T09:36:20.0510115Z * [new branch] mlazos/hc12 -> origin/mlazos/hc12 2025-09-07T09:36:20.0511767Z * [new branch] mlazos/hc13 -> origin/mlazos/hc13 2025-09-07T09:36:20.0513275Z * [new branch] mlazos/hc14 -> origin/mlazos/hc14 2025-09-07T09:36:20.0514838Z * [new branch] mlazos/hc15 -> origin/mlazos/hc15 2025-09-07T09:36:20.0516757Z * [new branch] mlazos/hc2 -> origin/mlazos/hc2 2025-09-07T09:36:20.0518452Z * [new branch] mlazos/hc4 -> origin/mlazos/hc4 2025-09-07T09:36:20.0520033Z * [new branch] mlazos/hc5 -> origin/mlazos/hc5 2025-09-07T09:36:20.0521641Z * [new branch] mlazos/hc6 -> origin/mlazos/hc6 2025-09-07T09:36:20.0523358Z * [new branch] mlazos/hc7 -> origin/mlazos/hc7 2025-09-07T09:36:20.0524690Z * [new branch] mlazos/hc8 -> origin/mlazos/hc8 2025-09-07T09:36:20.0526883Z * [new branch] mlazos/hc9 -> origin/mlazos/hc9 2025-09-07T09:36:20.0528594Z * [new branch] mlazos/hc_baseline2 -> origin/mlazos/hc_baseline2 2025-09-07T09:36:20.0530247Z * [new branch] mlazos/init-per-param -> origin/mlazos/init-per-param 2025-09-07T09:36:20.0531810Z * [new branch] mlazos/init_per_param -> origin/mlazos/init_per_param 2025-09-07T09:36:20.0533610Z * [new branch] mlazos/less-guards -> origin/mlazos/less-guards 2025-09-07T09:36:20.0535364Z * [new branch] mlazos/lr-composibility -> origin/mlazos/lr-composibility 2025-09-07T09:36:20.0537032Z * [new branch] mlazos/main -> origin/mlazos/main 2025-09-07T09:36:20.0538831Z * [new branch] mlazos/main-test-enablement -> origin/mlazos/main-test-enablement 2025-09-07T09:36:20.0540359Z * [new branch] mlazos/main2 -> origin/mlazos/main2 2025-09-07T09:36:20.0542486Z * [new branch] mlazos/mark-static-update -> origin/mlazos/mark-static-update 2025-09-07T09:36:20.0543892Z * [new branch] mlazos/mcg -> origin/mlazos/mcg 2025-09-07T09:36:20.0545833Z * [new branch] mlazos/mcg2 -> origin/mlazos/mcg2 2025-09-07T09:36:20.0547653Z * [new branch] mlazos/meta-guards -> 
origin/mlazos/meta-guards 2025-09-07T09:36:20.0549993Z * [new branch] mlazos/mlazos/ck2 -> origin/mlazos/mlazos/ck2 2025-09-07T09:36:20.0551427Z * [new branch] mlazos/mlazos/foreach-map-adam -> origin/mlazos/mlazos/foreach-map-adam 2025-09-07T09:36:20.0553431Z * [new branch] mlazos/mlazos/tf-mode-backup -> origin/mlazos/mlazos/tf-mode-backup 2025-09-07T09:36:20.0554869Z * [new branch] mlazos/mod-fix -> origin/mlazos/mod-fix 2025-09-07T09:36:20.0556940Z * [new branch] mlazos/mode-fix -> origin/mlazos/mode-fix 2025-09-07T09:36:20.0558873Z * [new branch] mlazos/more-tests -> origin/mlazos/more-tests 2025-09-07T09:36:20.0560319Z * [new branch] mlazos/no-cpp -> origin/mlazos/no-cpp 2025-09-07T09:36:20.0562141Z * [new branch] mlazos/no-init-group-handling -> origin/mlazos/no-init-group-handling 2025-09-07T09:36:20.0563812Z * [new branch] mlazos/offsets -> origin/mlazos/offsets 2025-09-07T09:36:20.0565675Z * [new branch] mlazos/opt-bench-exp2 -> origin/mlazos/opt-bench-exp2 2025-09-07T09:36:20.0567468Z * [new branch] mlazos/opt-incr -> origin/mlazos/opt-incr 2025-09-07T09:36:20.0569231Z * [new branch] mlazos/proxy-ctors -> origin/mlazos/proxy-ctors 2025-09-07T09:36:20.0570895Z * [new branch] mlazos/quant-fix -> origin/mlazos/quant-fix 2025-09-07T09:36:20.0572679Z * [new branch] mlazos/resnet-fix -> origin/mlazos/resnet-fix 2025-09-07T09:36:20.0574418Z * [new branch] mlazos/revert-inline -> origin/mlazos/revert-inline 2025-09-07T09:36:20.0576350Z * [new branch] mlazos/rm-buf-names -> origin/mlazos/rm-buf-names 2025-09-07T09:36:20.0578031Z * [new branch] mlazos/rm-code -> origin/mlazos/rm-code 2025-09-07T09:36:20.0579761Z * [new branch] mlazos/rm-spam -> origin/mlazos/rm-spam 2025-09-07T09:36:20.0581587Z * [new branch] mlazos/rtp -> origin/mlazos/rtp 2025-09-07T09:36:20.0583322Z * [new branch] mlazos/static-idx-dbg -> origin/mlazos/static-idx-dbg 2025-09-07T09:36:20.0585289Z * [new branch] mlazos/static-inputs-log -> origin/mlazos/static-inputs-log 
2025-09-07T09:36:20.0587258Z * [new branch] mlazos/sub-param-fix -> origin/mlazos/sub-param-fix 2025-09-07T09:36:20.0588872Z * [new branch] mlazos/td-fix2 -> origin/mlazos/td-fix2 2025-09-07T09:36:20.0590756Z * [new branch] mlazos/tensor-hasattr2 -> origin/mlazos/tensor-hasattr2 2025-09-07T09:36:20.0592284Z * [new branch] mlazos/test -> origin/mlazos/test 2025-09-07T09:36:20.0593993Z * [new branch] mlazos/tf-mode -> origin/mlazos/tf-mode 2025-09-07T09:36:20.0596213Z * [new branch] mlazos/tf-mode-backup2 -> origin/mlazos/tf-mode-backup2 2025-09-07T09:36:20.0597889Z * [new branch] mlazos/tf-mode-reland -> origin/mlazos/tf-mode-reland 2025-09-07T09:36:20.0599793Z * [new branch] mlazos/tf-mode-reland2 -> origin/mlazos/tf-mode-reland2 2025-09-07T09:36:20.0601599Z * [new branch] mlazos/tf-mode-reland3 -> origin/mlazos/tf-mode-reland3 2025-09-07T09:36:20.0603435Z * [new branch] mlazos/topo-fix -> origin/mlazos/topo-fix 2025-09-07T09:36:20.0605396Z * [new branch] mlazos/triton-no-epi -> origin/mlazos/triton-no-epi 2025-09-07T09:36:20.0607188Z * [new branch] mlazos/tune-proto -> origin/mlazos/tune-proto 2025-09-07T09:36:20.0608872Z * [new branch] mlazos/tuple-fixes -> origin/mlazos/tuple-fixes 2025-09-07T09:36:20.0610617Z * [new branch] mlazos/tuple-fixes2 -> origin/mlazos/tuple-fixes2 2025-09-07T09:36:20.0612425Z * [new branch] mlazos/tuple-handling -> origin/mlazos/tuple-handling 2025-09-07T09:36:20.0614110Z * [new branch] mlazos/user-streams -> origin/mlazos/user-streams 2025-09-07T09:36:20.0616021Z * [new branch] mlazos/vary-beta -> origin/mlazos/vary-beta 2025-09-07T09:36:20.0617799Z * [new branch] mlazos/vary-beta2 -> origin/mlazos/vary-beta2 2025-09-07T09:36:20.0619565Z * [new branch] mlazos/weird-perf1 -> origin/mlazos/weird-perf1 2025-09-07T09:36:20.0621528Z * [new branch] mm_out_dtype_compile -> origin/mm_out_dtype_compile 2025-09-07T09:36:20.0623623Z * [new branch] modify-setupvllm -> origin/modify-setupvllm 2025-09-07T09:36:20.0625489Z * [new branch] module-shim 
-> origin/module-shim 2025-09-07T09:36:20.0627510Z * [new branch] move-theme-out-docker -> origin/move-theme-out-docker 2025-09-07T09:36:20.0629877Z * [new branch] msaroufim/be1 -> origin/msaroufim/be1 2025-09-07T09:36:20.0631451Z * [new branch] msaroufim/cn_path -> origin/msaroufim/cn_path 2025-09-07T09:36:20.0633099Z * [new branch] msaroufim/dtensorfusedadam -> origin/msaroufim/dtensorfusedadam 2025-09-07T09:36:20.0634560Z * [new branch] msaroufim/reduce -> origin/msaroufim/reduce 2025-09-07T09:36:20.0637176Z * [new branch] mtia/basic-cmake -> origin/mtia/basic-cmake 2025-09-07T09:36:20.0638927Z * [new branch] muon_dev -> origin/muon_dev 2025-09-07T09:36:20.0640826Z * [new branch] muon_dev_1 -> origin/muon_dev_1 2025-09-07T09:36:20.0642822Z * [new branch] nativert_num_outputs -> origin/nativert_num_outputs 2025-09-07T09:36:20.0644769Z * [new branch] nativert_numoutputs -> origin/nativert_numoutputs 2025-09-07T09:36:20.0646958Z * [new branch] new-modifiy-setupvllm -> origin/new-modifiy-setupvllm 2025-09-07T09:36:20.0648752Z * [new branch] new-setupvllm -> origin/new-setupvllm 2025-09-07T09:36:20.0650639Z * [new branch] new_zeros_dtype -> origin/new_zeros_dtype 2025-09-07T09:36:20.0652415Z * [new branch] newtest-base -> origin/newtest-base 2025-09-07T09:36:20.0654706Z * [new branch] ngimel/cat_perf1 -> origin/ngimel/cat_perf1 2025-09-07T09:36:20.0656364Z * [new branch] ngimel/einsum_fix -> origin/ngimel/einsum_fix 2025-09-07T09:36:20.0657883Z * [new branch] ngimel/error_index_list -> origin/ngimel/error_index_list 2025-09-07T09:36:20.0659415Z * [new branch] ngimel/fabric_check -> origin/ngimel/fabric_check 2025-09-07T09:36:20.0660750Z * [new branch] ngimel/fabric_fix -> origin/ngimel/fabric_fix 2025-09-07T09:36:20.0662406Z * [new branch] ngimel/fix_driver_init_error -> origin/ngimel/fix_driver_init_error 2025-09-07T09:36:20.0664044Z * [new branch] ngimel/fix_nccl_segment_seg -> origin/ngimel/fix_nccl_segment_seg 2025-09-07T09:36:20.0665582Z * [new branch] 
ngimel/gg_new -> origin/ngimel/gg_new 2025-09-07T09:36:20.0667097Z * [new branch] ngimel/modeguard -> origin/ngimel/modeguard 2025-09-07T09:36:20.0668584Z * [new branch] ngimel/multicast_fix -> origin/ngimel/multicast_fix 2025-09-07T09:36:20.0670234Z * [new branch] ngimel/rocm_handle_type -> origin/ngimel/rocm_handle_type 2025-09-07T09:36:20.0671724Z * [new branch] ngimel/symm_handle_fabric -> origin/ngimel/symm_handle_fabric 2025-09-07T09:36:20.0673128Z * [new branch] ngimel/unbind_multimem -> origin/ngimel/unbind_multimem 2025-09-07T09:36:20.0675163Z * [new branch] nightly -> origin/nightly 2025-09-07T09:36:20.0677791Z * [new branch] nmacchioni-patch-10 -> origin/nmacchioni-patch-10 2025-09-07T09:36:20.0679656Z * [new branch] nmacchioni-patch-7 -> origin/nmacchioni-patch-7 2025-09-07T09:36:20.0681599Z * [new branch] nmacchioni-patch-8 -> origin/nmacchioni-patch-8 2025-09-07T09:36:20.0683744Z * [new branch] nmacchioni-patch-9 -> origin/nmacchioni-patch-9 2025-09-07T09:36:20.0686299Z * [new branch] nullplay/fuse_matmul -> origin/nullplay/fuse_matmul 2025-09-07T09:36:20.0688110Z * [new branch] nullplay_fuse_matmul -> origin/nullplay_fuse_matmul 2025-09-07T09:36:20.0690186Z * [new branch] one-off -> origin/one-off 2025-09-07T09:36:20.0693034Z * [new branch] orig/release/1.10 -> origin/orig/release/1.10 2025-09-07T09:36:20.0694547Z * [new branch] orig/release/1.11 -> origin/orig/release/1.11 2025-09-07T09:36:20.0696349Z * [new branch] orig/release/1.12 -> origin/orig/release/1.12 2025-09-07T09:36:20.0698104Z * [new branch] orig/release/1.13 -> origin/orig/release/1.13 2025-09-07T09:36:20.0699825Z * [new branch] orig/release/1.6 -> origin/orig/release/1.6 2025-09-07T09:36:20.0701662Z * [new branch] orig/release/1.7 -> origin/orig/release/1.7 2025-09-07T09:36:20.0703420Z * [new branch] orig/release/1.8 -> origin/orig/release/1.8 2025-09-07T09:36:20.0704918Z * [new branch] orig/release/1.9 -> origin/orig/release/1.9 2025-09-07T09:36:20.0706827Z * [new branch] 
orig/release/2.0 -> origin/orig/release/2.0 2025-09-07T09:36:20.0708513Z * [new branch] orig/release/2.1 -> origin/orig/release/2.1 2025-09-07T09:36:20.0710047Z * [new branch] orig/release/2.2 -> origin/orig/release/2.2 2025-09-07T09:36:20.0711677Z * [new branch] orig/release/2.3 -> origin/orig/release/2.3 2025-09-07T09:36:20.0713266Z * [new branch] orig/release/2.4 -> origin/orig/release/2.4 2025-09-07T09:36:20.0714863Z * [new branch] orig/release/2.5 -> origin/orig/release/2.5 2025-09-07T09:36:20.0716842Z * [new branch] orig/release/2.6 -> origin/orig/release/2.6 2025-09-07T09:36:20.0718410Z * [new branch] orig/release/2.7 -> origin/orig/release/2.7 2025-09-07T09:36:20.0719894Z * [new branch] orig/release/2.8 -> origin/orig/release/2.8 2025-09-07T09:36:20.0722294Z * [new branch] oulgen/fx_graph -> origin/oulgen/fx_graph 2025-09-07T09:36:20.0724196Z * [new branch] padded-tensor -> origin/padded-tensor 2025-09-07T09:36:20.0726351Z * [new branch] pca2 -> origin/pca2 2025-09-07T09:36:20.0728478Z * [new branch] pianpwk-patch-1 -> origin/pianpwk-patch-1 2025-09-07T09:36:20.0730735Z * [new branch] pianpwk/backed_size_oblivious_export -> origin/pianpwk/backed_size_oblivious_export 2025-09-07T09:36:20.0732185Z * [new branch] pianpwk/invalidate_fake_memo -> origin/pianpwk/invalidate_fake_memo 2025-09-07T09:36:20.0733814Z * [new branch] pianpwk/max_1_strides -> origin/pianpwk/max_1_strides 2025-09-07T09:36:20.0735305Z * [new branch] pianpwk/maybe_guard_rel -> origin/pianpwk/maybe_guard_rel 2025-09-07T09:36:20.0736995Z * [new branch] pianpwk/nonzero_memo -> origin/pianpwk/nonzero_memo 2025-09-07T09:36:20.0738442Z * [new branch] pianpwk/oblivious_reshape_view_better -> origin/pianpwk/oblivious_reshape_view_better 2025-09-07T09:36:20.0739979Z * [new branch] pianpwk/oblivious_slice_forward -> origin/pianpwk/oblivious_slice_forward 2025-09-07T09:36:20.0741286Z * [new branch] pianpwk/oblivious_where -> origin/pianpwk/oblivious_where 2025-09-07T09:36:20.0744063Z * [new branch] 
pianpwk/param_static_pgo -> origin/pianpwk/param_static_pgo 2025-09-07T09:36:20.0745320Z * [new branch] pianpwk/pre_forward_hook -> origin/pianpwk/pre_forward_hook 2025-09-07T09:36:20.0746306Z * [new branch] pianpwk/remove_guard_fail_break -> origin/pianpwk/remove_guard_fail_break 2025-09-07T09:36:20.0747968Z * [new branch] pianpwk/slice_fresh_symbols -> origin/pianpwk/slice_fresh_symbols 2025-09-07T09:36:20.0749402Z * [new branch] pianpwk/sym_tokens_draft -> origin/pianpwk/sym_tokens_draft 2025-09-07T09:36:20.0750863Z * [new branch] pianpwk/test_pointwise_guard_or_false -> origin/pianpwk/test_pointwise_guard_or_false 2025-09-07T09:36:20.0752425Z * [new branch] pianpwk/test_slice_fake_impl -> origin/pianpwk/test_slice_fake_impl 2025-09-07T09:36:20.0753835Z * [new branch] pianpwk/totally_draft_sym_wrap -> origin/pianpwk/totally_draft_sym_wrap 2025-09-07T09:36:20.0755699Z * [new branch] pianpwk/unbacked_channels_last -> origin/pianpwk/unbacked_channels_last 2025-09-07T09:36:20.0757179Z * [new branch] pianpwk/unbacked_safe_conv1d -> origin/pianpwk/unbacked_safe_conv1d 2025-09-07T09:36:20.0758714Z * [new branch] pianpwk/unbacked_sdpa_flash -> origin/pianpwk/unbacked_sdpa_flash 2025-09-07T09:36:20.0760454Z * [new branch] pianpwk/unbacked_should_swap -> origin/pianpwk/unbacked_should_swap 2025-09-07T09:36:20.0761950Z * [new branch] pianpwk/unbacked_should_swap_2 -> origin/pianpwk/unbacked_should_swap_2 2025-09-07T09:36:20.0763418Z * [new branch] pianpwk/unbacked_slice_binding -> origin/pianpwk/unbacked_slice_binding 2025-09-07T09:36:20.0765188Z * [new branch] pianpwk/unbacked_slice_forward -> origin/pianpwk/unbacked_slice_forward 2025-09-07T09:36:20.0767623Z * [new branch] pianpwk/user_symints -> origin/pianpwk/user_symints 2025-09-07T09:36:20.0769343Z * [new branch] pianpwk/wan21_reshape -> origin/pianpwk/wan21_reshape 2025-09-07T09:36:20.0771203Z * [new branch] pianpwk/whitelist_optimizer -> origin/pianpwk/whitelist_optimizer 2025-09-07T09:36:20.0773291Z * [new branch] 
pin-torchao -> origin/pin-torchao 2025-09-07T09:36:20.0775852Z * [new branch] piz/fall_back_missing_0716 -> origin/piz/fall_back_missing_0716 2025-09-07T09:36:20.0777371Z * [new branch] piz/improve_scatter_0808 -> origin/piz/improve_scatter_0808 2025-09-07T09:36:20.0779390Z * [new branch] pool-separate -> origin/pool-separate 2025-09-07T09:36:20.0781317Z * [new branch] pr-156087 -> origin/pr-156087 2025-09-07T09:36:20.0783819Z * [new branch] pr/131860 -> origin/pr/131860 2025-09-07T09:36:20.0785955Z * [new branch] predispatch_to -> origin/predispatch_to 2025-09-07T09:36:20.0787795Z * [new branch] pt-opt-cuda3 -> origin/pt-opt-cuda3 2025-09-07T09:36:20.0789705Z * [new branch] pyobjectslot -> origin/pyobjectslot 2025-09-07T09:36:20.0792158Z * [new branch] python_compiled_autograd -> origin/python_compiled_autograd 2025-09-07T09:36:20.0795081Z * [new branch] qchip/export-D54134695 -> origin/qchip/export-D54134695 2025-09-07T09:36:20.0797071Z * [new branch] quint-bits -> origin/quint-bits 2025-09-07T09:36:20.0799469Z * [new branch] release/1.10 -> origin/release/1.10 2025-09-07T09:36:20.0801103Z * [new branch] release/1.11 -> origin/release/1.11 2025-09-07T09:36:20.0802916Z * [new branch] release/1.12 -> origin/release/1.12 2025-09-07T09:36:20.0804441Z * [new branch] release/1.13 -> origin/release/1.13 2025-09-07T09:36:20.0806281Z * [new branch] release/1.4 -> origin/release/1.4 2025-09-07T09:36:20.0807616Z * [new branch] release/1.4.1 -> origin/release/1.4.1 2025-09-07T09:36:20.0809242Z * [new branch] release/1.5 -> origin/release/1.5 2025-09-07T09:36:20.0810809Z * [new branch] release/1.6 -> origin/release/1.6 2025-09-07T09:36:20.0812427Z * [new branch] release/1.7 -> origin/release/1.7 2025-09-07T09:36:20.0814272Z * [new branch] release/1.8 -> origin/release/1.8 2025-09-07T09:36:20.0815991Z * [new branch] release/1.9 -> origin/release/1.9 2025-09-07T09:36:20.0820191Z * [new branch] release/2.0 -> origin/release/2.0 2025-09-07T09:36:20.0822018Z * [new branch] 
release/2.1 -> origin/release/2.1 2025-09-07T09:36:20.0823718Z * [new branch] release/2.2 -> origin/release/2.2 2025-09-07T09:36:20.0825429Z * [new branch] release/2.3 -> origin/release/2.3 2025-09-07T09:36:20.0827245Z * [new branch] release/2.4 -> origin/release/2.4 2025-09-07T09:36:20.0828895Z * [new branch] release/2.5 -> origin/release/2.5 2025-09-07T09:36:20.0830548Z * [new branch] release/2.6 -> origin/release/2.6 2025-09-07T09:36:20.0832157Z * [new branch] release/2.7 -> origin/release/2.7 2025-09-07T09:36:20.0833833Z * [new branch] release/2.8 -> origin/release/2.8 2025-09-07T09:36:20.0836032Z * [new branch] release_notes -> origin/release_notes 2025-09-07T09:36:20.0837776Z * [new branch] remove-actionable-label -> origin/remove-actionable-label 2025-09-07T09:36:20.0839584Z * [new branch] remove-ao -> origin/remove-ao 2025-09-07T09:36:20.0841614Z * [new branch] removedeprecatedvllmtest -> origin/removedeprecatedvllmtest 2025-09-07T09:36:20.0843446Z * [new branch] replace-pytorch-labs-20250812-195836 -> origin/replace-pytorch-labs-20250812-195836 2025-09-07T09:36:20.0845387Z * [new branch] replace-pytorch-labs-20250812-200248 -> origin/replace-pytorch-labs-20250812-200248 2025-09-07T09:36:20.0847318Z * [new branch] replace-pytorch-labs-20250812-200324 -> origin/replace-pytorch-labs-20250812-200324 2025-09-07T09:36:20.0849577Z * [new branch] replace-pytorch-labs-20250812-204020 -> origin/replace-pytorch-labs-20250812-204020 2025-09-07T09:36:20.0851474Z * [new branch] replace-pytorch-labs-20250812-204125 -> origin/replace-pytorch-labs-20250812-204125 2025-09-07T09:36:20.0853212Z * [new branch] replace-pytorch-labs-20250812-205624 -> origin/replace-pytorch-labs-20250812-205624 2025-09-07T09:36:20.0857010Z * [new branch] revert-131069-gh/krzysztofjordan/1/head -> origin/revert-131069-gh/krzysztofjordan/1/head 2025-09-07T09:36:20.0860487Z * [new branch] revert-131469-gh/andrewor14/51/head -> origin/revert-131469-gh/andrewor14/51/head 2025-09-07T09:36:20.0864172Z 
* [new branch] revert-156870-gh/skarjala/3/head -> origin/revert-156870-gh/skarjala/3/head 2025-09-07T09:36:20.0866367Z * [new branch] revert-157914-cherry-pick-157503-by-pytorch_bot_bot_ -> origin/revert-157914-cherry-pick-157503-by-pytorch_bot_bot_ 2025-09-07T09:36:20.0868610Z * [new branch] rocm-monitoring -> origin/rocm-monitoring 2025-09-07T09:36:20.0870768Z * [new branch] ruisi/relax_memory -> origin/ruisi/relax_memory 2025-09-07T09:36:20.0872673Z * [new branch] run-torchbench-smoke-test-h100 -> origin/run-torchbench-smoke-test-h100 2025-09-07T09:36:20.0875389Z * [new branch] ryanguo99/cleanup-dynamo-expected-failures -> origin/ryanguo99/cleanup-dynamo-expected-failures 2025-09-07T09:36:20.0876645Z * [new branch] ryanguo99/fix-closure-var -> origin/ryanguo99/fix-closure-var 2025-09-07T09:36:20.0878969Z * [new branch] rzou/faketensor_bench -> origin/rzou/faketensor_bench 2025-09-07T09:36:20.0880571Z * [new branch] rzou/njt -> origin/rzou/njt 2025-09-07T09:36:20.0882204Z * [new branch] rzou/pca -> origin/rzou/pca 2025-09-07T09:36:20.0883649Z * [new branch] rzou/realprop -> origin/rzou/realprop 2025-09-07T09:36:20.0885527Z * [new branch] rzou/setup_context -> origin/rzou/setup_context 2025-09-07T09:36:20.0887961Z * [new branch] sanchitintel/refactor_aten_int8_woq_gemm -> origin/sanchitintel/refactor_aten_int8_woq_gemm 2025-09-07T09:36:20.0889605Z * [new branch] sanchitintel/weird_thing_with_test_cpu_select_algorithm -> origin/sanchitintel/weird_thing_with_test_cpu_select_algorithm 2025-09-07T09:36:20.0891281Z * [new branch] sapling-pr-archive-SS-JIA -> origin/sapling-pr-archive-SS-JIA 2025-09-07T09:36:20.0892871Z * [new branch] save -> origin/save 2025-09-07T09:36:20.0895253Z * [new branch] sdym/2.5.1 -> origin/sdym/2.5.1 2025-09-07T09:36:20.0897203Z * [new branch] seemethere-patch-1 -> origin/seemethere-patch-1 2025-09-07T09:36:20.0898867Z * [new branch] setupvllm -> origin/setupvllm 2025-09-07T09:36:20.0900645Z * [new branch] share_and_pin_fork -> 
origin/share_and_pin_fork 2025-09-07T09:36:20.0903158Z * [new branch] shengf/fx-xform-perf -> origin/shengf/fx-xform-perf 2025-09-07T09:36:20.0904713Z * [new branch] shikaili_fp8_allgather -> origin/shikaili_fp8_allgather 2025-09-07T09:36:20.0906778Z * [new branch] shoumikhin-patch-1 -> origin/shoumikhin-patch-1 2025-09-07T09:36:20.0908534Z * [new branch] shoumikhin-patch-12 -> origin/shoumikhin-patch-12 2025-09-07T09:36:20.0910302Z * [new branch] simplify-fq-per-channel -> origin/simplify-fq-per-channel 2025-09-07T09:36:20.0911991Z * [new branch] solve-accuracy-fix -> origin/solve-accuracy-fix 2025-09-07T09:36:20.0914113Z * [new branch] soulitzer/stash-tls-ac -> origin/soulitzer/stash-tls-ac 2025-09-07T09:36:20.0916726Z * [new branch] sqzhang/flight4 -> origin/sqzhang/flight4 2025-09-07T09:36:20.0918380Z * [new branch] sqzhang/flight4plus -> origin/sqzhang/flight4plus 2025-09-07T09:36:20.0920415Z * [new branch] sraikund/record_funct_test -> origin/sraikund/record_funct_test 2025-09-07T09:36:20.0922583Z * [new branch] sraikund16/test -> origin/sraikund16/test 2025-09-07T09:36:20.0924470Z * [new branch] stablize-compilation-time -> origin/stablize-compilation-time 2025-09-07T09:36:20.0926909Z * [new branch] standalone-templates -> origin/standalone-templates 2025-09-07T09:36:20.0928459Z * [new branch] standalone_package_weights -> origin/standalone_package_weights 2025-09-07T09:36:20.0930184Z * [new branch] starterTaskUpdate -> origin/starterTaskUpdate 2025-09-07T09:36:20.0931938Z * [new branch] subgraph_fuse -> origin/subgraph_fuse 2025-09-07T09:36:20.0933747Z * [new branch] support-uv-in-collect_env -> origin/support-uv-in-collect_env 2025-09-07T09:36:20.0935484Z * [new branch] sve-poc -> origin/sve-poc 2025-09-07T09:36:20.0937453Z * [new branch] svekars-patch-1 -> origin/svekars-patch-1 2025-09-07T09:36:20.0939188Z * [new branch] switch-bn -> origin/switch-bn 2025-09-07T09:36:20.0941131Z * [new branch] sympy-bottleneck-repro -> origin/sympy-bottleneck-repro 
2025-09-07T09:36:20.0943362Z * [new branch] tenpercent/ck_rocm_ci_v3 -> origin/tenpercent/ck_rocm_ci_v3 2025-09-07T09:36:20.0945223Z * [new branch] tensordict_integration -> origin/tensordict_integration 2025-09-07T09:36:20.0947081Z * [new branch] test-7054 -> origin/test-7054 2025-09-07T09:36:20.0949006Z * [new branch] test-move-conda-builds -> origin/test-move-conda-builds 2025-09-07T09:36:20.0950869Z * [new branch] test-myst-markdown-docstring -> origin/test-myst-markdown-docstring 2025-09-07T09:36:20.0952731Z * [new branch] test-old -> origin/test-old 2025-09-07T09:36:20.0954437Z * [new branch] test-vec-migration-internally -> origin/test-vec-migration-internally 2025-09-07T09:36:20.0956829Z * [new branch] test/bmm_heur -> origin/test/bmm_heur 2025-09-07T09:36:20.0958403Z * [new branch] test/inductor -> origin/test/inductor 2025-09-07T09:36:20.0960594Z * [new branch] tianren/flex_paged_attn_fix -> origin/tianren/flex_paged_attn_fix 2025-09-07T09:36:20.0962187Z * [new branch] tianren/flex_paged_attn_fix_temp -> origin/tianren/flex_paged_attn_fix_temp 2025-09-07T09:36:20.0963793Z * [new branch] tianren/test -> origin/tianren/test 2025-09-07T09:36:20.0965570Z * [new branch] tidy_performance_cyy -> origin/tidy_performance_cyy 2025-09-07T09:36:20.0967607Z * [new branch] torchtitan_ep -> origin/torchtitan_ep 2025-09-07T09:36:20.0969416Z * [new branch] trace_fsdp_torchtune_lora -> origin/trace_fsdp_torchtune_lora 2025-09-07T09:36:20.0971069Z * [new branch] traceable_fsdp_unit_tests -> origin/traceable_fsdp_unit_tests 2025-09-07T09:36:20.0972809Z * [new branch] tree_loop_vec_base -> origin/tree_loop_vec_base 2025-09-07T09:36:20.0974617Z * [new branch] tree_vec_base -> origin/tree_vec_base 2025-09-07T09:36:20.0976682Z * [new branch] triton-update -> origin/triton-update 2025-09-07T09:36:20.0978353Z * [new branch] triton_kernel -> origin/triton_kernel 2025-09-07T09:36:20.0980008Z * [new branch] triton_kernel_perf -> origin/triton_kernel_perf 2025-09-07T09:36:20.0981745Z 
* [new branch] tt_pkg_1908 -> origin/tt_pkg_1908 2025-09-07T09:36:20.0983784Z * [new branch] tweak-transformer-dependabot -> origin/tweak-transformer-dependabot 2025-09-07T09:36:20.0985525Z * [new branch] type_dec -> origin/type_dec 2025-09-07T09:36:20.0987491Z * [new branch] udate-sphinx-dependancies -> origin/udate-sphinx-dependancies 2025-09-07T09:36:20.0989966Z * [new branch] update-audio-commit-hash/16818882925-1712-1 -> origin/update-audio-commit-hash/16818882925-1712-1 2025-09-07T09:36:20.0991359Z * [new branch] update-audio-commit-hash/16895560422-1720-1 -> origin/update-audio-commit-hash/16895560422-1720-1 2025-09-07T09:36:20.0993033Z * [new branch] update-audio-commit-hash/16924174496-1738-1 -> origin/update-audio-commit-hash/16924174496-1738-1 2025-09-07T09:36:20.0994462Z * [new branch] update-audio-commit-hash/17002010821-1749-1 -> origin/update-audio-commit-hash/17002010821-1749-1 2025-09-07T09:36:20.0996270Z * [new branch] update-audio-commit-hash/17056004427-1766-1 -> origin/update-audio-commit-hash/17056004427-1766-1 2025-09-07T09:36:20.0997842Z * [new branch] update-audio-commit-hash/17085054029-1767-1 -> origin/update-audio-commit-hash/17085054029-1767-1 2025-09-07T09:36:20.0999242Z * [new branch] update-audio-commit-hash/17142507405-1771-1 -> origin/update-audio-commit-hash/17142507405-1771-1 2025-09-07T09:36:20.1000623Z * [new branch] update-audio-commit-hash/17168762740-1773-1 -> origin/update-audio-commit-hash/17168762740-1773-1 2025-09-07T09:36:20.1002204Z * [new branch] update-audio-commit-hash/17311174639-1780-1 -> origin/update-audio-commit-hash/17311174639-1780-1 2025-09-07T09:36:20.1003568Z * [new branch] update-audio-commit-hash/17336898740-1781-1 -> origin/update-audio-commit-hash/17336898740-1781-1 2025-09-07T09:36:20.1005189Z * [new branch] update-audio-commit-hash/17389727684-1786-1 -> origin/update-audio-commit-hash/17389727684-1786-1 2025-09-07T09:36:20.1007421Z * [new branch] update-audio-commit-hash/17449538142-1790-1 -> 
origin/update-audio-commit-hash/17449538142-1790-1 2025-09-07T09:36:20.1008533Z * [new branch] update-audio-commit-hash/17507351808-1794-1 -> origin/update-audio-commit-hash/17507351808-1794-1 2025-09-07T09:36:20.1010217Z * [new branch] update-dynamic-shapes-doc -> origin/update-dynamic-shapes-doc 2025-09-07T09:36:20.1012510Z * [new branch] update-executorch-commit-hash/15694981040-1626-1 -> origin/update-executorch-commit-hash/15694981040-1626-1 2025-09-07T09:36:20.1014697Z * [new branch] update-triton-commit-hash/13663274526-1487-2 -> origin/update-triton-commit-hash/13663274526-1487-2 2025-09-07T09:36:20.1017339Z * [new branch] update-vision-commit-hash/15336342773-1607-1 -> origin/update-vision-commit-hash/15336342773-1607-1 2025-09-07T09:36:20.1019515Z * [new branch] update-vllm-commit-hash/16737365217-1704-1 -> origin/update-vllm-commit-hash/16737365217-1704-1 2025-09-07T09:36:20.1021124Z * [new branch] update-vllm-commit-hash/16843157111-1713-1 -> origin/update-vllm-commit-hash/16843157111-1713-1 2025-09-07T09:36:20.1022685Z * [new branch] update-vllm-commit-hash/16855312394-1714-1 -> origin/update-vllm-commit-hash/16855312394-1714-1 2025-09-07T09:36:20.1024312Z * [new branch] update-vllm-commit-hash/16924174496-1738-1 -> origin/update-vllm-commit-hash/16924174496-1738-1 2025-09-07T09:36:20.1025960Z * [new branch] update-vllm-commit-hash/16952608705-1745-1 -> origin/update-vllm-commit-hash/16952608705-1745-1 2025-09-07T09:36:20.1027484Z * [new branch] update-vllm-commit-hash/16979836546-1748-1 -> origin/update-vllm-commit-hash/16979836546-1748-1 2025-09-07T09:36:20.1028937Z * [new branch] update-vllm-commit-hash/17014576881-1756-1 -> origin/update-vllm-commit-hash/17014576881-1756-1 2025-09-07T09:36:20.1030372Z * [new branch] update-vllm-commit-hash/17027830869-1761-1 -> origin/update-vllm-commit-hash/17027830869-1761-1 2025-09-07T09:36:20.1032092Z * [new branch] update-vllm-commit-hash/17056004427-1766-1 -> origin/update-vllm-commit-hash/17056004427-1766-1 
2025-09-07T09:36:20.1033491Z * [new branch] update-vllm-commit-hash/17085054029-1767-1 -> origin/update-vllm-commit-hash/17085054029-1767-1 2025-09-07T09:36:20.1035178Z * [new branch] update-vllm-commit-hash/17113610216-1768-1 -> origin/update-vllm-commit-hash/17113610216-1768-1 2025-09-07T09:36:20.1036885Z * [new branch] update-vllm-commit-hash/17142507405-1771-1 -> origin/update-vllm-commit-hash/17142507405-1771-1 2025-09-07T09:36:20.1038410Z * [new branch] update-vllm-commit-hash/17181878974-1774-1 -> origin/update-vllm-commit-hash/17181878974-1774-1 2025-09-07T09:36:20.1039890Z * [new branch] update-vllm-commit-hash/17311174639-1780-1 -> origin/update-vllm-commit-hash/17311174639-1780-1 2025-09-07T09:36:20.1041581Z * [new branch] update-vllm-commit-hash/17336898740-1781-1 -> origin/update-vllm-commit-hash/17336898740-1781-1 2025-09-07T09:36:20.1042996Z * [new branch] update-vllm-commit-hash/17364352302-1785-1 -> origin/update-vllm-commit-hash/17364352302-1785-1 2025-09-07T09:36:20.1044466Z * [new branch] update-vllm-commit-hash/17389727684-1786-1 -> origin/update-vllm-commit-hash/17389727684-1786-1 2025-09-07T09:36:20.1046321Z * [new branch] update-vllm-commit-hash/17449538142-1790-1 -> origin/update-vllm-commit-hash/17449538142-1790-1 2025-09-07T09:36:20.1047764Z * [new branch] update-vllm-commit-hash/17480069797-1791-1 -> origin/update-vllm-commit-hash/17480069797-1791-1 2025-09-07T09:36:20.1049503Z * [new branch] update-vllm-commit-hash/17507351808-1794-1 -> origin/update-vllm-commit-hash/17507351808-1794-1 2025-09-07T09:36:20.1051666Z * [new branch] update-xla-commit-hash/16873912760-198-1 -> origin/update-xla-commit-hash/16873912760-198-1 2025-09-07T09:36:20.1053410Z * [new branch] update-xla-commit-hash/17034266655-199-1 -> origin/update-xla-commit-hash/17034266655-199-1 2025-09-07T09:36:20.1054674Z * [new branch] update-xla-commit-hash/17202464405-200-1 -> origin/update-xla-commit-hash/17202464405-200-1 2025-09-07T09:36:20.1056854Z * [new branch] 
update_docs_torch_multinomial_issue#125388 -> origin/update_docs_torch_multinomial_issue#125388 2025-09-07T09:36:20.1058576Z * [new branch] update_executorch_pin -> origin/update_executorch_pin 2025-09-07T09:36:20.1060297Z * [new branch] update_slow_tests_1722488736 -> origin/update_slow_tests_1722488736 2025-09-07T09:36:20.1062253Z * [new branch] update_slow_tests_1722879173 -> origin/update_slow_tests_1722879173 2025-09-07T09:36:20.1063980Z * [new branch] update_slow_tests_1752478971 -> origin/update_slow_tests_1752478971 2025-09-07T09:36:20.1066043Z * [new branch] update_slow_tests_1755502951 -> origin/update_slow_tests_1755502951 2025-09-07T09:36:20.1067846Z * [new branch] update_slow_tests_1756107664 -> origin/update_slow_tests_1756107664 2025-09-07T09:36:20.1069585Z * [new branch] update_submodule_FBGEMM -> origin/update_submodule_FBGEMM 2025-09-07T09:36:20.1071361Z * [new branch] update_submodule_kineto -> origin/update_submodule_kineto 2025-09-07T09:36:20.1072962Z * [new branch] update_submodule_tensorpipe -> origin/update_submodule_tensorpipe 2025-09-07T09:36:20.1074755Z * [new branch] v0.1.2 -> origin/v0.1.2 2025-09-07T09:36:20.1077043Z * [new branch] v1.0.1 -> origin/v1.0.1 2025-09-07T09:36:20.1078876Z * [new branch] v1.0.3 -> origin/v1.0.3 2025-09-07T09:36:20.1080717Z * [new branch] v1.1.0 -> origin/v1.1.0 2025-09-07T09:36:20.1082582Z * [new branch] v1.2.0 -> origin/v1.2.0 2025-09-07T09:36:20.1084392Z * [new branch] v1.3.0 -> origin/v1.3.0 2025-09-07T09:36:20.1086638Z * [new branch] v1.3.1 -> origin/v1.3.1 2025-09-07T09:36:20.1088333Z * [new branch] validate_fn -> origin/validate_fn 2025-09-07T09:36:20.1090235Z * [new branch] validations_2.6 -> origin/validations_2.6 2025-09-07T09:36:20.1092033Z * [new branch] validations_2.8 -> origin/validations_2.8 2025-09-07T09:36:20.1094392Z * [new branch] viable/strict -> origin/viable/strict 2025-09-07T09:36:20.1096409Z * [new branch] vllmbuildci -> origin/vllmbuildci 2025-09-07T09:36:20.1098251Z * [new branch] 
vllmpin -> origin/vllmpin 2025-09-07T09:36:20.1100568Z * [new branch] wdvr/conda_devcontainer -> origin/wdvr/conda_devcontainer 2025-09-07T09:36:20.1102154Z * [new branch] wdvr/iss_145259 -> origin/wdvr/iss_145259 2025-09-07T09:36:20.1103996Z * [new branch] weight_sharing_cpp -> origin/weight_sharing_cpp 2025-09-07T09:36:20.1106694Z * [new branch] whc/flight4 -> origin/whc/flight4 2025-09-07T09:36:20.1108077Z * [new branch] whc/flight51 -> origin/whc/flight51 2025-09-07T09:36:20.1109690Z * [new branch] whc/flight53 -> origin/whc/flight53 2025-09-07T09:36:20.1111441Z * [new branch] whc/stage2 -> origin/whc/stage2 2025-09-07T09:36:20.1112984Z * [new branch] whc/uneven -> origin/whc/uneven 2025-09-07T09:36:20.1114633Z * [new branch] whc/uneven-merge -> origin/whc/uneven-merge 2025-09-07T09:36:20.1116682Z * [new branch] win_warnings -> origin/win_warnings 2025-09-07T09:36:20.1118701Z * [new branch] windows_libtorch_free -> origin/windows_libtorch_free 2025-09-07T09:36:20.1120064Z * [new branch] workonoldcommit -> origin/workonoldcommit 2025-09-07T09:36:20.1122134Z * [new branch] wychi-autotune-prune-configs-by-shared-mem -> origin/wychi-autotune-prune-configs-by-shared-mem 2025-09-07T09:36:20.1124157Z * [new branch] xmfan/ca_0516 -> origin/xmfan/ca_0516 2025-09-07T09:36:20.1126050Z * [new branch] xmfan/ca_1051b93192 -> origin/xmfan/ca_1051b93192 2025-09-07T09:36:20.1127525Z * [new branch] xmfan/ca_1a722f62c248391fc4a542e8851a5559aa356ae8 -> origin/xmfan/ca_1a722f62c248391fc4a542e8851a5559aa356ae8 2025-09-07T09:36:20.1128780Z * [new branch] xmfan/ca_5a2be192d1 -> origin/xmfan/ca_5a2be192d1 2025-09-07T09:36:20.1130310Z * [new branch] xmfan/ca_9d59b516e9 -> origin/xmfan/ca_9d59b516e9 2025-09-07T09:36:20.1131939Z * [new branch] xmfan/ca_api -> origin/xmfan/ca_api 2025-09-07T09:36:20.1133353Z * [new branch] xmfan/ca_apr8 -> origin/xmfan/ca_apr8 2025-09-07T09:36:20.1135145Z * [new branch] xmfan/ca_base -> origin/xmfan/ca_base 2025-09-07T09:36:20.1136696Z * [new branch] 
xmfan/ca_cudagraphs -> origin/xmfan/ca_cudagraphs 2025-09-07T09:36:20.1138200Z * [new branch] xmfan/ca_dynamic -> origin/xmfan/ca_dynamic 2025-09-07T09:36:20.1139802Z * [new branch] xmfan/ca_fix_dyn -> origin/xmfan/ca_fix_dyn 2025-09-07T09:36:20.1141231Z * [new branch] xmfan/ca_fix_lowering -> origin/xmfan/ca_fix_lowering 2025-09-07T09:36:20.1142929Z * [new branch] xmfan/ca_fix_polyfills -> origin/xmfan/ca_fix_polyfills 2025-09-07T09:36:20.1144383Z * [new branch] xmfan/ca_jan3 -> origin/xmfan/ca_jan3 2025-09-07T09:36:20.1146174Z * [new branch] xmfan/ca_jun18 -> origin/xmfan/ca_jun18 2025-09-07T09:36:20.1147733Z * [new branch] xmfan/ca_jun24 -> origin/xmfan/ca_jun24 2025-09-07T09:36:20.1149249Z * [new branch] xmfan/ca_mem_base -> origin/xmfan/ca_mem_base 2025-09-07T09:36:20.1150844Z * [new branch] xmfan/ca_mem_fix -> origin/xmfan/ca_mem_fix 2025-09-07T09:36:20.1152370Z * [new branch] xmfan/ca_memory_fix -> origin/xmfan/ca_memory_fix 2025-09-07T09:36:20.1153910Z * [new branch] xmfan/ca_memory_fix_rebased -> origin/xmfan/ca_memory_fix_rebased 2025-09-07T09:36:20.1155721Z * [new branch] xmfan/ca_memory_fix_rebased2 -> origin/xmfan/ca_memory_fix_rebased2 2025-09-07T09:36:20.1157249Z * [new branch] xmfan/ca_move_to_cuda -> origin/xmfan/ca_move_to_cuda 2025-09-07T09:36:20.1158721Z * [new branch] xmfan/ca_nested -> origin/xmfan/ca_nested 2025-09-07T09:36:20.1160351Z * [new branch] xmfan/ca_overhead -> origin/xmfan/ca_overhead 2025-09-07T09:36:20.1161924Z * [new branch] xmfan/ca_overhead_0eba7e5451 -> origin/xmfan/ca_overhead_0eba7e5451 2025-09-07T09:36:20.1163405Z * [new branch] xmfan/ca_scalar -> origin/xmfan/ca_scalar 2025-09-07T09:36:20.1165238Z * [new branch] xmfan/ca_subclass_mem_fix -> origin/xmfan/ca_subclass_mem_fix 2025-09-07T09:36:20.1166858Z * [new branch] xmfan/ca_warm_mem -> origin/xmfan/ca_warm_mem 2025-09-07T09:36:20.1168390Z * [new branch] xmfan/ca_warm_mem_base -> origin/xmfan/ca_warm_mem_base 2025-09-07T09:36:20.1170007Z * [new branch] xmfan/cacu_jun18 -> 
origin/xmfan/cacu_jun18 2025-09-07T09:36:20.1171486Z * [new branch] xmfan/cacu_jun19 -> origin/xmfan/cacu_jun19 2025-09-07T09:36:20.1172961Z * [new branch] xmfan/cacu_jun4 -> origin/xmfan/cacu_jun4 2025-09-07T09:36:20.1174761Z * [new branch] xmfan/cacu_may27 -> origin/xmfan/cacu_may27 2025-09-07T09:36:20.1176550Z * [new branch] xmfan/disable_duck_shape -> origin/xmfan/disable_duck_shape 2025-09-07T09:36:20.1178078Z * [new branch] xmfan/fca_cpp_node_passthrough -> origin/xmfan/fca_cpp_node_passthrough 2025-09-07T09:36:20.1179522Z * [new branch] xmfan/issue_123374 -> origin/xmfan/issue_123374 2025-09-07T09:36:20.1181294Z * [new branch] xmfan/post_3945954741e2d37023c5d6954f9483008e0892f9 -> origin/xmfan/post_3945954741e2d37023c5d6954f9483008e0892f9 2025-09-07T09:36:20.1183009Z * [new branch] xmfan/pre_3945954741e2d37023c5d6954f9483008e0892f9 -> origin/xmfan/pre_3945954741e2d37023c5d6954f9483008e0892f9 2025-09-07T09:36:20.1184375Z * [new branch] xmfan/segfault_test -> origin/xmfan/segfault_test 2025-09-07T09:36:20.1186305Z * [new branch] xmfan/single_step -> origin/xmfan/single_step 2025-09-07T09:36:20.1187842Z * [new branch] xmfan/sth_0829 -> origin/xmfan/sth_0829 2025-09-07T09:36:20.1189523Z * [new branch] xmfan/test -> origin/xmfan/test 2025-09-07T09:36:20.1191933Z * [new branch] yguo/debug-0226-constexpr -> origin/yguo/debug-0226-constexpr 2025-09-07T09:36:20.1193507Z * [new branch] yguo/new_latest_changes -> origin/yguo/new_latest_changes 2025-09-07T09:36:20.1195266Z * [new branch] yguo/patch_constexpr_changes -> origin/yguo/patch_constexpr_changes 2025-09-07T09:36:20.1196920Z * [new branch] yihan_quantization -> origin/yihan_quantization 2025-09-07T09:36:20.1199253Z * [new branch] yiming/add_jit_trace_benchmark -> origin/yiming/add_jit_trace_benchmark 2025-09-07T09:36:20.1200623Z * [new branch] yiming/add_nativert_benchmark -> origin/yiming/add_nativert_benchmark 2025-09-07T09:36:20.1202082Z * [new branch] yiming/bootcamp -> origin/yiming/bootcamp 
2025-09-07T09:36:20.1204503Z * [new branch] zainr/canary-test -> origin/zainr/canary-test 2025-09-07T09:36:20.1206448Z * [new branch] zainr/cleanup-gh-runners -> origin/zainr/cleanup-gh-runners 2025-09-07T09:36:20.1207810Z * [new branch] zainr/git-push-v2 -> origin/zainr/git-push-v2 2025-09-07T09:36:20.1209277Z * [new branch] zainr/pull-migration-c -> origin/zainr/pull-migration-c 2025-09-07T09:36:20.1210875Z * [new branch] zainr/test -> origin/zainr/test 2025-09-07T09:36:20.1212248Z * [new branch] zainr/test2 -> origin/zainr/test2 2025-09-07T09:36:20.1213765Z * [new branch] zainr/unstable -> origin/zainr/unstable 2025-09-07T09:36:20.1215345Z * [new branch] zainr/unstable-xla -> origin/zainr/unstable-xla 2025-09-07T09:36:20.1217440Z * [new branch] zasdfgbnm-patch-3 -> origin/zasdfgbnm-patch-3 2025-09-07T09:36:20.1219135Z * [new branch] zb2p -> origin/zb2p 2025-09-07T09:36:20.1221034Z * [new branch] zero_grad_optimization -> origin/zero_grad_optimization 2025-09-07T09:36:20.1223107Z * [new branch] zeros-and-scatter-part2 -> origin/zeros-and-scatter-part2 2025-09-07T09:36:20.1226400Z * [new branch] zhxchen17/scratch/0 -> origin/zhxchen17/scratch/0 2025-09-07T09:36:20.1228697Z * [new branch] zhxhcen17/moodycamel -> origin/zhxhcen17/moodycamel 2025-09-07T09:36:20.1230856Z * [new branch] zxiiro/main -> origin/zxiiro/main 2025-09-07T09:36:20.1232280Z * [new tag] bc2caa7fdf006894eff7af936babde69ab5a40f8-huydhn-debug -> bc2caa7fdf006894eff7af936babde69ab5a40f8-huydhn-debug 2025-09-07T09:36:20.1233463Z * [new tag] ci/binaries/77164 -> ci/binaries/77164 2025-09-07T09:36:20.1235120Z * [new tag] ciflow/binaries/156049 -> ciflow/binaries/156049 2025-09-07T09:36:20.1235847Z * [new tag] ciflow/binaries/156712 -> ciflow/binaries/156712 2025-09-07T09:36:20.1236825Z * [new tag] ciflow/binaries/157432 -> ciflow/binaries/157432 2025-09-07T09:36:20.1237694Z * [new tag] ciflow/binaries/157685 -> ciflow/binaries/157685 2025-09-07T09:36:20.1238368Z * [new tag] ciflow/binaries/157689 -> 
ciflow/binaries/157689 2025-09-07T09:36:20.1239380Z * [new tag] ciflow/binaries/158104 -> ciflow/binaries/158104 2025-09-07T09:36:20.1240392Z * [new tag] ciflow/binaries/160229 -> ciflow/binaries/160229 2025-09-07T09:36:20.1241306Z * [new tag] ciflow/binaries/160720 -> ciflow/binaries/160720 2025-09-07T09:36:20.1242106Z * [new tag] ciflow/binaries/162080 -> ciflow/binaries/162080 2025-09-07T09:36:20.1243151Z * [new tag] ciflow/binaries/162329 -> ciflow/binaries/162329 2025-09-07T09:36:20.1244366Z * [new tag] ciflow/binaries_libtorch/156049 -> ciflow/binaries_libtorch/156049 2025-09-07T09:36:20.1255966Z * [new tag] ciflow/binaries_libtorch/156711 -> ciflow/binaries_libtorch/156711 2025-09-07T09:36:20.1256305Z * [new tag] ciflow/binaries_libtorch/157432 -> ciflow/binaries_libtorch/157432 2025-09-07T09:36:20.1256722Z * [new tag] ciflow/binaries_wheel/156049 -> ciflow/binaries_wheel/156049 2025-09-07T09:36:20.1256909Z * [new tag] ciflow/binaries_wheel/156711 -> ciflow/binaries_wheel/156711 2025-09-07T09:36:20.1257082Z * [new tag] ciflow/binaries_wheel/157432 -> ciflow/binaries_wheel/157432 2025-09-07T09:36:20.1257254Z * [new tag] ciflow/binaries_wheel/162136 -> ciflow/binaries_wheel/162136 2025-09-07T09:36:20.1257428Z * [new tag] ciflow/binaries_wheel/162252 -> ciflow/binaries_wheel/162252 2025-09-07T09:36:20.1257596Z * [new tag] ciflow/binaries_wheel/162325 -> ciflow/binaries_wheel/162325 2025-09-07T09:36:20.1257791Z * [new tag] ciflow/h100-distributed/156703 -> ciflow/h100-distributed/156703 2025-09-07T09:36:20.1257952Z * [new tag] ciflow/h100-symm-mem/157635 -> ciflow/h100-symm-mem/157635 2025-09-07T09:36:20.1258118Z * [new tag] ciflow/h100-symm-mem/161984 -> ciflow/h100-symm-mem/161984 2025-09-07T09:36:20.1258273Z * [new tag] ciflow/h100-symm-mem/162003 -> ciflow/h100-symm-mem/162003 2025-09-07T09:36:20.1258442Z * [new tag] ciflow/h100-symm-mem/162011 -> ciflow/h100-symm-mem/162011 2025-09-07T09:36:20.1258596Z * [new tag] ciflow/h100-symm-mem/162026 -> 
ciflow/h100-symm-mem/162026 2025-09-07T09:36:20.1258754Z * [new tag] ciflow/h100-symm-mem/162033 -> ciflow/h100-symm-mem/162033 2025-09-07T09:36:20.1259602Z * [new tag] ciflow/h100-symm-mem/162040 -> ciflow/h100-symm-mem/162040 2025-09-07T09:36:20.1260516Z * [new tag] ciflow/h100-symm-mem/162041 -> ciflow/h100-symm-mem/162041 2025-09-07T09:36:20.1261222Z * [new tag] ciflow/h100-symm-mem/162142 -> ciflow/h100-symm-mem/162142 2025-09-07T09:36:20.1262261Z * [new tag] ciflow/h100-symm-mem/162150 -> ciflow/h100-symm-mem/162150 2025-09-07T09:36:20.1263098Z * [new tag] ciflow/h100-symm-mem/162243 -> ciflow/h100-symm-mem/162243 2025-09-07T09:36:20.1263948Z * [new tag] ciflow/h100-symm-mem/162320 -> ciflow/h100-symm-mem/162320 2025-09-07T09:36:20.1265388Z * [new tag] ciflow/h100/159158 -> ciflow/h100/159158 2025-09-07T09:36:20.1266736Z * [new tag] ciflow/h100/160480 -> ciflow/h100/160480 2025-09-07T09:36:20.1267741Z * [new tag] ciflow/h100/161749 -> ciflow/h100/161749 2025-09-07T09:36:20.1268899Z * [new tag] ciflow/h100/162022 -> ciflow/h100/162022 2025-09-07T09:36:20.1269455Z * [new tag] ciflow/h100/162278 -> ciflow/h100/162278 2025-09-07T09:36:20.1270938Z * [new tag] ciflow/inductor-perf-test-nightly-rocm/156592 -> ciflow/inductor-perf-test-nightly-rocm/156592 2025-09-07T09:36:20.1271916Z * [new tag] ciflow/inductor-perf-test-nightly/156592 -> ciflow/inductor-perf-test-nightly/156592 2025-09-07T09:36:20.1273037Z * [new tag] ciflow/inductor-periodic/162063 -> ciflow/inductor-periodic/162063 2025-09-07T09:36:20.1273829Z * [new tag] ciflow/inductor-periodic/162227 -> ciflow/inductor-periodic/162227 2025-09-07T09:36:20.1275156Z * [new tag] ciflow/inductor-periodic/162323 -> ciflow/inductor-periodic/162323 2025-09-07T09:36:20.1276363Z * [new tag] ciflow/inductor-rocm/154170 -> ciflow/inductor-rocm/154170 2025-09-07T09:36:20.1277302Z * [new tag] ciflow/inductor-rocm/159146 -> ciflow/inductor-rocm/159146 2025-09-07T09:36:20.1278192Z * [new tag] ciflow/inductor-rocm/159158 -> 
ciflow/inductor-rocm/159158 2025-09-07T09:36:20.1279165Z * [new tag] ciflow/inductor-rocm/161715 -> ciflow/inductor-rocm/161715 2025-09-07T09:36:20.1280203Z * [new tag] ciflow/inductor-rocm/162053 -> ciflow/inductor-rocm/162053 2025-09-07T09:36:20.1281245Z * [new tag] ciflow/inductor-rocm/162056 -> ciflow/inductor-rocm/162056 2025-09-07T09:36:20.1282349Z * [new tag] ciflow/inductor/137400 -> ciflow/inductor/137400 2025-09-07T09:36:20.1283197Z * [new tag] ciflow/inductor/148180 -> ciflow/inductor/148180 2025-09-07T09:36:20.1284040Z * [new tag] ciflow/inductor/148328 -> ciflow/inductor/148328 2025-09-07T09:36:20.1284874Z * [new tag] ciflow/inductor/148484 -> ciflow/inductor/148484 2025-09-07T09:36:20.1286041Z * [new tag] ciflow/inductor/148492 -> ciflow/inductor/148492 2025-09-07T09:36:20.1286888Z * [new tag] ciflow/inductor/152624 -> ciflow/inductor/152624 2025-09-07T09:36:20.1287676Z * [new tag] ciflow/inductor/154694 -> ciflow/inductor/154694 2025-09-07T09:36:20.1288559Z * [new tag] ciflow/inductor/156049 -> ciflow/inductor/156049 2025-09-07T09:36:20.1289349Z * [new tag] ciflow/inductor/156592 -> ciflow/inductor/156592 2025-09-07T09:36:20.1290257Z * [new tag] ciflow/inductor/157635 -> ciflow/inductor/157635 2025-09-07T09:36:20.1290990Z * [new tag] ciflow/inductor/157685 -> ciflow/inductor/157685 2025-09-07T09:36:20.1291942Z * [new tag] ciflow/inductor/157686 -> ciflow/inductor/157686 2025-09-07T09:36:20.1292847Z * [new tag] ciflow/inductor/157689 -> ciflow/inductor/157689 2025-09-07T09:36:20.1293796Z * [new tag] ciflow/inductor/157699 -> ciflow/inductor/157699 2025-09-07T09:36:20.1294692Z * [new tag] ciflow/inductor/157743 -> ciflow/inductor/157743 2025-09-07T09:36:20.1296067Z * [new tag] ciflow/inductor/157994 -> ciflow/inductor/157994 2025-09-07T09:36:20.1296843Z * [new tag] ciflow/inductor/158091 -> ciflow/inductor/158091 2025-09-07T09:36:20.1297758Z * [new tag] ciflow/inductor/158104 -> ciflow/inductor/158104 2025-09-07T09:36:20.1298742Z * [new tag] 
ciflow/inductor/158404 -> ciflow/inductor/158404 2025-09-07T09:36:20.1299603Z * [new tag] ciflow/inductor/158647 -> ciflow/inductor/158647 2025-09-07T09:36:20.1300672Z * [new tag] ciflow/inductor/158932 -> ciflow/inductor/158932 2025-09-07T09:36:20.1301588Z * [new tag] ciflow/inductor/159146 -> ciflow/inductor/159146 2025-09-07T09:36:20.1302497Z * [new tag] ciflow/inductor/159158 -> ciflow/inductor/159158 2025-09-07T09:36:20.1303673Z * [new tag] ciflow/inductor/159274 -> ciflow/inductor/159274 2025-09-07T09:36:20.1304397Z * [new tag] ciflow/inductor/159664 -> ciflow/inductor/159664 2025-09-07T09:36:20.1305913Z * [new tag] ciflow/inductor/159778 -> ciflow/inductor/159778 2025-09-07T09:36:20.1306823Z * [new tag] ciflow/inductor/159835 -> ciflow/inductor/159835 2025-09-07T09:36:20.1307902Z * [new tag] ciflow/inductor/159944 -> ciflow/inductor/159944 2025-09-07T09:36:20.1309116Z * [new tag] ciflow/inductor/160161 -> ciflow/inductor/160161 2025-09-07T09:36:20.1309978Z * [new tag] ciflow/inductor/160174 -> ciflow/inductor/160174 2025-09-07T09:36:20.1311020Z * [new tag] ciflow/inductor/160323 -> ciflow/inductor/160323 2025-09-07T09:36:20.1312256Z * [new tag] ciflow/inductor/160324 -> ciflow/inductor/160324 2025-09-07T09:36:20.1313421Z * [new tag] ciflow/inductor/160325 -> ciflow/inductor/160325 2025-09-07T09:36:20.1314415Z * [new tag] ciflow/inductor/160326 -> ciflow/inductor/160326 2025-09-07T09:36:20.1315611Z * [new tag] ciflow/inductor/160327 -> ciflow/inductor/160327 2025-09-07T09:36:20.1316650Z * [new tag] ciflow/inductor/160328 -> ciflow/inductor/160328 2025-09-07T09:36:20.1317638Z * [new tag] ciflow/inductor/160329 -> ciflow/inductor/160329 2025-09-07T09:36:20.1318517Z * [new tag] ciflow/inductor/160480 -> ciflow/inductor/160480 2025-09-07T09:36:20.1319372Z * [new tag] ciflow/inductor/160483 -> ciflow/inductor/160483 2025-09-07T09:36:20.1320469Z * [new tag] ciflow/inductor/160532 -> ciflow/inductor/160532 2025-09-07T09:36:20.1321985Z * [new tag] 
ciflow/inductor/160539 -> ciflow/inductor/160539 2025-09-07T09:36:20.1322825Z * [new tag] ciflow/inductor/160580 -> ciflow/inductor/160580 2025-09-07T09:36:20.1323728Z * [new tag] ciflow/inductor/160685 -> ciflow/inductor/160685 2025-09-07T09:36:20.1324650Z * [new tag] ciflow/inductor/160686 -> ciflow/inductor/160686 2025-09-07T09:36:20.1325879Z * [new tag] ciflow/inductor/160687 -> ciflow/inductor/160687 2025-09-07T09:36:20.1326720Z * [new tag] ciflow/inductor/160688 -> ciflow/inductor/160688 2025-09-07T09:36:20.1327630Z * [new tag] ciflow/inductor/160690 -> ciflow/inductor/160690 2025-09-07T09:36:20.1328530Z * [new tag] ciflow/inductor/160706 -> ciflow/inductor/160706 2025-09-07T09:36:20.1329586Z * [new tag] ciflow/inductor/160729 -> ciflow/inductor/160729 2025-09-07T09:36:20.1330459Z * [new tag] ciflow/inductor/160798 -> ciflow/inductor/160798 2025-09-07T09:36:20.1331555Z * [new tag] ciflow/inductor/160836 -> ciflow/inductor/160836 2025-09-07T09:36:20.1332509Z * [new tag] ciflow/inductor/160843 -> ciflow/inductor/160843 2025-09-07T09:36:20.1333729Z * [new tag] ciflow/inductor/160869 -> ciflow/inductor/160869 2025-09-07T09:36:20.1334703Z * [new tag] ciflow/inductor/160920 -> ciflow/inductor/160920 2025-09-07T09:36:20.1335831Z * [new tag] ciflow/inductor/160928 -> ciflow/inductor/160928 2025-09-07T09:36:20.1336847Z * [new tag] ciflow/inductor/160943 -> ciflow/inductor/160943 2025-09-07T09:36:20.1337717Z * [new tag] ciflow/inductor/161092 -> ciflow/inductor/161092 2025-09-07T09:36:20.1338682Z * [new tag] ciflow/inductor/161093 -> ciflow/inductor/161093 2025-09-07T09:36:20.1339552Z * [new tag] ciflow/inductor/161109 -> ciflow/inductor/161109 2025-09-07T09:36:20.1340743Z * [new tag] ciflow/inductor/161118 -> ciflow/inductor/161118 2025-09-07T09:36:20.1341833Z * [new tag] ciflow/inductor/161178 -> ciflow/inductor/161178 2025-09-07T09:36:20.1342974Z * [new tag] ciflow/inductor/161246 -> ciflow/inductor/161246 2025-09-07T09:36:20.1343872Z * [new tag] 
ciflow/inductor/161349 -> ciflow/inductor/161349 2025-09-07T09:36:20.1344821Z * [new tag] ciflow/inductor/161350 -> ciflow/inductor/161350 2025-09-07T09:36:20.1346026Z * [new tag] ciflow/inductor/161351 -> ciflow/inductor/161351 2025-09-07T09:36:20.1347026Z * [new tag] ciflow/inductor/161397 -> ciflow/inductor/161397 2025-09-07T09:36:20.1348068Z * [new tag] ciflow/inductor/161404 -> ciflow/inductor/161404 2025-09-07T09:36:20.1349096Z * [new tag] ciflow/inductor/161405 -> ciflow/inductor/161405 2025-09-07T09:36:20.1349981Z * [new tag] ciflow/inductor/161406 -> ciflow/inductor/161406 2025-09-07T09:36:20.1351037Z * [new tag] ciflow/inductor/161410 -> ciflow/inductor/161410 2025-09-07T09:36:20.1351938Z * [new tag] ciflow/inductor/161414 -> ciflow/inductor/161414 2025-09-07T09:36:20.1353163Z * [new tag] ciflow/inductor/161442 -> ciflow/inductor/161442 2025-09-07T09:36:20.1354374Z * [new tag] ciflow/inductor/161458 -> ciflow/inductor/161458 2025-09-07T09:36:20.1355418Z * [new tag] ciflow/inductor/161468 -> ciflow/inductor/161468 2025-09-07T09:36:20.1356430Z * [new tag] ciflow/inductor/161469 -> ciflow/inductor/161469 2025-09-07T09:36:20.1357458Z * [new tag] ciflow/inductor/161485 -> ciflow/inductor/161485 2025-09-07T09:36:20.1358449Z * [new tag] ciflow/inductor/161499 -> ciflow/inductor/161499 2025-09-07T09:36:20.1359408Z * [new tag] ciflow/inductor/161534 -> ciflow/inductor/161534 2025-09-07T09:36:20.1360382Z * [new tag] ciflow/inductor/161595 -> ciflow/inductor/161595 2025-09-07T09:36:20.1361351Z * [new tag] ciflow/inductor/161596 -> ciflow/inductor/161596 2025-09-07T09:36:20.1362801Z * [new tag] ciflow/inductor/161630 -> ciflow/inductor/161630 2025-09-07T09:36:20.1363746Z * [new tag] ciflow/inductor/161667 -> ciflow/inductor/161667 2025-09-07T09:36:20.1364719Z * [new tag] ciflow/inductor/161670 -> ciflow/inductor/161670 2025-09-07T09:36:20.1365980Z * [new tag] ciflow/inductor/161673 -> ciflow/inductor/161673 2025-09-07T09:36:20.1366935Z * [new tag] 
ciflow/inductor/161674 -> ciflow/inductor/161674 2025-09-07T09:36:20.1367870Z * [new tag] ciflow/inductor/161675 -> ciflow/inductor/161675 2025-09-07T09:36:20.1368849Z * [new tag] ciflow/inductor/161693 -> ciflow/inductor/161693 2025-09-07T09:36:20.1369826Z * [new tag] ciflow/inductor/161695 -> ciflow/inductor/161695 2025-09-07T09:36:20.1370857Z * [new tag] ciflow/inductor/161715 -> ciflow/inductor/161715 2025-09-07T09:36:20.1371846Z * [new tag] ciflow/inductor/161730 -> ciflow/inductor/161730 2025-09-07T09:36:20.1372828Z * [new tag] ciflow/inductor/161732 -> ciflow/inductor/161732 2025-09-07T09:36:20.1373958Z * [new tag] ciflow/inductor/161744 -> ciflow/inductor/161744 2025-09-07T09:36:20.1375104Z * [new tag] ciflow/inductor/161746 -> ciflow/inductor/161746 2025-09-07T09:36:20.1376157Z * [new tag] ciflow/inductor/161747 -> ciflow/inductor/161747 2025-09-07T09:36:20.1377145Z * [new tag] ciflow/inductor/161819 -> ciflow/inductor/161819 2025-09-07T09:36:20.1378101Z * [new tag] ciflow/inductor/161821 -> ciflow/inductor/161821 2025-09-07T09:36:20.1379320Z * [new tag] ciflow/inductor/161828 -> ciflow/inductor/161828 2025-09-07T09:36:20.1380072Z * [new tag] ciflow/inductor/161879 -> ciflow/inductor/161879 2025-09-07T09:36:20.1381074Z * [new tag] ciflow/inductor/161880 -> ciflow/inductor/161880 2025-09-07T09:36:20.1382205Z * [new tag] ciflow/inductor/161881 -> ciflow/inductor/161881 2025-09-07T09:36:20.1383333Z * [new tag] ciflow/inductor/161907 -> ciflow/inductor/161907 2025-09-07T09:36:20.1384318Z * [new tag] ciflow/inductor/161914 -> ciflow/inductor/161914 2025-09-07T09:36:20.1385716Z * [new tag] ciflow/inductor/161924 -> ciflow/inductor/161924 2025-09-07T09:36:20.1386989Z * [new tag] ciflow/inductor/161936 -> ciflow/inductor/161936 2025-09-07T09:36:20.1387966Z * [new tag] ciflow/inductor/161938 -> ciflow/inductor/161938 2025-09-07T09:36:20.1388984Z * [new tag] ciflow/inductor/161939 -> ciflow/inductor/161939 2025-09-07T09:36:20.1389983Z * [new tag] 
ciflow/inductor/161940 -> ciflow/inductor/161940 2025-09-07T09:36:20.1390978Z * [new tag] ciflow/inductor/161955 -> ciflow/inductor/161955 2025-09-07T09:36:20.1392022Z * [new tag] ciflow/inductor/161957 -> ciflow/inductor/161957 2025-09-07T09:36:20.1393038Z * [new tag] ciflow/inductor/161975 -> ciflow/inductor/161975 2025-09-07T09:36:20.1394030Z * [new tag] ciflow/inductor/161977 -> ciflow/inductor/161977 2025-09-07T09:36:20.1395157Z * [new tag] ciflow/inductor/161978 -> ciflow/inductor/161978 2025-09-07T09:36:20.1396243Z * [new tag] ciflow/inductor/161979 -> ciflow/inductor/161979 2025-09-07T09:36:20.1397214Z * [new tag] ciflow/inductor/161980 -> ciflow/inductor/161980 2025-09-07T09:36:20.1398217Z * [new tag] ciflow/inductor/161988 -> ciflow/inductor/161988 2025-09-07T09:36:20.1399296Z * [new tag] ciflow/inductor/161994 -> ciflow/inductor/161994 2025-09-07T09:36:20.1400300Z * [new tag] ciflow/inductor/162013 -> ciflow/inductor/162013 2025-09-07T09:36:20.1401292Z * [new tag] ciflow/inductor/162014 -> ciflow/inductor/162014 2025-09-07T09:36:20.1402274Z * [new tag] ciflow/inductor/162017 -> ciflow/inductor/162017 2025-09-07T09:36:20.1403282Z * [new tag] ciflow/inductor/162021 -> ciflow/inductor/162021 2025-09-07T09:36:20.1404284Z * [new tag] ciflow/inductor/162023 -> ciflow/inductor/162023 2025-09-07T09:36:20.1405537Z * [new tag] ciflow/inductor/162027 -> ciflow/inductor/162027 2025-09-07T09:36:20.1406650Z * [new tag] ciflow/inductor/162029 -> ciflow/inductor/162029 2025-09-07T09:36:20.1407711Z * [new tag] ciflow/inductor/162030 -> ciflow/inductor/162030 2025-09-07T09:36:20.1408751Z * [new tag] ciflow/inductor/162031 -> ciflow/inductor/162031 2025-09-07T09:36:20.1409775Z * [new tag] ciflow/inductor/162033 -> ciflow/inductor/162033 2025-09-07T09:36:20.1411058Z * [new tag] ciflow/inductor/162052 -> ciflow/inductor/162052 2025-09-07T09:36:20.1412119Z * [new tag] ciflow/inductor/162053 -> ciflow/inductor/162053 2025-09-07T09:36:20.1413109Z * [new tag] 
ciflow/inductor/162056 -> ciflow/inductor/162056 2025-09-07T09:36:20.1414108Z * [new tag] ciflow/inductor/162063 -> ciflow/inductor/162063 2025-09-07T09:36:20.1415213Z * [new tag] ciflow/inductor/162066 -> ciflow/inductor/162066 2025-09-07T09:36:20.1416382Z * [new tag] ciflow/inductor/162068 -> ciflow/inductor/162068 2025-09-07T09:36:20.1417701Z * [new tag] ciflow/inductor/162081 -> ciflow/inductor/162081 2025-09-07T09:36:20.1418597Z * [new tag] ciflow/inductor/162088 -> ciflow/inductor/162088 2025-09-07T09:36:20.1419622Z * [new tag] ciflow/inductor/162089 -> ciflow/inductor/162089 2025-09-07T09:36:20.1420719Z * [new tag] ciflow/inductor/162094 -> ciflow/inductor/162094 2025-09-07T09:36:20.1421810Z * [new tag] ciflow/inductor/162098 -> ciflow/inductor/162098 2025-09-07T09:36:20.1422916Z * [new tag] ciflow/inductor/162101 -> ciflow/inductor/162101 2025-09-07T09:36:20.1423911Z * [new tag] ciflow/inductor/162102 -> ciflow/inductor/162102 2025-09-07T09:36:20.1425137Z * [new tag] ciflow/inductor/162104 -> ciflow/inductor/162104 2025-09-07T09:36:20.1426315Z * [new tag] ciflow/inductor/162106 -> ciflow/inductor/162106 2025-09-07T09:36:20.1427415Z * [new tag] ciflow/inductor/162108 -> ciflow/inductor/162108 2025-09-07T09:36:20.1428516Z * [new tag] ciflow/inductor/162126 -> ciflow/inductor/162126 2025-09-07T09:36:20.1429587Z * [new tag] ciflow/inductor/162149 -> ciflow/inductor/162149 2025-09-07T09:36:20.1430622Z * [new tag] ciflow/inductor/162164 -> ciflow/inductor/162164 2025-09-07T09:36:20.1431658Z * [new tag] ciflow/inductor/162166 -> ciflow/inductor/162166 2025-09-07T09:36:20.1432669Z * [new tag] ciflow/inductor/162169 -> ciflow/inductor/162169 2025-09-07T09:36:20.1433771Z * [new tag] ciflow/inductor/162170 -> ciflow/inductor/162170 2025-09-07T09:36:20.1434790Z * [new tag] ciflow/inductor/162171 -> ciflow/inductor/162171 2025-09-07T09:36:20.1436085Z * [new tag] ciflow/inductor/162183 -> ciflow/inductor/162183 2025-09-07T09:36:20.1437167Z * [new tag] 
ciflow/inductor/162189 -> ciflow/inductor/162189 2025-09-07T09:36:20.1438261Z * [new tag] ciflow/inductor/162190 -> ciflow/inductor/162190 2025-09-07T09:36:20.1439324Z * [new tag] ciflow/inductor/162191 -> ciflow/inductor/162191 2025-09-07T09:36:20.1440389Z * [new tag] ciflow/inductor/162194 -> ciflow/inductor/162194 2025-09-07T09:36:20.1441590Z * [new tag] ciflow/inductor/162200 -> ciflow/inductor/162200 2025-09-07T09:36:20.1442690Z * [new tag] ciflow/inductor/162201 -> ciflow/inductor/162201 2025-09-07T09:36:20.1443763Z * [new tag] ciflow/inductor/162208 -> ciflow/inductor/162208 2025-09-07T09:36:20.1445089Z * [new tag] ciflow/inductor/162211 -> ciflow/inductor/162211 2025-09-07T09:36:20.1446217Z * [new tag] ciflow/inductor/162216 -> ciflow/inductor/162216 2025-09-07T09:36:20.1447277Z * [new tag] ciflow/inductor/162220 -> ciflow/inductor/162220 2025-09-07T09:36:20.1448472Z * [new tag] ciflow/inductor/162222 -> ciflow/inductor/162222 2025-09-07T09:36:20.1449534Z * [new tag] ciflow/inductor/162227 -> ciflow/inductor/162227 2025-09-07T09:36:20.1450601Z * [new tag] ciflow/inductor/162238 -> ciflow/inductor/162238 2025-09-07T09:36:20.1451687Z * [new tag] ciflow/inductor/162239 -> ciflow/inductor/162239 2025-09-07T09:36:20.1452748Z * [new tag] ciflow/inductor/162240 -> ciflow/inductor/162240 2025-09-07T09:36:20.1453847Z * [new tag] ciflow/inductor/162244 -> ciflow/inductor/162244 2025-09-07T09:36:20.1455085Z * [new tag] ciflow/inductor/162245 -> ciflow/inductor/162245 2025-09-07T09:36:20.1456370Z * [new tag] ciflow/inductor/162262 -> ciflow/inductor/162262 2025-09-07T09:36:20.1457418Z * [new tag] ciflow/inductor/162275 -> ciflow/inductor/162275 2025-09-07T09:36:20.1458743Z * [new tag] ciflow/inductor/162278 -> ciflow/inductor/162278 2025-09-07T09:36:20.1459690Z * [new tag] ciflow/inductor/162284 -> ciflow/inductor/162284 2025-09-07T09:36:20.1460800Z * [new tag] ciflow/inductor/162286 -> ciflow/inductor/162286 2025-09-07T09:36:20.1462004Z * [new tag] 
ciflow/inductor/162288 -> ciflow/inductor/162288 2025-09-07T09:36:20.1463105Z * [new tag] ciflow/inductor/162293 -> ciflow/inductor/162293 2025-09-07T09:36:20.1464180Z * [new tag] ciflow/inductor/162294 -> ciflow/inductor/162294 2025-09-07T09:36:20.1465579Z * [new tag] ciflow/inductor/162295 -> ciflow/inductor/162295 2025-09-07T09:36:20.1466589Z * [new tag] ciflow/inductor/162296 -> ciflow/inductor/162296 2025-09-07T09:36:20.1467679Z * [new tag] ciflow/inductor/162298 -> ciflow/inductor/162298 2025-09-07T09:36:20.1468795Z * [new tag] ciflow/inductor/162307 -> ciflow/inductor/162307 2025-09-07T09:36:20.1470077Z * [new tag] ciflow/inductor/162309 -> ciflow/inductor/162309 2025-09-07T09:36:20.1471405Z * [new tag] ciflow/inductor/162311 -> ciflow/inductor/162311 2025-09-07T09:36:20.1472486Z * [new tag] ciflow/inductor/162312 -> ciflow/inductor/162312 2025-09-07T09:36:20.1473614Z * [new tag] ciflow/inductor/162315 -> ciflow/inductor/162315 2025-09-07T09:36:20.1474650Z * [new tag] ciflow/inductor/162316 -> ciflow/inductor/162316 2025-09-07T09:36:20.1476063Z * [new tag] ciflow/inductor/162318 -> ciflow/inductor/162318 2025-09-07T09:36:20.1477168Z * [new tag] ciflow/inductor/162323 -> ciflow/inductor/162323 2025-09-07T09:36:20.1478306Z * [new tag] ciflow/inductor/162341 -> ciflow/inductor/162341 2025-09-07T09:36:20.1479415Z * [new tag] ciflow/inductor/162345 -> ciflow/inductor/162345 2025-09-07T09:36:20.1480742Z * [new tag] ciflow/inductor/3b9a386 -> ciflow/inductor/3b9a386 2025-09-07T09:36:20.1481981Z * [new tag] ciflow/inductor/3d4b92b -> ciflow/inductor/3d4b92b 2025-09-07T09:36:20.1483237Z * [new tag] ciflow/inductor/d224ac7 -> ciflow/inductor/d224ac7 2025-09-07T09:36:20.1484398Z * [new tag] ciflow/linux-aarch64/157994 -> ciflow/linux-aarch64/157994 2025-09-07T09:36:20.1485422Z * [new tag] ciflow/linux-aarch64/159737 -> ciflow/linux-aarch64/159737 2025-09-07T09:36:20.1486464Z * [new tag] ciflow/linux-aarch64/160078 -> ciflow/linux-aarch64/160078 
2025-09-07T09:36:20.1487618Z * [new tag] ciflow/mps/157553 -> ciflow/mps/157553 2025-09-07T09:36:20.1488434Z * [new tag] ciflow/mps/157635 -> ciflow/mps/157635 2025-09-07T09:36:20.1489283Z * [new tag] ciflow/mps/161988 -> ciflow/mps/161988 2025-09-07T09:36:20.1490108Z * [new tag] ciflow/mps/162108 -> ciflow/mps/162108 2025-09-07T09:36:20.1490968Z * [new tag] ciflow/mps/162153 -> ciflow/mps/162153 2025-09-07T09:36:20.1491812Z * [new tag] ciflow/mps/162281 -> ciflow/mps/162281 2025-09-07T09:36:20.1492979Z * [new tag] ciflow/nightly/156049 -> ciflow/nightly/156049 2025-09-07T09:36:20.1493864Z * [new tag] ciflow/nightly/158104 -> ciflow/nightly/158104 2025-09-07T09:36:20.1494905Z * [new tag] ciflow/op-benchmark/157994 -> ciflow/op-benchmark/157994 2025-09-07T09:36:20.1496430Z * [new tag] ciflow/periodic-rocm-mi300/161529 -> ciflow/periodic-rocm-mi300/161529 2025-09-07T09:36:20.1497199Z * [new tag] ciflow/periodic-rocm-mi300/161715 -> ciflow/periodic-rocm-mi300/161715 2025-09-07T09:36:20.1498731Z * [new tag] ciflow/periodic/054a2fd -> ciflow/periodic/054a2fd 2025-09-07T09:36:20.1499394Z * [new tag] ciflow/periodic/156703 -> ciflow/periodic/156703 2025-09-07T09:36:20.1500286Z * [new tag] ciflow/periodic/161715 -> ciflow/periodic/161715 2025-09-07T09:36:20.1501137Z * [new tag] ciflow/periodic/162021 -> ciflow/periodic/162021 2025-09-07T09:36:20.1502087Z * [new tag] ciflow/periodic/162323 -> ciflow/periodic/162323 2025-09-07T09:36:20.1503139Z * [new tag] ciflow/periodic/2a6d37d -> ciflow/periodic/2a6d37d 2025-09-07T09:36:20.1504138Z * [new tag] ciflow/periodic/317eeb8 -> ciflow/periodic/317eeb8 2025-09-07T09:36:20.1505359Z * [new tag] ciflow/periodic/3c32 -> ciflow/periodic/3c32 2025-09-07T09:36:20.1506480Z * [new tag] ciflow/periodic/3e98831 -> ciflow/periodic/3e98831 2025-09-07T09:36:20.1507582Z * [new tag] ciflow/periodic/94512-point -> ciflow/periodic/94512-point 2025-09-07T09:36:20.1508990Z * [new tag] ciflow/periodic/csl/test87519 -> ciflow/periodic/csl/test87519 
2025-09-07T09:36:20.1510183Z * [new tag] ciflow/periodic/csltest88275 -> ciflow/periodic/csltest88275 2025-09-07T09:36:20.1511172Z * [new tag] ciflow/periodic/csltest88761 -> ciflow/periodic/csltest88761 2025-09-07T09:36:20.1512255Z * [new tag] ciflow/periodic/release_1.12 -> ciflow/periodic/release_1.12 2025-09-07T09:36:20.1513429Z * [new tag] ciflow/periodic/release_1.12.0 -> ciflow/periodic/release_1.12.0 2025-09-07T09:36:20.1514591Z * [new tag] ciflow/periodic/sha-ec5b83 -> ciflow/periodic/sha-ec5b83 2025-09-07T09:36:20.1515954Z * [new tag] ciflow/rocm-mi300/154170 -> ciflow/rocm-mi300/154170 2025-09-07T09:36:20.1516983Z * [new tag] ciflow/rocm-mi300/158747 -> ciflow/rocm-mi300/158747 2025-09-07T09:36:20.1517747Z * [new tag] ciflow/rocm-mi300/159146 -> ciflow/rocm-mi300/159146 2025-09-07T09:36:20.1518600Z * [new tag] ciflow/rocm-mi300/159158 -> ciflow/rocm-mi300/159158 2025-09-07T09:36:20.1519461Z * [new tag] ciflow/rocm-mi300/161715 -> ciflow/rocm-mi300/161715 2025-09-07T09:36:20.1520237Z * [new tag] ciflow/rocm-mi300/161957 -> ciflow/rocm-mi300/161957 2025-09-07T09:36:20.1521113Z * [new tag] ciflow/rocm-mi300/162053 -> ciflow/rocm-mi300/162053 2025-09-07T09:36:20.1521962Z * [new tag] ciflow/rocm-mi300/162056 -> ciflow/rocm-mi300/162056 2025-09-07T09:36:20.1522949Z * [new tag] ciflow/rocm-mi300/162112 -> ciflow/rocm-mi300/162112 2025-09-07T09:36:20.1523839Z * [new tag] ciflow/rocm-mi300/162245 -> ciflow/rocm-mi300/162245 2025-09-07T09:36:20.1524569Z * [new tag] ciflow/rocm-mi300/162278 -> ciflow/rocm-mi300/162278 2025-09-07T09:36:20.1525700Z * [new tag] ciflow/rocm-mi300/162288 -> ciflow/rocm-mi300/162288 2025-09-07T09:36:20.1526875Z * [new tag] ciflow/rocm-mi355/162053 -> ciflow/rocm-mi355/162053 2025-09-07T09:36:20.1527630Z * [new tag] ciflow/rocm-mi355/162056 -> ciflow/rocm-mi355/162056 2025-09-07T09:36:20.1528766Z * [new tag] ciflow/rocm/148492 -> ciflow/rocm/148492 2025-09-07T09:36:20.1529675Z * [new tag] ciflow/rocm/154170 -> ciflow/rocm/154170 
2025-09-07T09:36:20.1530694Z * [new tag] ciflow/rocm/156491 -> ciflow/rocm/156491 2025-09-07T09:36:20.1531541Z * [new tag] ciflow/rocm/156592 -> ciflow/rocm/156592 2025-09-07T09:36:20.1532389Z * [new tag] ciflow/rocm/158747 -> ciflow/rocm/158747 2025-09-07T09:36:20.1533255Z * [new tag] ciflow/rocm/159146 -> ciflow/rocm/159146 2025-09-07T09:36:20.1534557Z * [new tag] ciflow/rocm/159158 -> ciflow/rocm/159158 2025-09-07T09:36:20.1535335Z * [new tag] ciflow/rocm/161715 -> ciflow/rocm/161715 2025-09-07T09:36:20.1536421Z * [new tag] ciflow/rocm/161972 -> ciflow/rocm/161972 2025-09-07T09:36:20.1537250Z * [new tag] ciflow/rocm/162052 -> ciflow/rocm/162052 2025-09-07T09:36:20.1538132Z * [new tag] ciflow/rocm/162053 -> ciflow/rocm/162053 2025-09-07T09:36:20.1538941Z * [new tag] ciflow/rocm/162056 -> ciflow/rocm/162056 2025-09-07T09:36:20.1539797Z * [new tag] ciflow/rocm/162112 -> ciflow/rocm/162112 2025-09-07T09:36:20.1540649Z * [new tag] ciflow/rocm/162278 -> ciflow/rocm/162278 2025-09-07T09:36:20.1541606Z * [new tag] ciflow/rocm/162288 -> ciflow/rocm/162288 2025-09-07T09:36:20.1542492Z * [new tag] ciflow/rocm/162305 -> ciflow/rocm/162305 2025-09-07T09:36:20.1543811Z * [new tag] ciflow/slow/01c7106 -> ciflow/slow/01c7106 2025-09-07T09:36:20.1544778Z * [new tag] ciflow/slow/0577043 -> ciflow/slow/0577043 2025-09-07T09:36:20.1546518Z * [new tag] ciflow/slow/0d5b74da0cab798fbfdb9caa53fad816999c8386-sdym -> ciflow/slow/0d5b74da0cab798fbfdb9caa53fad816999c8386-sdym 2025-09-07T09:36:20.1547195Z * [new tag] ciflow/slow/0e81104 -> ciflow/slow/0e81104 2025-09-07T09:36:20.1548048Z * [new tag] ciflow/slow/161395 -> ciflow/slow/161395 2025-09-07T09:36:20.1549090Z * [new tag] ciflow/slow/1732077 -> ciflow/slow/1732077 2025-09-07T09:36:20.1550141Z * [new tag] ciflow/slow/187eb7c -> ciflow/slow/187eb7c 2025-09-07T09:36:20.1551099Z * [new tag] ciflow/slow/1faef89 -> ciflow/slow/1faef89 2025-09-07T09:36:20.1552072Z * [new tag] ciflow/slow/3920ec1 -> ciflow/slow/3920ec1 
2025-09-07T09:36:20.1553084Z * [new tag] ciflow/slow/3b7c6b2 -> ciflow/slow/3b7c6b2 2025-09-07T09:36:20.1554088Z * [new tag] ciflow/slow/59a3759 -> ciflow/slow/59a3759 2025-09-07T09:36:20.1555213Z * [new tag] ciflow/slow/70ef0bb -> ciflow/slow/70ef0bb 2025-09-07T09:36:20.1556312Z * [new tag] ciflow/slow/788ff06 -> ciflow/slow/788ff06 2025-09-07T09:36:20.1557723Z * [new tag] ciflow/slow/8751002215790a3a88750faa8f4366933e296693-sdym -> ciflow/slow/8751002215790a3a88750faa8f4366933e296693-sdym 2025-09-07T09:36:20.1558394Z * [new tag] ciflow/slow/9d85864 -> ciflow/slow/9d85864 2025-09-07T09:36:20.1559449Z * [new tag] ciflow/slow/9ffad5b -> ciflow/slow/9ffad5b 2025-09-07T09:36:20.1560427Z * [new tag] ciflow/slow/a206e8b -> ciflow/slow/a206e8b 2025-09-07T09:36:20.1561466Z * [new tag] ciflow/slow/a837609 -> ciflow/slow/a837609 2025-09-07T09:36:20.1562523Z * [new tag] ciflow/slow/af841f3 -> ciflow/slow/af841f3 2025-09-07T09:36:20.1563886Z * [new tag] ciflow/slow/da3aba1e46157c4df504b067477cdf2b3c96b194-sdym -> ciflow/slow/da3aba1e46157c4df504b067477cdf2b3c96b194-sdym 2025-09-07T09:36:20.1564857Z * [new tag] ciflow/triton_binaries/162329 -> ciflow/triton_binaries/162329 2025-09-07T09:36:20.1566661Z * [new tag] ciflow/trunk/113258 -> ciflow/trunk/113258 2025-09-07T09:36:20.1567429Z * [new tag] ciflow/trunk/137400 -> ciflow/trunk/137400 2025-09-07T09:36:20.1568299Z * [new tag] ciflow/trunk/148180 -> ciflow/trunk/148180 2025-09-07T09:36:20.1569154Z * [new tag] ciflow/trunk/148328 -> ciflow/trunk/148328 2025-09-07T09:36:20.1569990Z * [new tag] ciflow/trunk/148492 -> ciflow/trunk/148492 2025-09-07T09:36:20.1571244Z * [new tag] ciflow/trunk/148919 -> ciflow/trunk/148919 2025-09-07T09:36:20.1571886Z * [new tag] ciflow/trunk/152624 -> ciflow/trunk/152624 2025-09-07T09:36:20.1572766Z * [new tag] ciflow/trunk/154170 -> ciflow/trunk/154170 2025-09-07T09:36:20.1573619Z * [new tag] ciflow/trunk/154694 -> ciflow/trunk/154694 2025-09-07T09:36:20.1574446Z * [new tag] ciflow/trunk/156049 -> 
ciflow/trunk/156049 2025-09-07T09:36:20.1575543Z * [new tag] ciflow/trunk/156703 -> ciflow/trunk/156703 2025-09-07T09:36:20.1576441Z * [new tag] ciflow/trunk/156711 -> ciflow/trunk/156711 2025-09-07T09:36:20.1577293Z * [new tag] ciflow/trunk/157432 -> ciflow/trunk/157432 2025-09-07T09:36:20.1578178Z * [new tag] ciflow/trunk/157685 -> ciflow/trunk/157685 2025-09-07T09:36:20.1579051Z * [new tag] ciflow/trunk/157689 -> ciflow/trunk/157689 2025-09-07T09:36:20.1579930Z * [new tag] ciflow/trunk/157699 -> ciflow/trunk/157699 2025-09-07T09:36:20.1580786Z * [new tag] ciflow/trunk/157813 -> ciflow/trunk/157813 2025-09-07T09:36:20.1581723Z * [new tag] ciflow/trunk/157994 -> ciflow/trunk/157994 2025-09-07T09:36:20.1582527Z * [new tag] ciflow/trunk/158091 -> ciflow/trunk/158091 2025-09-07T09:36:20.1583582Z * [new tag] ciflow/trunk/158104 -> ciflow/trunk/158104 2025-09-07T09:36:20.1584439Z * [new tag] ciflow/trunk/158404 -> ciflow/trunk/158404 2025-09-07T09:36:20.1585521Z * [new tag] ciflow/trunk/158647 -> ciflow/trunk/158647 2025-09-07T09:36:20.1586681Z * [new tag] ciflow/trunk/158846 -> ciflow/trunk/158846 2025-09-07T09:36:20.1587555Z * [new tag] ciflow/trunk/159158 -> ciflow/trunk/159158 2025-09-07T09:36:20.1588525Z * [new tag] ciflow/trunk/159682 -> ciflow/trunk/159682 2025-09-07T09:36:20.1589435Z * [new tag] ciflow/trunk/159835 -> ciflow/trunk/159835 2025-09-07T09:36:20.1590324Z * [new tag] ciflow/trunk/160161 -> ciflow/trunk/160161 2025-09-07T09:36:20.1591190Z * [new tag] ciflow/trunk/160236 -> ciflow/trunk/160236 2025-09-07T09:36:20.1592081Z * [new tag] ciflow/trunk/160329 -> ciflow/trunk/160329 2025-09-07T09:36:20.1592935Z * [new tag] ciflow/trunk/160480 -> ciflow/trunk/160480 2025-09-07T09:36:20.1593807Z * [new tag] ciflow/trunk/160483 -> ciflow/trunk/160483 2025-09-07T09:36:20.1594670Z * [new tag] ciflow/trunk/160532 -> ciflow/trunk/160532 2025-09-07T09:36:20.1595767Z * [new tag] ciflow/trunk/160836 -> ciflow/trunk/160836 2025-09-07T09:36:20.1596669Z * [new tag] 
ciflow/trunk/160843 -> ciflow/trunk/160843 2025-09-07T09:36:20.1597545Z * [new tag] ciflow/trunk/160869 -> ciflow/trunk/160869 2025-09-07T09:36:20.1598427Z * [new tag] ciflow/trunk/160928 -> ciflow/trunk/160928 2025-09-07T09:36:20.1599450Z * [new tag] ciflow/trunk/160940 -> ciflow/trunk/160940 2025-09-07T09:36:20.1600331Z * [new tag] ciflow/trunk/160943 -> ciflow/trunk/160943 2025-09-07T09:36:20.1601363Z * [new tag] ciflow/trunk/160953 -> ciflow/trunk/160953 2025-09-07T09:36:20.1602436Z * [new tag] ciflow/trunk/161035 -> ciflow/trunk/161035 2025-09-07T09:36:20.1603331Z * [new tag] ciflow/trunk/161178 -> ciflow/trunk/161178 2025-09-07T09:36:20.1604248Z * [new tag] ciflow/trunk/161349 -> ciflow/trunk/161349 2025-09-07T09:36:20.1605272Z * [new tag] ciflow/trunk/161350 -> ciflow/trunk/161350 2025-09-07T09:36:20.1606448Z * [new tag] ciflow/trunk/161351 -> ciflow/trunk/161351 2025-09-07T09:36:20.1607178Z * [new tag] ciflow/trunk/161395 -> ciflow/trunk/161395 2025-09-07T09:36:20.1608082Z * [new tag] ciflow/trunk/161405 -> ciflow/trunk/161405 2025-09-07T09:36:20.1609053Z * [new tag] ciflow/trunk/161406 -> ciflow/trunk/161406 2025-09-07T09:36:20.1609937Z * [new tag] ciflow/trunk/161410 -> ciflow/trunk/161410 2025-09-07T09:36:20.1610891Z * [new tag] ciflow/trunk/161468 -> ciflow/trunk/161468 2025-09-07T09:36:20.1611756Z * [new tag] ciflow/trunk/161499 -> ciflow/trunk/161499 2025-09-07T09:36:20.1612904Z * [new tag] ciflow/trunk/161527 -> ciflow/trunk/161527 2025-09-07T09:36:20.1613788Z * [new tag] ciflow/trunk/161534 -> ciflow/trunk/161534 2025-09-07T09:36:20.1614756Z * [new tag] ciflow/trunk/161591 -> ciflow/trunk/161591 2025-09-07T09:36:20.1615940Z * [new tag] ciflow/trunk/161595 -> ciflow/trunk/161595 2025-09-07T09:36:20.1616875Z * [new tag] ciflow/trunk/161596 -> ciflow/trunk/161596 2025-09-07T09:36:20.1617766Z * [new tag] ciflow/trunk/161633 -> ciflow/trunk/161633 2025-09-07T09:36:20.1618748Z * [new tag] ciflow/trunk/161634 -> ciflow/trunk/161634 
2025-09-07T09:36:20.1619638Z * [new tag] ciflow/trunk/161635 -> ciflow/trunk/161635 2025-09-07T09:36:20.1620622Z * [new tag] ciflow/trunk/161667 -> ciflow/trunk/161667 2025-09-07T09:36:20.1621643Z * [new tag] ciflow/trunk/161670 -> ciflow/trunk/161670 2025-09-07T09:36:20.1622633Z * [new tag] ciflow/trunk/161692 -> ciflow/trunk/161692 2025-09-07T09:36:20.1623532Z * [new tag] ciflow/trunk/161693 -> ciflow/trunk/161693 2025-09-07T09:36:20.1624467Z * [new tag] ciflow/trunk/161695 -> ciflow/trunk/161695 2025-09-07T09:36:20.1625629Z * [new tag] ciflow/trunk/161730 -> ciflow/trunk/161730 2025-09-07T09:36:20.1626602Z * [new tag] ciflow/trunk/161744 -> ciflow/trunk/161744 2025-09-07T09:36:20.1627489Z * [new tag] ciflow/trunk/161749 -> ciflow/trunk/161749 2025-09-07T09:36:20.1628417Z * [new tag] ciflow/trunk/161881 -> ciflow/trunk/161881 2025-09-07T09:36:20.1629355Z * [new tag] ciflow/trunk/161924 -> ciflow/trunk/161924 2025-09-07T09:36:20.1630490Z * [new tag] ciflow/trunk/161926 -> ciflow/trunk/161926 2025-09-07T09:36:20.1631389Z * [new tag] ciflow/trunk/161936 -> ciflow/trunk/161936 2025-09-07T09:36:20.1632304Z * [new tag] ciflow/trunk/161952 -> ciflow/trunk/161952 2025-09-07T09:36:20.1633248Z * [new tag] ciflow/trunk/161955 -> ciflow/trunk/161955 2025-09-07T09:36:20.1634225Z * [new tag] ciflow/trunk/161957 -> ciflow/trunk/161957 2025-09-07T09:36:20.1635378Z * [new tag] ciflow/trunk/161959 -> ciflow/trunk/161959 2025-09-07T09:36:20.1636431Z * [new tag] ciflow/trunk/161977 -> ciflow/trunk/161977 2025-09-07T09:36:20.1637407Z * [new tag] ciflow/trunk/161988 -> ciflow/trunk/161988 2025-09-07T09:36:20.1638369Z * [new tag] ciflow/trunk/161994 -> ciflow/trunk/161994 2025-09-07T09:36:20.1639404Z * [new tag] ciflow/trunk/162007 -> ciflow/trunk/162007 2025-09-07T09:36:20.1640381Z * [new tag] ciflow/trunk/162013 -> ciflow/trunk/162013 2025-09-07T09:36:20.1641336Z * [new tag] ciflow/trunk/162017 -> ciflow/trunk/162017 2025-09-07T09:36:20.1642468Z * [new tag] ciflow/trunk/162021 -> 
ciflow/trunk/162021 2025-09-07T09:36:20.1643224Z * [new tag] ciflow/trunk/162022 -> ciflow/trunk/162022 2025-09-07T09:36:20.1644202Z * [new tag] ciflow/trunk/162040 -> ciflow/trunk/162040 2025-09-07T09:36:20.1645349Z * [new tag] ciflow/trunk/162041 -> ciflow/trunk/162041 2025-09-07T09:36:20.1646588Z * [new tag] ciflow/trunk/162062 -> ciflow/trunk/162062 2025-09-07T09:36:20.1647514Z * [new tag] ciflow/trunk/162066 -> ciflow/trunk/162066 2025-09-07T09:36:20.1648512Z * [new tag] ciflow/trunk/162089 -> ciflow/trunk/162089 2025-09-07T09:36:20.1649507Z * [new tag] ciflow/trunk/162099 -> ciflow/trunk/162099 2025-09-07T09:36:20.1650480Z * [new tag] ciflow/trunk/162104 -> ciflow/trunk/162104 2025-09-07T09:36:20.1651422Z * [new tag] ciflow/trunk/162106 -> ciflow/trunk/162106 2025-09-07T09:36:20.1652393Z * [new tag] ciflow/trunk/162112 -> ciflow/trunk/162112 2025-09-07T09:36:20.1653364Z * [new tag] ciflow/trunk/162119 -> ciflow/trunk/162119 2025-09-07T09:36:20.1654322Z * [new tag] ciflow/trunk/162142 -> ciflow/trunk/162142 2025-09-07T09:36:20.1655540Z * [new tag] ciflow/trunk/162169 -> ciflow/trunk/162169 2025-09-07T09:36:20.1656641Z * [new tag] ciflow/trunk/162183 -> ciflow/trunk/162183 2025-09-07T09:36:20.1657575Z * [new tag] ciflow/trunk/162190 -> ciflow/trunk/162190 2025-09-07T09:36:20.1658628Z * [new tag] ciflow/trunk/162194 -> ciflow/trunk/162194 2025-09-07T09:36:20.1659634Z * [new tag] ciflow/trunk/162200 -> ciflow/trunk/162200 2025-09-07T09:36:20.1660605Z * [new tag] ciflow/trunk/162206 -> ciflow/trunk/162206 2025-09-07T09:36:20.1661739Z * [new tag] ciflow/trunk/162208 -> ciflow/trunk/162208 2025-09-07T09:36:20.1662824Z * [new tag] ciflow/trunk/162222 -> ciflow/trunk/162222 2025-09-07T09:36:20.1663820Z * [new tag] ciflow/trunk/162238 -> ciflow/trunk/162238 2025-09-07T09:36:20.1664787Z * [new tag] ciflow/trunk/162244 -> ciflow/trunk/162244 2025-09-07T09:36:20.1666294Z * [new tag] ciflow/trunk/162267 -> ciflow/trunk/162267 2025-09-07T09:36:20.1667394Z * [new tag] 
ciflow/trunk/162269 -> ciflow/trunk/162269 2025-09-07T09:36:20.1668395Z * [new tag] ciflow/trunk/162278 -> ciflow/trunk/162278 2025-09-07T09:36:20.1669376Z * [new tag] ciflow/trunk/162286 -> ciflow/trunk/162286 2025-09-07T09:36:20.1670386Z * [new tag] ciflow/trunk/162288 -> ciflow/trunk/162288 2025-09-07T09:36:20.1671367Z * [new tag] ciflow/trunk/162293 -> ciflow/trunk/162293 2025-09-07T09:36:20.1672357Z * [new tag] ciflow/trunk/162310 -> ciflow/trunk/162310 2025-09-07T09:36:20.1673411Z * [new tag] ciflow/trunk/162311 -> ciflow/trunk/162311 2025-09-07T09:36:20.1674416Z * [new tag] ciflow/trunk/162315 -> ciflow/trunk/162315 2025-09-07T09:36:20.1675649Z * [new tag] ciflow/trunk/162325 -> ciflow/trunk/162325 2025-09-07T09:36:20.1676806Z * [new tag] ciflow/trunk/162328 -> ciflow/trunk/162328 2025-09-07T09:36:20.1677833Z * [new tag] ciflow/trunk/162329 -> ciflow/trunk/162329 2025-09-07T09:36:20.1679187Z * [new tag] ciflow/unstable/123 -> ciflow/unstable/123 2025-09-07T09:36:20.1680327Z * [new tag] ciflow/vllm/162292 -> ciflow/vllm/162292 2025-09-07T09:36:20.1681408Z * [new tag] ciflow/win-arm64/156049 -> ciflow/win-arm64/156049 2025-09-07T09:36:20.1682404Z * [new tag] ciflow/win-arm64/158104 -> ciflow/win-arm64/158104 2025-09-07T09:36:20.1683352Z * [new tag] ciflow/xpu/157699 -> ciflow/xpu/157699 2025-09-07T09:36:20.1684196Z * [new tag] ciflow/xpu/157994 -> ciflow/xpu/157994 2025-09-07T09:36:20.1685297Z * [new tag] ciflow/xpu/159459 -> ciflow/xpu/159459 2025-09-07T09:36:20.1686274Z * [new tag] ciflow/xpu/159718 -> ciflow/xpu/159718 2025-09-07T09:36:20.1687187Z * [new tag] ciflow/xpu/159944 -> ciflow/xpu/159944 2025-09-07T09:36:20.1688159Z * [new tag] ciflow/xpu/160867 -> ciflow/xpu/160867 2025-09-07T09:36:20.1689118Z * [new tag] ciflow/xpu/160938 -> ciflow/xpu/160938 2025-09-07T09:36:20.1689971Z * [new tag] ciflow/xpu/160940 -> ciflow/xpu/160940 2025-09-07T09:36:20.1690841Z * [new tag] ciflow/xpu/160953 -> ciflow/xpu/160953 2025-09-07T09:36:20.1691821Z * [new tag] 
ciflow/xpu/161045 -> ciflow/xpu/161045 2025-09-07T09:36:20.1692902Z * [new tag] ciflow/xpu/161058 -> ciflow/xpu/161058 2025-09-07T09:36:20.1693755Z * [new tag] ciflow/xpu/161246 -> ciflow/xpu/161246 2025-09-07T09:36:20.1694589Z * [new tag] ciflow/xpu/161397 -> ciflow/xpu/161397 2025-09-07T09:36:20.1695718Z * [new tag] ciflow/xpu/161485 -> ciflow/xpu/161485 2025-09-07T09:36:20.1696571Z * [new tag] ciflow/xpu/161988 -> ciflow/xpu/161988 2025-09-07T09:36:20.1697436Z * [new tag] ciflow/xpu/162062 -> ciflow/xpu/162062 2025-09-07T09:36:20.1698486Z * [new tag] cslpull75 -> cslpull75 2025-09-07T09:36:20.1699409Z * [new tag] cslpull76 -> cslpull76 2025-09-07T09:36:20.1700373Z * [new tag] cslpull77 -> cslpull77 2025-09-07T09:36:20.1701283Z * [new tag] cslpull78 -> cslpull78 2025-09-07T09:36:20.1702365Z * [new tag] cslpull79 -> cslpull79 2025-09-07T09:36:20.1703303Z * [new tag] cslpull80 -> cslpull80 2025-09-07T09:36:20.1704219Z * [new tag] cslpull81 -> cslpull81 2025-09-07T09:36:20.1705325Z * [new tag] cslpull82 -> cslpull82 2025-09-07T09:36:20.1706328Z * [new tag] cslpull83 -> cslpull83 2025-09-07T09:36:20.1707291Z * [new tag] cslpull84 -> cslpull84 2025-09-07T09:36:20.1708204Z * [new tag] cslpull85 -> cslpull85 2025-09-07T09:36:20.1709175Z * [new tag] cslpull86 -> cslpull86 2025-09-07T09:36:20.1710114Z * [new tag] cslpull87 -> cslpull87 2025-09-07T09:36:20.1711046Z * [new tag] cslpull88 -> cslpull88 2025-09-07T09:36:20.1712053Z * [new tag] cslpull89 -> cslpull89 2025-09-07T09:36:20.1712895Z * [new tag] cslpull90 -> cslpull90 2025-09-07T09:36:20.1714173Z * [new tag] cslpull91 -> cslpull91 2025-09-07T09:36:20.1715307Z * [new tag] cslpull92 -> cslpull92 2025-09-07T09:36:20.1716336Z * [new tag] flight_5 -> flight_5 2025-09-07T09:36:20.1717358Z * [new tag] flight_5.1 -> flight_5.1 2025-09-07T09:36:20.1718429Z * [new tag] flight_5.2 -> flight_5.2 2025-09-07T09:36:20.1719287Z * [new tag] flight_5.3 -> flight_5.3 2025-09-07T09:36:20.1720195Z * [new tag] forpull1 -> forpull1 
2025-09-07T09:36:20.1721694Z * [new tag] malfet/tag-2ef5611 -> malfet/tag-2ef5611 2025-09-07T09:36:20.1722573Z * [new tag] malfet/tag-317b1a0 -> malfet/tag-317b1a0 2025-09-07T09:36:20.1723485Z * [new tag] malfet/tag-ec6f767 -> malfet/tag-ec6f767 2025-09-07T09:36:20.1724518Z * [new tag] nightly-binary -> nightly-binary 2025-09-07T09:36:20.1725676Z * [new tag] sqzhang_flight4_plus -> sqzhang_flight4_plus 2025-09-07T09:36:20.1726838Z * [new tag] sqzhang_flight_3 -> sqzhang_flight_3 2025-09-07T09:36:20.1728242Z * [new tag] trunk/00636e0171e7e733628c408084805442270cf608 -> trunk/00636e0171e7e733628c408084805442270cf608 2025-09-07T09:36:20.1729287Z * [new tag] trunk/019fed39aa6b2dd8c69347378d53423e5efae8d4 -> trunk/019fed39aa6b2dd8c69347378d53423e5efae8d4 2025-09-07T09:36:20.1730325Z * [new tag] trunk/01ab325cc2e0dc221af4d710974e1b9175066544 -> trunk/01ab325cc2e0dc221af4d710974e1b9175066544 2025-09-07T09:36:20.1731346Z * [new tag] trunk/01edcd4df8bf0c7b4cc2d3ec868bd2059eeea83b -> trunk/01edcd4df8bf0c7b4cc2d3ec868bd2059eeea83b 2025-09-07T09:36:20.1732399Z * [new tag] trunk/040d00af048967dde7938d358d7f5988cbd18388 -> trunk/040d00af048967dde7938d358d7f5988cbd18388 2025-09-07T09:36:20.1733346Z * [new tag] trunk/0447f2d99b4351b2ff129dce6eebb371024f73e5 -> trunk/0447f2d99b4351b2ff129dce6eebb371024f73e5 2025-09-07T09:36:20.1734326Z * [new tag] trunk/047603d35bdc70046216384838d6340feab79bf4 -> trunk/047603d35bdc70046216384838d6340feab79bf4 2025-09-07T09:36:20.1735488Z * [new tag] trunk/06da7c0730b3764f178ec3a90dedf4ffa4202d81 -> trunk/06da7c0730b3764f178ec3a90dedf4ffa4202d81 2025-09-07T09:36:20.1736738Z * [new tag] trunk/081cab045472ce045634548cc6c14a4870641e23 -> trunk/081cab045472ce045634548cc6c14a4870641e23 2025-09-07T09:36:20.1737731Z * [new tag] trunk/09587daf8c9f21f5340f73921ce5f23d1a4a4572 -> trunk/09587daf8c9f21f5340f73921ce5f23d1a4a4572 2025-09-07T09:36:20.1738761Z * [new tag] trunk/09be1890d72cc34fc946965dc4a27736bf0ca8c6 -> 
trunk/09be1890d72cc34fc946965dc4a27736bf0ca8c6 2025-09-07T09:36:20.1739789Z * [new tag] trunk/09d2f1b6315d6d416fbf452793d65795863ebc66 -> trunk/09d2f1b6315d6d416fbf452793d65795863ebc66 2025-09-07T09:36:20.1740722Z * [new tag] trunk/0af70e2353e1dcda83175fd4834ecb7b63e009e0 -> trunk/0af70e2353e1dcda83175fd4834ecb7b63e009e0 2025-09-07T09:36:20.1742719Z * [new tag] trunk/0c0e056a9e20c17271a6144dd32c0c7e3ba26736 -> trunk/0c0e056a9e20c17271a6144dd32c0c7e3ba26736 2025-09-07T09:36:20.1743824Z * [new tag] trunk/0cd6c56bdfa9178ff61be82ce3b178926ddb64a9 -> trunk/0cd6c56bdfa9178ff61be82ce3b178926ddb64a9 2025-09-07T09:36:20.1744833Z * [new tag] trunk/0d421ace32c1605ee8e452ee1eeb03bd243dd96c -> trunk/0d421ace32c1605ee8e452ee1eeb03bd243dd96c 2025-09-07T09:36:20.1746267Z * [new tag] trunk/0d71a9dd5b4b6d1dde58d91c9b71d96bc6a6a171 -> trunk/0d71a9dd5b4b6d1dde58d91c9b71d96bc6a6a171 2025-09-07T09:36:20.1747219Z * [new tag] trunk/0d84ff3b78f55492d3d4708458c92d776274939e -> trunk/0d84ff3b78f55492d3d4708458c92d776274939e 2025-09-07T09:36:20.1748229Z * [new tag] trunk/0f45aaf4414048b17d720d0915ce221a8de8ec63 -> trunk/0f45aaf4414048b17d720d0915ce221a8de8ec63 2025-09-07T09:36:20.1749238Z * [new tag] trunk/0ff8eabf1387de5acd6712a03bda61f1a3dfa27f -> trunk/0ff8eabf1387de5acd6712a03bda61f1a3dfa27f 2025-09-07T09:36:20.1750295Z * [new tag] trunk/104f2680e03d13a4765ca69f905d8f16fc0c822f -> trunk/104f2680e03d13a4765ca69f905d8f16fc0c822f 2025-09-07T09:36:20.1751360Z * [new tag] trunk/12814701555d3e41dfcdf8f9273af5821e322df0 -> trunk/12814701555d3e41dfcdf8f9273af5821e322df0 2025-09-07T09:36:20.1776733Z * [new tag] trunk/13b65196db422bdb394cb482e208c61ed448898c -> trunk/13b65196db422bdb394cb482e208c61ed448898c 2025-09-07T09:36:20.1777218Z * [new tag] trunk/13d66e2a66eceed14b8a8f5a971087df4f688a46 -> trunk/13d66e2a66eceed14b8a8f5a971087df4f688a46 2025-09-07T09:36:20.1777598Z * [new tag] trunk/145a3a7bda15e3963a33eb1b54bba5d4a270b225 -> trunk/145a3a7bda15e3963a33eb1b54bba5d4a270b225 
2025-09-07T09:36:20.1777932Z * [new tag] trunk/146371483318e17929daefd37c8e459d9d6d47bb -> trunk/146371483318e17929daefd37c8e459d9d6d47bb 2025-09-07T09:36:20.1778277Z * [new tag] trunk/15c77a8cfd341e74fd124b077492ef2bfa51b339 -> trunk/15c77a8cfd341e74fd124b077492ef2bfa51b339 2025-09-07T09:36:20.1778608Z * [new tag] trunk/17fa8eec4a1e32939ab4d364ee6e75487a79b654 -> trunk/17fa8eec4a1e32939ab4d364ee6e75487a79b654 2025-09-07T09:36:20.1778941Z * [new tag] trunk/190c391a28845a14df26abb228d26aa813efb20c -> trunk/190c391a28845a14df26abb228d26aa813efb20c 2025-09-07T09:36:20.1779298Z * [new tag] trunk/1a588ace4667bde1331fbd8ed957157dca5cee68 -> trunk/1a588ace4667bde1331fbd8ed957157dca5cee68 2025-09-07T09:36:20.1779615Z * [new tag] trunk/1aa7476885e8f6e7b0ec3a5b6383aad9d3f343e7 -> trunk/1aa7476885e8f6e7b0ec3a5b6383aad9d3f343e7 2025-09-07T09:36:20.1779899Z * [new tag] trunk/1aeb421c342c9e9607842f4c87cb46e8e816ee53 -> trunk/1aeb421c342c9e9607842f4c87cb46e8e816ee53 2025-09-07T09:36:20.1780325Z * [new tag] trunk/1c1b28d5b6a942fafe23b2f09302d93c25226d4a -> trunk/1c1b28d5b6a942fafe23b2f09302d93c25226d4a 2025-09-07T09:36:20.1780614Z * [new tag] trunk/1ebd70d0c0d562d3be9abdee2a21906584af7d99 -> trunk/1ebd70d0c0d562d3be9abdee2a21906584af7d99 2025-09-07T09:36:20.1780907Z * [new tag] trunk/1ec2c15914da4ef7bd926ed9aebc8671c75fe965 -> trunk/1ec2c15914da4ef7bd926ed9aebc8671c75fe965 2025-09-07T09:36:20.1781190Z * [new tag] trunk/1f51056bd64e73d1aa81321bc3c098575b1bc78a -> trunk/1f51056bd64e73d1aa81321bc3c098575b1bc78a 2025-09-07T09:36:20.1781556Z * [new tag] trunk/1f820de639c75a1562d3fb03f160439f853ae07b -> trunk/1f820de639c75a1562d3fb03f160439f853ae07b 2025-09-07T09:36:20.1781833Z * [new tag] trunk/204697f0e695d82894c5010fbec664c4391f90cc -> trunk/204697f0e695d82894c5010fbec664c4391f90cc 2025-09-07T09:36:20.1782107Z * [new tag] trunk/20629b1619fe636227d01fc85ba221daa7185a05 -> trunk/20629b1619fe636227d01fc85ba221daa7185a05 2025-09-07T09:36:20.1782386Z * [new tag] 
trunk/20b47acef845e9c4f71da9429a396d293f50ebe7 -> trunk/20b47acef845e9c4f71da9429a396d293f50ebe7 2025-09-07T09:36:20.1782670Z * [new tag] trunk/20bfb2539d7c5250379648eda35f80b8a7d642dd -> trunk/20bfb2539d7c5250379648eda35f80b8a7d642dd 2025-09-07T09:36:20.1782950Z * [new tag] trunk/21fae99c180d17def562797ea0fb154d8fdf88e3 -> trunk/21fae99c180d17def562797ea0fb154d8fdf88e3 2025-09-07T09:36:20.1783236Z * [new tag] trunk/248355faf53f9f7ba2fd0a367d59600c6d991e7f -> trunk/248355faf53f9f7ba2fd0a367d59600c6d991e7f 2025-09-07T09:36:20.1783519Z * [new tag] trunk/25f4aaed9ec26f39c13862323ff8582006473d23 -> trunk/25f4aaed9ec26f39c13862323ff8582006473d23 2025-09-07T09:36:20.1783795Z * [new tag] trunk/261a84a1764412f8e659c956e3f81997ec3de9d5 -> trunk/261a84a1764412f8e659c956e3f81997ec3de9d5 2025-09-07T09:36:20.1784151Z * [new tag] trunk/28f4ab0737937858730f29f5c4e601e109cf9d5f -> trunk/28f4ab0737937858730f29f5c4e601e109cf9d5f 2025-09-07T09:36:20.1785516Z * [new tag] trunk/291cd11f2d5df6f48d348cce0e4e762f274f4dc4 -> trunk/291cd11f2d5df6f48d348cce0e4e762f274f4dc4 2025-09-07T09:36:20.1786678Z * [new tag] trunk/29280864d941e6108ab57f7298f520c0cf9696e9 -> trunk/29280864d941e6108ab57f7298f520c0cf9696e9 2025-09-07T09:36:20.1787999Z * [new tag] trunk/2a45837e98c63cae9d1a2e2133a727b829e549d5 -> trunk/2a45837e98c63cae9d1a2e2133a727b829e549d5 2025-09-07T09:36:20.1789093Z * [new tag] trunk/2a5c0785e2f975697fd7bdf1411de6e03dcaa1ef -> trunk/2a5c0785e2f975697fd7bdf1411de6e03dcaa1ef 2025-09-07T09:36:20.1791119Z * [new tag] trunk/2b8a83901c58a0858ea9e4ce00055f48e6ed164c -> trunk/2b8a83901c58a0858ea9e4ce00055f48e6ed164c 2025-09-07T09:36:20.1791690Z * [new tag] trunk/2ba65472dd54488a86a50326ea990195fc6732d6 -> trunk/2ba65472dd54488a86a50326ea990195fc6732d6 2025-09-07T09:36:20.1792520Z * [new tag] trunk/2c03f0acc53ed13fe8ebfe809129f25996e009a0 -> trunk/2c03f0acc53ed13fe8ebfe809129f25996e009a0 2025-09-07T09:36:20.1793702Z * [new tag] trunk/2dd529df0092799f68ee7afcf52338276906706a -> 
trunk/2dd529df0092799f68ee7afcf52338276906706a 2025-09-07T09:36:20.1794281Z * [new tag] trunk/2f6b4b1ad3f82bb3bd984f6e65744ea339ffb8b5 -> trunk/2f6b4b1ad3f82bb3bd984f6e65744ea339ffb8b5 2025-09-07T09:36:20.1795566Z * [new tag] trunk/2fa0520a64ed8aa734a56c4d124958f0b5711ca8 -> trunk/2fa0520a64ed8aa734a56c4d124958f0b5711ca8 2025-09-07T09:36:20.1796692Z * [new tag] trunk/302df2ac5dc4222294c09d48804a2dddb8f4bad8 -> trunk/302df2ac5dc4222294c09d48804a2dddb8f4bad8 2025-09-07T09:36:20.1797618Z * [new tag] trunk/33028597bfa2e0178e28c8cce33cb9b3800cac43 -> trunk/33028597bfa2e0178e28c8cce33cb9b3800cac43 2025-09-07T09:36:20.1798642Z * [new tag] trunk/34aa78274d6770086025a967fa63a86830e08176 -> trunk/34aa78274d6770086025a967fa63a86830e08176 2025-09-07T09:36:20.1799750Z * [new tag] trunk/3559c354ce6a14d11fe29fb12fa2747a2f2af449 -> trunk/3559c354ce6a14d11fe29fb12fa2747a2f2af449 2025-09-07T09:36:20.1800631Z * [new tag] trunk/36d207fcaaede0d1e58a5168084c307b32b6fd8b -> trunk/36d207fcaaede0d1e58a5168084c307b32b6fd8b 2025-09-07T09:36:20.1801519Z * [new tag] trunk/377033757ae5ca524ea842f1b0a5f446ed3d8fe0 -> trunk/377033757ae5ca524ea842f1b0a5f446ed3d8fe0 2025-09-07T09:36:20.1802655Z * [new tag] trunk/3771380f83fcac154a7c89ad679311d8c4818287 -> trunk/3771380f83fcac154a7c89ad679311d8c4818287 2025-09-07T09:36:20.1803693Z * [new tag] trunk/3a207816cc569f78863d86c01f2a3d265350e39f -> trunk/3a207816cc569f78863d86c01f2a3d265350e39f 2025-09-07T09:36:20.1804758Z * [new tag] trunk/3a20a20e7065ec927fdd216d4da3b04f879b3c67 -> trunk/3a20a20e7065ec927fdd216d4da3b04f879b3c67 2025-09-07T09:36:20.1806205Z * [new tag] trunk/3bbc2e3e4f025523eaa5dbff220b3e96bca608d0 -> trunk/3bbc2e3e4f025523eaa5dbff220b3e96bca608d0 2025-09-07T09:36:20.1807289Z * [new tag] trunk/3c0ff1b569c45cfa6935ad8031a9d4cf1551aa3f -> trunk/3c0ff1b569c45cfa6935ad8031a9d4cf1551aa3f 2025-09-07T09:36:20.1808328Z * [new tag] trunk/3c45af079afc92a03b03ddf4f9198902ffcf30cf -> trunk/3c45af079afc92a03b03ddf4f9198902ffcf30cf 
2025-09-07T09:36:20.1809404Z * [new tag] trunk/3dde5d7f9bf80dd6623a712bc429e9e4302464b5 -> trunk/3dde5d7f9bf80dd6623a712bc429e9e4302464b5 2025-09-07T09:36:20.1810407Z * [new tag] trunk/403a3a393cda7e60f503f3b04b8805a845dcf45d -> trunk/403a3a393cda7e60f503f3b04b8805a845dcf45d 2025-09-07T09:36:20.1811497Z * [new tag] trunk/420c52ecf36f86d32da0853bfbe074b682b070aa -> trunk/420c52ecf36f86d32da0853bfbe074b682b070aa 2025-09-07T09:36:20.1812697Z * [new tag] trunk/43b7c86a2c0f91320f5c5f4827b111edff06fdb6 -> trunk/43b7c86a2c0f91320f5c5f4827b111edff06fdb6 2025-09-07T09:36:20.1813773Z * [new tag] trunk/451ed931562ec8b46d1f7e6c266a68132a119336 -> trunk/451ed931562ec8b46d1f7e6c266a68132a119336 2025-09-07T09:36:20.1814809Z * [new tag] trunk/480c7391126656154318fabf1d57ebc01e196e63 -> trunk/480c7391126656154318fabf1d57ebc01e196e63 2025-09-07T09:36:20.1816153Z * [new tag] trunk/48bedd753da22634aa94fbafeb731e82025404f3 -> trunk/48bedd753da22634aa94fbafeb731e82025404f3 2025-09-07T09:36:20.1817298Z * [new tag] trunk/494878a11b79071ada0b98f34042d47155be6d1c -> trunk/494878a11b79071ada0b98f34042d47155be6d1c 2025-09-07T09:36:20.1818269Z * [new tag] trunk/4ae57d448c0a7d37e4cfd5c27d977fad2cef4051 -> trunk/4ae57d448c0a7d37e4cfd5c27d977fad2cef4051 2025-09-07T09:36:20.1819361Z * [new tag] trunk/4cdaf8265d86f984254b62052da8c26ef61ef1cf -> trunk/4cdaf8265d86f984254b62052da8c26ef61ef1cf 2025-09-07T09:36:20.1820324Z * [new tag] trunk/4d4abec80f03cd8fdefe1d9cb3a60d3690cd777e -> trunk/4d4abec80f03cd8fdefe1d9cb3a60d3690cd777e 2025-09-07T09:36:20.1821604Z * [new tag] trunk/4e42aa8ffc44b8340eb0eeaf80a2cafc4763a186 -> trunk/4e42aa8ffc44b8340eb0eeaf80a2cafc4763a186 2025-09-07T09:36:20.1822796Z * [new tag] trunk/4f72d932feee0749397fec876dcd43994f50b215 -> trunk/4f72d932feee0749397fec876dcd43994f50b215 2025-09-07T09:36:20.1823904Z * [new tag] trunk/50fc22dedf3c4a27be61fa05551c4f320281b42d -> trunk/50fc22dedf3c4a27be61fa05551c4f320281b42d 2025-09-07T09:36:20.1825095Z * [new tag] 
trunk/5211f1f908907ffc064b56e43cf8659f7fc22aa9 -> trunk/5211f1f908907ffc064b56e43cf8659f7fc22aa9 2025-09-07T09:36:20.1828267Z * [new tag] trunk/524b78d4f67045b83bb69edc56ab16efe282971c -> trunk/524b78d4f67045b83bb69edc56ab16efe282971c 2025-09-07T09:36:20.1829428Z * [new tag] trunk/54e275e0d81fe1e1ccfa4fb5f2a5a9aaca00ca15 -> trunk/54e275e0d81fe1e1ccfa4fb5f2a5a9aaca00ca15 2025-09-07T09:36:20.1830398Z * [new tag] trunk/5561e45758d59c94605873d5db48ed459c004c3b -> trunk/5561e45758d59c94605873d5db48ed459c004c3b 2025-09-07T09:36:20.1831661Z * [new tag] trunk/57278d45f046d4f89f45d373b1af4dd56934ff24 -> trunk/57278d45f046d4f89f45d373b1af4dd56934ff24 2025-09-07T09:36:20.1832772Z * [new tag] trunk/5927a70934ccf7b70182d364c23245a7dd685503 -> trunk/5927a70934ccf7b70182d364c23245a7dd685503 2025-09-07T09:36:20.1833768Z * [new tag] trunk/5985e28912aeb40b103ebfcf2fd0665eb4a50599 -> trunk/5985e28912aeb40b103ebfcf2fd0665eb4a50599 2025-09-07T09:36:20.1834926Z * [new tag] trunk/5a2da090ed6db88bb657c4e51ec0b310cd08bff6 -> trunk/5a2da090ed6db88bb657c4e51ec0b310cd08bff6 2025-09-07T09:36:20.1836322Z * [new tag] trunk/5c473e9f5ee0ef0fc38e6cf34a95b547f8cdc8d5 -> trunk/5c473e9f5ee0ef0fc38e6cf34a95b547f8cdc8d5 2025-09-07T09:36:20.1837454Z * [new tag] trunk/5c67426d6847667a7c55a2dd01f470fa37238c18 -> trunk/5c67426d6847667a7c55a2dd01f470fa37238c18 2025-09-07T09:36:20.1838629Z * [new tag] trunk/5da573c42c332bc68d4b7946c69f690a876d951a -> trunk/5da573c42c332bc68d4b7946c69f690a876d951a 2025-09-07T09:36:20.1839778Z * [new tag] trunk/5e5870e858f60ff4bf87d03f3592097e934a9580 -> trunk/5e5870e858f60ff4bf87d03f3592097e934a9580 2025-09-07T09:36:20.1840883Z * [new tag] trunk/5f3cbc9442aa55b5afb29f4ac8ca9be569003e84 -> trunk/5f3cbc9442aa55b5afb29f4ac8ca9be569003e84 2025-09-07T09:36:20.1841988Z * [new tag] trunk/600c25e9a17fe56e3dee872be8854db08916ba0c -> trunk/600c25e9a17fe56e3dee872be8854db08916ba0c 2025-09-07T09:36:20.1843124Z * [new tag] trunk/601ae8e4831fc8123fffcfb8fd2e6b6381b42e14 -> 
trunk/601ae8e4831fc8123fffcfb8fd2e6b6381b42e14 2025-09-07T09:36:20.1844210Z * [new tag] trunk/6087ef41e54c2494b117ffd923faf20f515a6806 -> trunk/6087ef41e54c2494b117ffd923faf20f515a6806 2025-09-07T09:36:20.1845533Z * [new tag] trunk/626cb7df8161dd4ecb4fe43b60f37ce9076f56b1 -> trunk/626cb7df8161dd4ecb4fe43b60f37ce9076f56b1 2025-09-07T09:36:20.1846683Z * [new tag] trunk/62c3f9a97fd3dea7132a93066d32d893ffe101e6 -> trunk/62c3f9a97fd3dea7132a93066d32d893ffe101e6 2025-09-07T09:36:20.1847742Z * [new tag] trunk/63a9c23fe99eacfd09610c36dfe8f01b053c1a35 -> trunk/63a9c23fe99eacfd09610c36dfe8f01b053c1a35 2025-09-07T09:36:20.1848979Z * [new tag] trunk/65985937d97505f648b6ed852c3129f2dd08b251 -> trunk/65985937d97505f648b6ed852c3129f2dd08b251 2025-09-07T09:36:20.1850509Z * [new tag] trunk/66f3b4a682a6153517dd23369fdc3289b6494b07 -> trunk/66f3b4a682a6153517dd23369fdc3289b6494b07 2025-09-07T09:36:20.1851500Z * [new tag] trunk/6737e2c996990024187ba620d2764f3b6f6add2c -> trunk/6737e2c996990024187ba620d2764f3b6f6add2c 2025-09-07T09:36:20.1852639Z * [new tag] trunk/67c31dcd364f10072a55f4a30ffd1151c686283a -> trunk/67c31dcd364f10072a55f4a30ffd1151c686283a 2025-09-07T09:36:20.1853795Z * [new tag] trunk/68738beff73e9c3512e18b4edea811a897ce42db -> trunk/68738beff73e9c3512e18b4edea811a897ce42db 2025-09-07T09:36:20.1855154Z * [new tag] trunk/69a25f68884a168550695fdb1a7c310c54d29536 -> trunk/69a25f68884a168550695fdb1a7c310c54d29536 2025-09-07T09:36:20.1856334Z * [new tag] trunk/6b1900c22f1a07b9519346898d4c71d8a2b0f12f -> trunk/6b1900c22f1a07b9519346898d4c71d8a2b0f12f 2025-09-07T09:36:20.1857433Z * [new tag] trunk/6b8b3ac4403f771bd4a8f9a45d93347304148774 -> trunk/6b8b3ac4403f771bd4a8f9a45d93347304148774 2025-09-07T09:36:20.1858543Z * [new tag] trunk/6f7608d603834d6068b2e7a5d59bec3973b6bb1b -> trunk/6f7608d603834d6068b2e7a5d59bec3973b6bb1b 2025-09-07T09:36:20.1859834Z * [new tag] trunk/70d36e047dfb3488fd6335016711a784d810ebda -> trunk/70d36e047dfb3488fd6335016711a784d810ebda 
2025-09-07T09:36:20.1861026Z * [new tag] trunk/71992dd805ff9d6763f77214dfe8b0465e88c87b -> trunk/71992dd805ff9d6763f77214dfe8b0465e88c87b 2025-09-07T09:36:20.1862206Z * [new tag] trunk/734ce8eba9c69381f187359bf0fef1d71d84cd20 -> trunk/734ce8eba9c69381f187359bf0fef1d71d84cd20 2025-09-07T09:36:20.1863427Z * [new tag] trunk/73eb4511fb863a37944342b7e92aae706de603c8 -> trunk/73eb4511fb863a37944342b7e92aae706de603c8 2025-09-07T09:36:20.1864659Z * [new tag] trunk/75bc23cfc345bd4c05e7f97c416c4b3d2d1fa64b -> trunk/75bc23cfc345bd4c05e7f97c416c4b3d2d1fa64b 2025-09-07T09:36:20.1866181Z * [new tag] trunk/771f369448321a387f2018535bc8b8b6e5f12fab -> trunk/771f369448321a387f2018535bc8b8b6e5f12fab 2025-09-07T09:36:20.1867350Z * [new tag] trunk/789d4942127143f2adcb53612c058ce4c9a2cf20 -> trunk/789d4942127143f2adcb53612c058ce4c9a2cf20 2025-09-07T09:36:20.1868368Z * [new tag] trunk/791eff96c85678c950888f9da24650083ee673fe -> trunk/791eff96c85678c950888f9da24650083ee673fe 2025-09-07T09:36:20.1869399Z * [new tag] trunk/793fc12aff1f69fbbf9f4278182fb52bbe350fc9 -> trunk/793fc12aff1f69fbbf9f4278182fb52bbe350fc9 2025-09-07T09:36:20.1870560Z * [new tag] trunk/79fcd5247a9a129eee526a14df30bfc6a22b3f01 -> trunk/79fcd5247a9a129eee526a14df30bfc6a22b3f01 2025-09-07T09:36:20.1871556Z * [new tag] trunk/7a83cf430e97d83d6fb14880b9049e77ff725685 -> trunk/7a83cf430e97d83d6fb14880b9049e77ff725685 2025-09-07T09:36:20.1872663Z * [new tag] trunk/7f4ff79210eb06924f223ae3a1941ee0e2635348 -> trunk/7f4ff79210eb06924f223ae3a1941ee0e2635348 2025-09-07T09:36:20.1873792Z * [new tag] trunk/8076a185c85112be62be292eb47409c88a585b1c -> trunk/8076a185c85112be62be292eb47409c88a585b1c 2025-09-07T09:36:20.1874932Z * [new tag] trunk/80dd397f1979371a5583fa3d5c7352029522a78d -> trunk/80dd397f1979371a5583fa3d5c7352029522a78d 2025-09-07T09:36:20.1876186Z * [new tag] trunk/8171d6052ec12628eb67e0040839314056014429 -> trunk/8171d6052ec12628eb67e0040839314056014429 2025-09-07T09:36:20.1877342Z * [new tag] 
trunk/81aeefa657b7ccc26b275c50a9f33b2f056e8071 -> trunk/81aeefa657b7ccc26b275c50a9f33b2f056e8071 2025-09-07T09:36:20.1878445Z * [new tag] trunk/81b7b16618bda250ce55982894a83dc0805eb64c -> trunk/81b7b16618bda250ce55982894a83dc0805eb64c 2025-09-07T09:36:20.1879583Z * [new tag] trunk/827f0d405448de31f79d1089f7d7fceab2f87895 -> trunk/827f0d405448de31f79d1089f7d7fceab2f87895 2025-09-07T09:36:20.1880858Z * [new tag] trunk/82f63c8f6de63c30132a8ac299b6e8c2fd0d3fe8 -> trunk/82f63c8f6de63c30132a8ac299b6e8c2fd0d3fe8 2025-09-07T09:36:20.1881903Z * [new tag] trunk/850e1382a9c56bfde18af09d3e72352d775e9435 -> trunk/850e1382a9c56bfde18af09d3e72352d775e9435 2025-09-07T09:36:20.1883081Z * [new tag] trunk/8678d831c48e616b717bff50f2d03141d2e9f965 -> trunk/8678d831c48e616b717bff50f2d03141d2e9f965 2025-09-07T09:36:20.1884221Z * [new tag] trunk/869cbcc16e489a4f5a14a93d5779b0ea86061c60 -> trunk/869cbcc16e489a4f5a14a93d5779b0ea86061c60 2025-09-07T09:36:20.1885524Z * [new tag] trunk/8703debf669bc2238211bfd039f4ecdd8228b7f7 -> trunk/8703debf669bc2238211bfd039f4ecdd8228b7f7 2025-09-07T09:36:20.1886811Z * [new tag] trunk/874069fbe46e82da5cfa405e6c0deb12e89ff608 -> trunk/874069fbe46e82da5cfa405e6c0deb12e89ff608 2025-09-07T09:36:20.1887981Z * [new tag] trunk/8875d6e394da2fffd04f31b28bf258c94d4776a3 -> trunk/8875d6e394da2fffd04f31b28bf258c94d4776a3 2025-09-07T09:36:20.1889169Z * [new tag] trunk/88d94d17e8c5155451393afa6eb3bab48ab61c16 -> trunk/88d94d17e8c5155451393afa6eb3bab48ab61c16 2025-09-07T09:36:20.1890362Z * [new tag] trunk/890626632def7e0ef95a2d01e87a0e4627824a9f -> trunk/890626632def7e0ef95a2d01e87a0e4627824a9f 2025-09-07T09:36:20.1891607Z * [new tag] trunk/8975cda2520b7b1b5bc3b4d8213edf261fa82570 -> trunk/8975cda2520b7b1b5bc3b4d8213edf261fa82570 2025-09-07T09:36:20.1892777Z * [new tag] trunk/89d41d3f61d04f14730ec26f008a59bef6624610 -> trunk/89d41d3f61d04f14730ec26f008a59bef6624610 2025-09-07T09:36:20.1893927Z * [new tag] trunk/8bb213b6d599ef1273fe52f9b1f6d476056c3a41 -> 
trunk/8bb213b6d599ef1273fe52f9b1f6d476056c3a41 2025-09-07T09:36:20.1895218Z * [new tag] trunk/8e23a1227b5fb2e39afaa7d57c075a75b640a5af -> trunk/8e23a1227b5fb2e39afaa7d57c075a75b640a5af 2025-09-07T09:36:20.1896786Z * [new tag] trunk/8ec551bb354ab2b85fbbba9d461740a20366d248 -> trunk/8ec551bb354ab2b85fbbba9d461740a20366d248 2025-09-07T09:36:20.1898033Z * [new tag] trunk/8fd3c9ce919c8d5c645fd348bba517e948cbc29d -> trunk/8fd3c9ce919c8d5c645fd348bba517e948cbc29d 2025-09-07T09:36:20.1899218Z * [new tag] trunk/90f50f7e68e120d9574e6e3189e37b4280010ad9 -> trunk/90f50f7e68e120d9574e6e3189e37b4280010ad9 2025-09-07T09:36:20.1900481Z * [new tag] trunk/91f0bcf43fc0bc743350d491ac63b77e92054ac9 -> trunk/91f0bcf43fc0bc743350d491ac63b77e92054ac9 2025-09-07T09:36:20.1901751Z * [new tag] trunk/92576a594b8121f6b0b1b5a3ea16d08792fc68ab -> trunk/92576a594b8121f6b0b1b5a3ea16d08792fc68ab 2025-09-07T09:36:20.1903051Z * [new tag] trunk/92a43025e0baa1f2ce345f28d22913b518a1ab9d -> trunk/92a43025e0baa1f2ce345f28d22913b518a1ab9d 2025-09-07T09:36:20.1904360Z * [new tag] trunk/93fb23d6fae7c4e82c4239a1033e522088742634 -> trunk/93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T09:36:20.1905761Z * [new tag] trunk/9458d1ac3bd70c2af316a8ba95d2c6c9c1199c9c -> trunk/9458d1ac3bd70c2af316a8ba95d2c6c9c1199c9c 2025-09-07T09:36:20.1906926Z * [new tag] trunk/9480cdc0b61488c89a23c2f64f43b2dcedc8728e -> trunk/9480cdc0b61488c89a23c2f64f43b2dcedc8728e 2025-09-07T09:36:20.1908413Z * [new tag] trunk/9491d289b329e4ba4a9f5f5b1be7960671bb7840 -> trunk/9491d289b329e4ba4a9f5f5b1be7960671bb7840 2025-09-07T09:36:20.1909574Z * [new tag] trunk/9499c8761cd2067feb9877414e818f6fd00290f1 -> trunk/9499c8761cd2067feb9877414e818f6fd00290f1 2025-09-07T09:36:20.1910700Z * [new tag] trunk/95ee0bfea99d3d346d6502b91b497d2b35795504 -> trunk/95ee0bfea99d3d346d6502b91b497d2b35795504 2025-09-07T09:36:20.1911843Z * [new tag] trunk/98374612fc2febd686be20761e56bdc2424bc36a -> trunk/98374612fc2febd686be20761e56bdc2424bc36a 
2025-09-07T09:36:20.1913177Z * [new tag] trunk/98efc9e93d8fc61eb53cb91378443617cb550500 -> trunk/98efc9e93d8fc61eb53cb91378443617cb550500 2025-09-07T09:36:20.1914098Z * [new tag] trunk/994f2a5dbcbdc915da39bf6f6ce4d1f5e74835c9 -> trunk/994f2a5dbcbdc915da39bf6f6ce4d1f5e74835c9 2025-09-07T09:36:20.1915376Z * [new tag] trunk/99f356fa58c8d726cef022d8710f5491291158f6 -> trunk/99f356fa58c8d726cef022d8710f5491291158f6 2025-09-07T09:36:20.1916713Z * [new tag] trunk/9a1c5c0a078b94d13ac5c1ae0d754d19fb73bf99 -> trunk/9a1c5c0a078b94d13ac5c1ae0d754d19fb73bf99 2025-09-07T09:36:20.1917853Z * [new tag] trunk/9a665ca3c472384e9d722bddba79e5a7680f1abd -> trunk/9a665ca3c472384e9d722bddba79e5a7680f1abd 2025-09-07T09:36:20.1918983Z * [new tag] trunk/9aedb3cd87b52160872173c177f61053d97bed57 -> trunk/9aedb3cd87b52160872173c177f61053d97bed57 2025-09-07T09:36:20.1920101Z * [new tag] trunk/9b81fe281da41f2421506339d26b027a468902f4 -> trunk/9b81fe281da41f2421506339d26b027a468902f4 2025-09-07T09:36:20.1921182Z * [new tag] trunk/9bdcee01f86e2969cff1140cdecfca13cb51816e -> trunk/9bdcee01f86e2969cff1140cdecfca13cb51816e 2025-09-07T09:36:20.1922269Z * [new tag] trunk/9c03d6be87eedc06e524e202e07a7e776551a839 -> trunk/9c03d6be87eedc06e524e202e07a7e776551a839 2025-09-07T09:36:20.1923393Z * [new tag] trunk/9c957723a0fedd9c637e63e023a613019e2cab60 -> trunk/9c957723a0fedd9c637e63e023a613019e2cab60 2025-09-07T09:36:20.1924612Z * [new tag] trunk/9e5247f51d81735e5f1e65e80588985fa93bccc5 -> trunk/9e5247f51d81735e5f1e65e80588985fa93bccc5 2025-09-07T09:36:20.1926055Z * [new tag] trunk/9eadb37cdd699f7e8e8177a5227bfeb16184ef26 -> trunk/9eadb37cdd699f7e8e8177a5227bfeb16184ef26 2025-09-07T09:36:20.1927163Z * [new tag] trunk/a00cdc1e4159db73c9ffb3f25e93e55877709a29 -> trunk/a00cdc1e4159db73c9ffb3f25e93e55877709a29 2025-09-07T09:36:20.1928317Z * [new tag] trunk/a02ee4a816d11380c6f564c1aba64d56af5ba705 -> trunk/a02ee4a816d11380c6f564c1aba64d56af5ba705 2025-09-07T09:36:20.1929369Z * [new tag] 
trunk/a3c7f77e50f900721817934120d60c2361b3c40d -> trunk/a3c7f77e50f900721817934120d60c2361b3c40d 2025-09-07T09:36:20.1930532Z * [new tag] trunk/a3d72b09ae12126a2b7d4a63a45ac100a882a802 -> trunk/a3d72b09ae12126a2b7d4a63a45ac100a882a802 2025-09-07T09:36:20.1931649Z * [new tag] trunk/a3e5466002791da609fcb069155d8ee347baee92 -> trunk/a3e5466002791da609fcb069155d8ee347baee92 2025-09-07T09:36:20.1932867Z * [new tag] trunk/a714437093ed196eee28f7de454cf4c41badc098 -> trunk/a714437093ed196eee28f7de454cf4c41badc098 2025-09-07T09:36:20.1933967Z * [new tag] trunk/a75e8cd27098f290de0b7439685d05ce02e91356 -> trunk/a75e8cd27098f290de0b7439685d05ce02e91356 2025-09-07T09:36:20.1935132Z * [new tag] trunk/a8d6943d36c1c2a5f90d3573460695bad4b623ae -> trunk/a8d6943d36c1c2a5f90d3573460695bad4b623ae 2025-09-07T09:36:20.1936593Z * [new tag] trunk/a918bbad6ab20649ff82eefb48417ecbe96bcb34 -> trunk/a918bbad6ab20649ff82eefb48417ecbe96bcb34 2025-09-07T09:36:20.1937715Z * [new tag] trunk/a99d8d39bc842d6ebc3e368b178e4884d24b056e -> trunk/a99d8d39bc842d6ebc3e368b178e4884d24b056e 2025-09-07T09:36:20.1938836Z * [new tag] trunk/aac1a50a191b4102d566c9c1ea22f06d6c2e3f02 -> trunk/aac1a50a191b4102d566c9c1ea22f06d6c2e3f02 2025-09-07T09:36:20.1939951Z * [new tag] trunk/aad96a202244c7d0d120c04ba8db593edd8c0f92 -> trunk/aad96a202244c7d0d120c04ba8db593edd8c0f92 2025-09-07T09:36:20.1941101Z * [new tag] trunk/ab643e4dbbaf7b663d4237514cbf01af9b11565c -> trunk/ab643e4dbbaf7b663d4237514cbf01af9b11565c 2025-09-07T09:36:20.1942415Z * [new tag] trunk/abc447174cd2cf8591edbc70a9f836f9a5779f47 -> trunk/abc447174cd2cf8591edbc70a9f836f9a5779f47 2025-09-07T09:36:20.1943787Z * [new tag] trunk/acece97c3a9dceb63194e314da93fdf37cf15a0d -> trunk/acece97c3a9dceb63194e314da93fdf37cf15a0d 2025-09-07T09:36:20.1944823Z * [new tag] trunk/ada43ed39c80b746b4822c92640a1882619e2795 -> trunk/ada43ed39c80b746b4822c92640a1882619e2795 2025-09-07T09:36:20.1946284Z * [new tag] trunk/adae7f66aacf3f248c3101b858cf98d5809119fa -> 
trunk/adae7f66aacf3f248c3101b858cf98d5809119fa 2025-09-07T09:36:20.1947405Z * [new tag] trunk/ae0edc133e61e3b16caf0b2ee0ff3f33ab72af4c -> trunk/ae0edc133e61e3b16caf0b2ee0ff3f33ab72af4c 2025-09-07T09:36:20.1948503Z * [new tag] trunk/aed33a8fcbd60b052d4559d261390c5797129c6d -> trunk/aed33a8fcbd60b052d4559d261390c5797129c6d 2025-09-07T09:36:20.1949716Z * [new tag] trunk/b04e922712080a3652e438d05e8bb74e0cd2d238 -> trunk/b04e922712080a3652e438d05e8bb74e0cd2d238 2025-09-07T09:36:20.1950912Z * [new tag] trunk/b0a3e58dd71c1a039ac0ef51e5bd8f704f632f6f -> trunk/b0a3e58dd71c1a039ac0ef51e5bd8f704f632f6f 2025-09-07T09:36:20.1952010Z * [new tag] trunk/b16d3f4c8c01d461c2f01064e9ca5fa2b33f5cf1 -> trunk/b16d3f4c8c01d461c2f01064e9ca5fa2b33f5cf1 2025-09-07T09:36:20.1953064Z * [new tag] trunk/b18bb6796f210a183e687d9d64984a5a9d13cf09 -> trunk/b18bb6796f210a183e687d9d64984a5a9d13cf09 2025-09-07T09:36:20.1954257Z * [new tag] trunk/b1bb98ddebdd3e41bf7987372409bdce96ae55de -> trunk/b1bb98ddebdd3e41bf7987372409bdce96ae55de 2025-09-07T09:36:20.1955649Z * [new tag] trunk/b2b4add0e754411372060e1d7b4057a66439172b -> trunk/b2b4add0e754411372060e1d7b4057a66439172b 2025-09-07T09:36:20.1956898Z * [new tag] trunk/b2c7b9ad2dc5a7c0b61febd307761bd5bc2f0f05 -> trunk/b2c7b9ad2dc5a7c0b61febd307761bd5bc2f0f05 2025-09-07T09:36:20.1958061Z * [new tag] trunk/b40d9432be44a6b5974ee62e7d19c3c61c5ece37 -> trunk/b40d9432be44a6b5974ee62e7d19c3c61c5ece37 2025-09-07T09:36:20.1959160Z * [new tag] trunk/b4ad38279b178b7bd14355123c1101e2e853e77b -> trunk/b4ad38279b178b7bd14355123c1101e2e853e77b 2025-09-07T09:36:20.1960382Z * [new tag] trunk/b67c41039835bd9b20b83cd6233e86baaa5f5dde -> trunk/b67c41039835bd9b20b83cd6233e86baaa5f5dde 2025-09-07T09:36:20.1961623Z * [new tag] trunk/b6d0a9ea9056ede4f7024dbf3bd6c43be3aff49c -> trunk/b6d0a9ea9056ede4f7024dbf3bd6c43be3aff49c 2025-09-07T09:36:20.1962738Z * [new tag] trunk/b7dad7dd49448c88d0751fa2e29c70afe985f734 -> trunk/b7dad7dd49448c88d0751fa2e29c70afe985f734 
2025-09-07T09:36:20.1964245Z * [new tag] trunk/b7e207ca9f046ddd716076965a0cce403ba99052 -> trunk/b7e207ca9f046ddd716076965a0cce403ba99052 2025-09-07T09:36:20.1965507Z * [new tag] trunk/b919560c4a7010e2d89facee25586269a994746e -> trunk/b919560c4a7010e2d89facee25586269a994746e 2025-09-07T09:36:20.1966814Z * [new tag] trunk/b9ba612f7a968f7b27e121ca8f4d0a4d954f5354 -> trunk/b9ba612f7a968f7b27e121ca8f4d0a4d954f5354 2025-09-07T09:36:20.1967999Z * [new tag] trunk/ba7f546ccccb5e0b36d9070dc25f26a9647f89f8 -> trunk/ba7f546ccccb5e0b36d9070dc25f26a9647f89f8 2025-09-07T09:36:20.1969134Z * [new tag] trunk/bb950284c7e72905994bc25dd436c10e48088d85 -> trunk/bb950284c7e72905994bc25dd436c10e48088d85 2025-09-07T09:36:20.1970239Z * [new tag] trunk/bbedc71fd3267c639c38b4ec25eaa22f973d9c4d -> trunk/bbedc71fd3267c639c38b4ec25eaa22f973d9c4d 2025-09-07T09:36:20.1971208Z * [new tag] trunk/bc4db2c27fce6ff1648bdc5af31ec225d2a31f37 -> trunk/bc4db2c27fce6ff1648bdc5af31ec225d2a31f37 2025-09-07T09:36:20.1972322Z * [new tag] trunk/bc505977fb66677a09c31155c987330fbb18a865 -> trunk/bc505977fb66677a09c31155c987330fbb18a865 2025-09-07T09:36:20.1973442Z * [new tag] trunk/bd39e47feea7326afb5bbb67fcb1e69279239527 -> trunk/bd39e47feea7326afb5bbb67fcb1e69279239527 2025-09-07T09:36:20.1974615Z * [new tag] trunk/be5b03dde96638f25ffd732a4fed7e41b4cf40e1 -> trunk/be5b03dde96638f25ffd732a4fed7e41b4cf40e1 2025-09-07T09:36:20.1976085Z * [new tag] trunk/bffc7dd1f374d8408911cd22c6b3d6df39ded9b3 -> trunk/bffc7dd1f374d8408911cd22c6b3d6df39ded9b3 2025-09-07T09:36:20.1977131Z * [new tag] trunk/c024b1f5a18d5c5aee5cc2acdd4c52b24b93ffcf -> trunk/c024b1f5a18d5c5aee5cc2acdd4c52b24b93ffcf 2025-09-07T09:36:20.1978286Z * [new tag] trunk/c0983e6cc0acf71689e1851d12609e00b3f59371 -> trunk/c0983e6cc0acf71689e1851d12609e00b3f59371 2025-09-07T09:36:20.1979418Z * [new tag] trunk/c10195e723eeeedd099ed8b73eda7184ca618fad -> trunk/c10195e723eeeedd099ed8b73eda7184ca618fad 2025-09-07T09:36:20.1980530Z * [new tag] 
trunk/c157cf6488ade6a7ee2ce2d25b059e1335630a99 -> trunk/c157cf6488ade6a7ee2ce2d25b059e1335630a99 2025-09-07T09:36:20.1981723Z * [new tag] trunk/c2a30246172fd71d56529907ffd3c27b76b1f3a7 -> trunk/c2a30246172fd71d56529907ffd3c27b76b1f3a7 2025-09-07T09:36:20.1983027Z * [new tag] trunk/c32111149921b48bfef909293f1049e21619ed76 -> trunk/c32111149921b48bfef909293f1049e21619ed76 2025-09-07T09:36:20.1984044Z * [new tag] trunk/c37103234afc832dcad307e9016230810957c9d5 -> trunk/c37103234afc832dcad307e9016230810957c9d5 2025-09-07T09:36:20.1985547Z * [new tag] trunk/c3ceca2995cd35e1376c4b0704669bff1a81e836 -> trunk/c3ceca2995cd35e1376c4b0704669bff1a81e836 2025-09-07T09:36:20.1986662Z * [new tag] trunk/c3d54dea9febb1236d48d19e5d4876a63f2e20fd -> trunk/c3d54dea9febb1236d48d19e5d4876a63f2e20fd 2025-09-07T09:36:20.1987779Z * [new tag] trunk/c465b3d52c5687fe910d35a5c75341b77f821741 -> trunk/c465b3d52c5687fe910d35a5c75341b77f821741 2025-09-07T09:36:20.1988877Z * [new tag] trunk/c5b8a10be5e89396da916d1069ffcb7135f0372b -> trunk/c5b8a10be5e89396da916d1069ffcb7135f0372b 2025-09-07T09:36:20.1989923Z * [new tag] trunk/c7e41071a08f4045bc11ab60ec366d7357d56e30 -> trunk/c7e41071a08f4045bc11ab60ec366d7357d56e30 2025-09-07T09:36:20.1991069Z * [new tag] trunk/c98ddaca6d2e19ca37aff00c4ff0cda1e9a6ff65 -> trunk/c98ddaca6d2e19ca37aff00c4ff0cda1e9a6ff65 2025-09-07T09:36:20.1992060Z * [new tag] trunk/cb1e31362c7b53acf4ac95b9f8878064c184f03b -> trunk/cb1e31362c7b53acf4ac95b9f8878064c184f03b 2025-09-07T09:36:20.1993225Z * [new tag] trunk/cbfb005f7cce79974795b148e265f594f59477c8 -> trunk/cbfb005f7cce79974795b148e265f594f59477c8 2025-09-07T09:36:20.1994405Z * [new tag] trunk/cc5bdd12401bda835291d2f3cb297132ebdbf358 -> trunk/cc5bdd12401bda835291d2f3cb297132ebdbf358 2025-09-07T09:36:20.1995888Z * [new tag] trunk/cd529b686d54bbaa443f5b310140de48422d96c7 -> trunk/cd529b686d54bbaa443f5b310140de48422d96c7 2025-09-07T09:36:20.1997136Z * [new tag] trunk/cec0ff122815582af5302360aff03676558c5c87 -> 
trunk/cec0ff122815582af5302360aff03676558c5c87 2025-09-07T09:36:20.1998358Z * [new tag] trunk/d11720efdb563d02cf4f7d324311fb15a755268e -> trunk/d11720efdb563d02cf4f7d324311fb15a755268e 2025-09-07T09:36:20.1999570Z * [new tag] trunk/d1706d9128ae24d9048167e80d3fe5196d19035e -> trunk/d1706d9128ae24d9048167e80d3fe5196d19035e 2025-09-07T09:36:20.2000733Z * [new tag] trunk/d1a15abfdcaef138f2d9e93a9f46be44f30b766d -> trunk/d1a15abfdcaef138f2d9e93a9f46be44f30b766d 2025-09-07T09:36:20.2001993Z * [new tag] trunk/d232a95d4a79404ca05c1f52d37fde7339dcdf49 -> trunk/d232a95d4a79404ca05c1f52d37fde7339dcdf49 2025-09-07T09:36:20.2003102Z * [new tag] trunk/d2d4c8e9b2371c9aacfb771d9402ac7427b9778e -> trunk/d2d4c8e9b2371c9aacfb771d9402ac7427b9778e 2025-09-07T09:36:20.2004202Z * [new tag] trunk/d33840c542b387ab08ba49aa6c45aa9567fd9be7 -> trunk/d33840c542b387ab08ba49aa6c45aa9567fd9be7 2025-09-07T09:36:20.2005440Z * [new tag] trunk/d5643e8f3a648a99636bfa1f2a41d54bd3c0d0f1 -> trunk/d5643e8f3a648a99636bfa1f2a41d54bd3c0d0f1 2025-09-07T09:36:20.2006741Z * [new tag] trunk/d5b38410b5b6cf75c7a7389972777a6497926ee7 -> trunk/d5b38410b5b6cf75c7a7389972777a6497926ee7 2025-09-07T09:36:20.2007543Z * [new tag] trunk/d5e0f4202ba14632e4d14862ace096609e763462 -> trunk/d5e0f4202ba14632e4d14862ace096609e763462 2025-09-07T09:36:20.2008843Z * [new tag] trunk/d636c181f9140a7b59be10b36eae23039fc2bb72 -> trunk/d636c181f9140a7b59be10b36eae23039fc2bb72 2025-09-07T09:36:20.2010438Z * [new tag] trunk/d64718503728001a1e78168fd7f2d4ff23e57285 -> trunk/d64718503728001a1e78168fd7f2d4ff23e57285 2025-09-07T09:36:20.2011603Z * [new tag] trunk/d67c29ad22670320d676b02e394274af34e8e643 -> trunk/d67c29ad22670320d676b02e394274af34e8e643 2025-09-07T09:36:20.2012790Z * [new tag] trunk/d6b74568e2c98ce58ecc145b72ac66d4caf7ce95 -> trunk/d6b74568e2c98ce58ecc145b72ac66d4caf7ce95 2025-09-07T09:36:20.2013948Z * [new tag] trunk/d711f27845abd45007ccab6076649ebd896c2661 -> trunk/d711f27845abd45007ccab6076649ebd896c2661 
2025-09-07T09:36:20.2015185Z * [new tag] trunk/d9d6dde0f42d4bcc8c97671ac50d5096c7e500ab -> trunk/d9d6dde0f42d4bcc8c97671ac50d5096c7e500ab 2025-09-07T09:36:20.2016481Z * [new tag] trunk/da4db4b33d1fdd046650cf19fdbac581a19bf2f9 -> trunk/da4db4b33d1fdd046650cf19fdbac581a19bf2f9 2025-09-07T09:36:20.2017450Z * [new tag] trunk/dac8a4b91c01c3bbc96f54e621b1ea4ffdbd29d1 -> trunk/dac8a4b91c01c3bbc96f54e621b1ea4ffdbd29d1 2025-09-07T09:36:20.2018633Z * [new tag] trunk/dbec08729fb9848bebed6048c63831b87170d061 -> trunk/dbec08729fb9848bebed6048c63831b87170d061 2025-09-07T09:36:20.2019621Z * [new tag] trunk/dcf385395d838f38c8dca25913578230dd43099a -> trunk/dcf385395d838f38c8dca25913578230dd43099a 2025-09-07T09:36:20.2020716Z * [new tag] trunk/dd2519abe83ec3c40d4797492434e41fe3b47e17 -> trunk/dd2519abe83ec3c40d4797492434e41fe3b47e17 2025-09-07T09:36:20.2021973Z * [new tag] trunk/dec72ea4b006dd0fbcaaaa106ad273d73807ab9d -> trunk/dec72ea4b006dd0fbcaaaa106ad273d73807ab9d 2025-09-07T09:36:20.2023262Z * [new tag] trunk/e0a62b266c021b910ce6dc02a6c9429210487717 -> trunk/e0a62b266c021b910ce6dc02a6c9429210487717 2025-09-07T09:36:20.2024455Z * [new tag] trunk/e19e02c84c9dcc408375e5cae3b0709c18b99228 -> trunk/e19e02c84c9dcc408375e5cae3b0709c18b99228 2025-09-07T09:36:20.2025793Z * [new tag] trunk/e304ea4e69d3a7deeb7e48c7450c214a4c953937 -> trunk/e304ea4e69d3a7deeb7e48c7450c214a4c953937 2025-09-07T09:36:20.2026960Z * [new tag] trunk/e3068cdb446adefb5a875616ba37a60235391439 -> trunk/e3068cdb446adefb5a875616ba37a60235391439 2025-09-07T09:36:20.2028049Z * [new tag] trunk/e381d4b0205d5f126c1de534f867ba776f7c3ee6 -> trunk/e381d4b0205d5f126c1de534f867ba776f7c3ee6 2025-09-07T09:36:20.2029279Z * [new tag] trunk/e4bd0ff4f8981b805df32ea5b3550621965ea4f2 -> trunk/e4bd0ff4f8981b805df32ea5b3550621965ea4f2 2025-09-07T09:36:20.2030285Z * [new tag] trunk/e532c9d4f1cdcbc1ea9628f55b9813e77847bdc7 -> trunk/e532c9d4f1cdcbc1ea9628f55b9813e77847bdc7 2025-09-07T09:36:20.2031522Z * [new tag] 
trunk/e92cd9415377403b6e90585e764639e2e0b5973b -> trunk/e92cd9415377403b6e90585e764639e2e0b5973b 2025-09-07T09:36:20.2032606Z * [new tag] trunk/e9481b6617b5576b099d8ca5798111592e9ad090 -> trunk/e9481b6617b5576b099d8ca5798111592e9ad090 2025-09-07T09:36:20.2033679Z * [new tag] trunk/ea1883dfd3e42defe37b11202b878bb76defa087 -> trunk/ea1883dfd3e42defe37b11202b878bb76defa087 2025-09-07T09:36:20.2034860Z * [new tag] trunk/eac3d6f04cfbbebe3d470dacd216da7d4b1f95a8 -> trunk/eac3d6f04cfbbebe3d470dacd216da7d4b1f95a8 2025-09-07T09:36:20.2036241Z * [new tag] trunk/eb18d32bda75189494d955aa001ade15f10333de -> trunk/eb18d32bda75189494d955aa001ade15f10333de 2025-09-07T09:36:20.2037224Z * [new tag] trunk/ef3be6726f7ff4b77c22db10cec5b686f9107ea9 -> trunk/ef3be6726f7ff4b77c22db10cec5b686f9107ea9 2025-09-07T09:36:20.2038597Z * [new tag] trunk/ef8aabd42422725026cb4dbf48aafa9efa226a04 -> trunk/ef8aabd42422725026cb4dbf48aafa9efa226a04 2025-09-07T09:36:20.2039668Z * [new tag] trunk/f00445b43eee57e20bb9316fa796ca23bf73373b -> trunk/f00445b43eee57e20bb9316fa796ca23bf73373b 2025-09-07T09:36:20.2040761Z * [new tag] trunk/f0c391102b754e3b145e8c59231d2df563487e37 -> trunk/f0c391102b754e3b145e8c59231d2df563487e37 2025-09-07T09:36:20.2041962Z * [new tag] trunk/f27985b7e796fb66a1b476284ba42d8cb360a751 -> trunk/f27985b7e796fb66a1b476284ba42d8cb360a751 2025-09-07T09:36:20.2043155Z * [new tag] trunk/f36f285953700f971552083a5da9d0ceacb63bbd -> trunk/f36f285953700f971552083a5da9d0ceacb63bbd 2025-09-07T09:36:20.2044317Z * [new tag] trunk/f3cebec39ebc110e1c8b06e741896585f7892dbb -> trunk/f3cebec39ebc110e1c8b06e741896585f7892dbb 2025-09-07T09:36:20.2045467Z * [new tag] trunk/f4c33cd44acac92c0b451a04da20ebe9370e5b0c -> trunk/f4c33cd44acac92c0b451a04da20ebe9370e5b0c 2025-09-07T09:36:20.2046779Z * [new tag] trunk/f612045ce105f008b2b675e2fc870163babeb2e8 -> trunk/f612045ce105f008b2b675e2fc870163babeb2e8 2025-09-07T09:36:20.2047838Z * [new tag] trunk/f8746b878dfc1e9639d42cbde832e9b9e792c86c -> 
trunk/f8746b878dfc1e9639d42cbde832e9b9e792c86c 2025-09-07T09:36:20.2048951Z * [new tag] trunk/f8ffa9194e26523e5f976d4a824d5cc58922727c -> trunk/f8ffa9194e26523e5f976d4a824d5cc58922727c 2025-09-07T09:36:20.2050063Z * [new tag] trunk/f981a7fa5230b98974291fdde32fe8488bc5d469 -> trunk/f981a7fa5230b98974291fdde32fe8488bc5d469 2025-09-07T09:36:20.2051198Z * [new tag] trunk/fbf3d2027daabbcb44d0af274b139be2a248a4f7 -> trunk/fbf3d2027daabbcb44d0af274b139be2a248a4f7 2025-09-07T09:36:20.2052423Z * [new tag] trunk/fca2601c9d628e1bd2d75c7318cd22c4e8c832aa -> trunk/fca2601c9d628e1bd2d75c7318cd22c4e8c832aa 2025-09-07T09:36:20.2053591Z * [new tag] trunk/fea20775ad96bdca972a1811d7d3372f368614ab -> trunk/fea20775ad96bdca972a1811d7d3372f368614ab 2025-09-07T09:36:20.2054584Z * [new tag] trunk/fefee081642f87419a21dc852f7167d4640443cd -> trunk/fefee081642f87419a21dc852f7167d4640443cd 2025-09-07T09:36:20.2055731Z * [new tag] v0.1.1 -> v0.1.1 2025-09-07T09:36:20.2056812Z * [new tag] v0.1.10 -> v0.1.10 2025-09-07T09:36:20.2057779Z * [new tag] v0.1.11 -> v0.1.11 2025-09-07T09:36:20.2058721Z * [new tag] v0.1.12 -> v0.1.12 2025-09-07T09:36:20.2059671Z * [new tag] v0.1.2 -> v0.1.2 2025-09-07T09:36:20.2060562Z * [new tag] v0.1.3 -> v0.1.3 2025-09-07T09:36:20.2061763Z * [new tag] v0.1.4 -> v0.1.4 2025-09-07T09:36:20.2062795Z * [new tag] v0.1.5 -> v0.1.5 2025-09-07T09:36:20.2063813Z * [new tag] v0.1.6 -> v0.1.6 2025-09-07T09:36:20.2064792Z * [new tag] v0.1.7 -> v0.1.7 2025-09-07T09:36:20.2066066Z * [new tag] v0.1.8 -> v0.1.8 2025-09-07T09:36:20.2066972Z * [new tag] v0.1.9 -> v0.1.9 2025-09-07T09:36:20.2067981Z * [new tag] v0.2.0 -> v0.2.0 2025-09-07T09:36:20.2069018Z * [new tag] v0.3.0 -> v0.3.0 2025-09-07T09:36:20.2070142Z * [new tag] v0.3.1 -> v0.3.1 2025-09-07T09:36:20.2071154Z * [new tag] v0.4.0 -> v0.4.0 2025-09-07T09:36:20.2072169Z * [new tag] v0.4.1 -> v0.4.1 2025-09-07T09:36:20.2073246Z * [new tag] v1.0.0 -> v1.0.0 2025-09-07T09:36:20.2074469Z * [new tag] v1.0.0a0 -> v1.0.0a0 
2025-09-07T09:36:20.2075606Z * [new tag] v1.0.1 -> v1.0.1 2025-09-07T09:36:20.2076716Z * [new tag] v1.0rc0 -> v1.0rc0 2025-09-07T09:36:20.2077589Z * [new tag] v1.0rc1 -> v1.0rc1 2025-09-07T09:36:20.2078692Z * [new tag] v1.1.0 -> v1.1.0 2025-09-07T09:36:20.2079742Z * [new tag] v1.1.0a0 -> v1.1.0a0 2025-09-07T09:36:20.2080924Z * [new tag] v1.10.0 -> v1.10.0 2025-09-07T09:36:20.2082104Z * [new tag] v1.10.0-rc1 -> v1.10.0-rc1 2025-09-07T09:36:20.2083213Z * [new tag] v1.10.0-rc2 -> v1.10.0-rc2 2025-09-07T09:36:20.2084080Z * [new tag] v1.10.0-rc3 -> v1.10.0-rc3 2025-09-07T09:36:20.2085240Z * [new tag] v1.10.1 -> v1.10.1 2025-09-07T09:36:20.2086215Z * [new tag] v1.10.1-rc1 -> v1.10.1-rc1 2025-09-07T09:36:20.2087078Z * [new tag] v1.10.2 -> v1.10.2 2025-09-07T09:36:20.2087938Z * [new tag] v1.10.2-rc1 -> v1.10.2-rc1 2025-09-07T09:36:20.2089033Z * [new tag] v1.11.0 -> v1.11.0 2025-09-07T09:36:20.2090178Z * [new tag] v1.11.0-rc1 -> v1.11.0-rc1 2025-09-07T09:36:20.2091374Z * [new tag] v1.11.0-rc2 -> v1.11.0-rc2 2025-09-07T09:36:20.2092475Z * [new tag] v1.11.0-rc3 -> v1.11.0-rc3 2025-09-07T09:36:20.2093578Z * [new tag] v1.11.0-rc4 -> v1.11.0-rc4 2025-09-07T09:36:20.2094715Z * [new tag] v1.11.0-rc5 -> v1.11.0-rc5 2025-09-07T09:36:20.2095903Z * [new tag] v1.11.0-rc6 -> v1.11.0-rc6 2025-09-07T09:36:20.2096821Z * [new tag] v1.11.0-rc7 -> v1.11.0-rc7 2025-09-07T09:36:20.2097853Z * [new tag] v1.12.0 -> v1.12.0 2025-09-07T09:36:20.2126253Z * [new tag] v1.12.0-rc1 -> v1.12.0-rc1 2025-09-07T09:36:20.2127415Z * [new tag] v1.12.0-rc2 -> v1.12.0-rc2 2025-09-07T09:36:20.2128529Z * [new tag] v1.12.0-rc3 -> v1.12.0-rc3 2025-09-07T09:36:20.2129673Z * [new tag] v1.12.0-rc4 -> v1.12.0-rc4 2025-09-07T09:36:20.2130685Z * [new tag] v1.12.0-rc5 -> v1.12.0-rc5 2025-09-07T09:36:20.2131863Z * [new tag] v1.12.0-rc6 -> v1.12.0-rc6 2025-09-07T09:36:20.2132681Z * [new tag] v1.12.0-rc7 -> v1.12.0-rc7 2025-09-07T09:36:20.2133493Z * [new tag] v1.12.0-rc8 -> v1.12.0-rc8 2025-09-07T09:36:20.2134453Z * [new tag] 
v1.12.1 -> v1.12.1 2025-09-07T09:36:20.2135975Z * [new tag] v1.12.1-rc1 -> v1.12.1-rc1 2025-09-07T09:36:20.2137044Z * [new tag] v1.12.1-rc2 -> v1.12.1-rc2 2025-09-07T09:36:20.2138234Z * [new tag] v1.12.1-rc3 -> v1.12.1-rc3 2025-09-07T09:36:20.2139338Z * [new tag] v1.12.1-rc4 -> v1.12.1-rc4 2025-09-07T09:36:20.2140188Z * [new tag] v1.12.1-rc5 -> v1.12.1-rc5 2025-09-07T09:36:20.2141311Z * [new tag] v1.13.0 -> v1.13.0 2025-09-07T09:36:20.2142575Z * [new tag] v1.13.0-rc1 -> v1.13.0-rc1 2025-09-07T09:36:20.2143683Z * [new tag] v1.13.0-rc2 -> v1.13.0-rc2 2025-09-07T09:36:20.2144789Z * [new tag] v1.13.0-rc3 -> v1.13.0-rc3 2025-09-07T09:36:20.2146419Z * [new tag] v1.13.0-rc4 -> v1.13.0-rc4 2025-09-07T09:36:20.2147139Z * [new tag] v1.13.0-rc5 -> v1.13.0-rc5 2025-09-07T09:36:20.2148046Z * [new tag] v1.13.0-rc6 -> v1.13.0-rc6 2025-09-07T09:36:20.2149160Z * [new tag] v1.13.1 -> v1.13.1 2025-09-07T09:36:20.2150097Z * [new tag] v1.13.1-rc1 -> v1.13.1-rc1 2025-09-07T09:36:20.2151149Z * [new tag] v1.2.0 -> v1.2.0 2025-09-07T09:36:20.2152280Z * [new tag] v1.2.0a0 -> v1.2.0a0 2025-09-07T09:36:20.2153363Z * [new tag] v1.3.0 -> v1.3.0 2025-09-07T09:36:20.2154465Z * [new tag] v1.3.0a0 -> v1.3.0a0 2025-09-07T09:36:20.2155609Z * [new tag] v1.3.1 -> v1.3.1 2025-09-07T09:36:20.2156763Z * [new tag] v1.4.0 -> v1.4.0 2025-09-07T09:36:20.2157899Z * [new tag] v1.4.0a0 -> v1.4.0a0 2025-09-07T09:36:20.2158806Z * [new tag] v1.4.1 -> v1.4.1 2025-09-07T09:36:20.2160009Z * [new tag] v1.5.0 -> v1.5.0 2025-09-07T09:36:20.2161179Z * [new tag] v1.5.0-rc1 -> v1.5.0-rc1 2025-09-07T09:36:20.2162331Z * [new tag] v1.5.0-rc2 -> v1.5.0-rc2 2025-09-07T09:36:20.2163481Z * [new tag] v1.5.0-rc3 -> v1.5.0-rc3 2025-09-07T09:36:20.2164508Z * [new tag] v1.5.0-rc4 -> v1.5.0-rc4 2025-09-07T09:36:20.2165685Z * [new tag] v1.5.0-rc5 -> v1.5.0-rc5 2025-09-07T09:36:20.2166961Z * [new tag] v1.5.1 -> v1.5.1 2025-09-07T09:36:20.2167877Z * [new tag] v1.5.1-rc1 -> v1.5.1-rc1 2025-09-07T09:36:20.2168766Z * [new tag] v1.6.0 -> 
v1.6.0 2025-09-07T09:36:20.2169875Z * [new tag] v1.6.0-rc1 -> v1.6.0-rc1 2025-09-07T09:36:20.2171116Z * [new tag] v1.6.0-rc2 -> v1.6.0-rc2 2025-09-07T09:36:20.2172242Z * [new tag] v1.6.0-rc3 -> v1.6.0-rc3 2025-09-07T09:36:20.2173372Z * [new tag] v1.6.0-rc4 -> v1.6.0-rc4 2025-09-07T09:36:20.2174457Z * [new tag] v1.6.0-rc5 -> v1.6.0-rc5 2025-09-07T09:36:20.2175792Z * [new tag] v1.6.0-rc6 -> v1.6.0-rc6 2025-09-07T09:36:20.2176750Z * [new tag] v1.6.0-rc7 -> v1.6.0-rc7 2025-09-07T09:36:20.2177914Z * [new tag] v1.7.0 -> v1.7.0 2025-09-07T09:36:20.2179143Z * [new tag] v1.7.0-rc1 -> v1.7.0-rc1 2025-09-07T09:36:20.2180382Z * [new tag] v1.7.0-rc2 -> v1.7.0-rc2 2025-09-07T09:36:20.2181558Z * [new tag] v1.7.0-rc3 -> v1.7.0-rc3 2025-09-07T09:36:20.2182571Z * [new tag] v1.7.0-rc4 -> v1.7.0-rc4 2025-09-07T09:36:20.2183720Z * [new tag] v1.7.1 -> v1.7.1 2025-09-07T09:36:20.2185042Z * [new tag] v1.7.1-rc1 -> v1.7.1-rc1 2025-09-07T09:36:20.2186312Z * [new tag] v1.7.1-rc2 -> v1.7.1-rc2 2025-09-07T09:36:20.2187232Z * [new tag] v1.7.1-rc3 -> v1.7.1-rc3 2025-09-07T09:36:20.2188375Z * [new tag] v1.8.0 -> v1.8.0 2025-09-07T09:36:20.2189335Z * [new tag] v1.8.0-rc1 -> v1.8.0-rc1 2025-09-07T09:36:20.2190484Z * [new tag] v1.8.0-rc2 -> v1.8.0-rc2 2025-09-07T09:36:20.2191673Z * [new tag] v1.8.0-rc3 -> v1.8.0-rc3 2025-09-07T09:36:20.2192969Z * [new tag] v1.8.0-rc4 -> v1.8.0-rc4 2025-09-07T09:36:20.2193859Z * [new tag] v1.8.0-rc5 -> v1.8.0-rc5 2025-09-07T09:36:20.2194727Z * [new tag] v1.8.1 -> v1.8.1 2025-09-07T09:36:20.2196124Z * [new tag] v1.8.1-rc1 -> v1.8.1-rc1 2025-09-07T09:36:20.2197076Z * [new tag] v1.8.1-rc2 -> v1.8.1-rc2 2025-09-07T09:36:20.2198038Z * [new tag] v1.8.1-rc3 -> v1.8.1-rc3 2025-09-07T09:36:20.2199601Z * [new tag] v1.8.2 -> v1.8.2 2025-09-07T09:36:20.2200540Z * [new tag] v1.8.2-rc1 -> v1.8.2-rc1 2025-09-07T09:36:20.2201704Z * [new tag] v1.9.0 -> v1.9.0 2025-09-07T09:36:20.2202800Z * [new tag] v1.9.0-rc1 -> v1.9.0-rc1 2025-09-07T09:36:20.2204038Z * [new tag] v1.9.0-rc2 -> 
v1.9.0-rc2 2025-09-07T09:36:20.2205417Z * [new tag] v1.9.0-rc3 -> v1.9.0-rc3 2025-09-07T09:36:20.2206460Z * [new tag] v1.9.0-rc4 -> v1.9.0-rc4 2025-09-07T09:36:20.2207762Z * [new tag] v1.9.1 -> v1.9.1 2025-09-07T09:36:20.2209158Z * [new tag] v1.9.1-rc1 -> v1.9.1-rc1 2025-09-07T09:36:20.2210112Z * [new tag] v1.9.1-rc2 -> v1.9.1-rc2 2025-09-07T09:36:20.2211319Z * [new tag] v2.0.0 -> v2.0.0 2025-09-07T09:36:20.2212404Z * [new tag] v2.0.0-rc1 -> v2.0.0-rc1 2025-09-07T09:36:20.2213665Z * [new tag] v2.0.0-rc2 -> v2.0.0-rc2 2025-09-07T09:36:20.2214850Z * [new tag] v2.0.0-rc3 -> v2.0.0-rc3 2025-09-07T09:36:20.2216284Z * [new tag] v2.0.0-rc4 -> v2.0.0-rc4 2025-09-07T09:36:20.2217385Z * [new tag] v2.0.0-rc5 -> v2.0.0-rc5 2025-09-07T09:36:20.2218312Z * [new tag] v2.0.0-rc6 -> v2.0.0-rc6 2025-09-07T09:36:20.2219583Z * [new tag] v2.0.1 -> v2.0.1 2025-09-07T09:36:20.2220763Z * [new tag] v2.0.1-rc1 -> v2.0.1-rc1 2025-09-07T09:36:20.2221857Z * [new tag] v2.0.1-rc2 -> v2.0.1-rc2 2025-09-07T09:36:20.2223086Z * [new tag] v2.0.1-rc3 -> v2.0.1-rc3 2025-09-07T09:36:20.2223969Z * [new tag] v2.0.1-rc4 -> v2.0.1-rc4 2025-09-07T09:36:20.2225885Z * [new tag] v2.1.0 -> v2.1.0 2025-09-07T09:36:20.2227008Z * [new tag] v2.1.0-rc1 -> v2.1.0-rc1 2025-09-07T09:36:20.2228164Z * [new tag] v2.1.0-rc2 -> v2.1.0-rc2 2025-09-07T09:36:20.2229472Z * [new tag] v2.1.0-rc3 -> v2.1.0-rc3 2025-09-07T09:36:20.2230751Z * [new tag] v2.1.0-rc4 -> v2.1.0-rc4 2025-09-07T09:36:20.2231948Z * [new tag] v2.1.0-rc5 -> v2.1.0-rc5 2025-09-07T09:36:20.2232824Z * [new tag] v2.1.0-rc6 -> v2.1.0-rc6 2025-09-07T09:36:20.2234126Z * [new tag] v2.1.1 -> v2.1.1 2025-09-07T09:36:20.2235606Z * [new tag] v2.1.1-rc1 -> v2.1.1-rc1 2025-09-07T09:36:20.2236720Z * [new tag] v2.1.1-rc2 -> v2.1.1-rc2 2025-09-07T09:36:20.2238009Z * [new tag] v2.1.1-rc3 -> v2.1.1-rc3 2025-09-07T09:36:20.2239214Z * [new tag] v2.1.1-rc4 -> v2.1.1-rc4 2025-09-07T09:36:20.2240415Z * [new tag] v2.1.1-rc5 -> v2.1.1-rc5 2025-09-07T09:36:20.2241566Z * [new tag] 
v2.1.1-rc6 -> v2.1.1-rc6 2025-09-07T09:36:20.2242579Z * [new tag] v2.1.2 -> v2.1.2 2025-09-07T09:36:20.2243828Z * [new tag] v2.1.2-rc1 -> v2.1.2-rc1 2025-09-07T09:36:20.2245301Z * [new tag] v2.1.2-rc2 -> v2.1.2-rc2 2025-09-07T09:36:20.2246341Z * [new tag] v2.1.2-rc3 -> v2.1.2-rc3 2025-09-07T09:36:20.2247557Z * [new tag] v2.2.0 -> v2.2.0 2025-09-07T09:36:20.2248772Z * [new tag] v2.2.0-rc1 -> v2.2.0-rc1 2025-09-07T09:36:20.2249940Z * [new tag] v2.2.0-rc2 -> v2.2.0-rc2 2025-09-07T09:36:20.2251100Z * [new tag] v2.2.0-rc3 -> v2.2.0-rc3 2025-09-07T09:36:20.2252221Z * [new tag] v2.2.0-rc4 -> v2.2.0-rc4 2025-09-07T09:36:20.2253344Z * [new tag] v2.2.0-rc5 -> v2.2.0-rc5 2025-09-07T09:36:20.2254539Z * [new tag] v2.2.0-rc6 -> v2.2.0-rc6 2025-09-07T09:36:20.2255762Z * [new tag] v2.2.0-rc7 -> v2.2.0-rc7 2025-09-07T09:36:20.2256787Z * [new tag] v2.2.0-rc8 -> v2.2.0-rc8 2025-09-07T09:36:20.2258057Z * [new tag] v2.2.1 -> v2.2.1 2025-09-07T09:36:20.2259327Z * [new tag] v2.2.1-rc1 -> v2.2.1-rc1 2025-09-07T09:36:20.2260343Z * [new tag] v2.2.1-rc2 -> v2.2.1-rc2 2025-09-07T09:36:20.2261325Z * [new tag] v2.2.1-rc3 -> v2.2.1-rc3 2025-09-07T09:36:20.2262461Z * [new tag] v2.2.2 -> v2.2.2 2025-09-07T09:36:20.2263751Z * [new tag] v2.2.2-rc1 -> v2.2.2-rc1 2025-09-07T09:36:20.2264729Z * [new tag] v2.2.2-rc2 -> v2.2.2-rc2 2025-09-07T09:36:20.2266000Z * [new tag] v2.2.2-rc3 -> v2.2.2-rc3 2025-09-07T09:36:20.2267151Z * [new tag] v2.3.0 -> v2.3.0 2025-09-07T09:36:20.2268330Z * [new tag] v2.3.0-rc1 -> v2.3.0-rc1 2025-09-07T09:36:20.2269575Z * [new tag] v2.3.0-rc10 -> v2.3.0-rc10 2025-09-07T09:36:20.2270764Z * [new tag] v2.3.0-rc11 -> v2.3.0-rc11 2025-09-07T09:36:20.2271811Z * [new tag] v2.3.0-rc12 -> v2.3.0-rc12 2025-09-07T09:36:20.2273021Z * [new tag] v2.3.0-rc2 -> v2.3.0-rc2 2025-09-07T09:36:20.2274259Z * [new tag] v2.3.0-rc3 -> v2.3.0-rc3 2025-09-07T09:36:20.2275665Z * [new tag] v2.3.0-rc4 -> v2.3.0-rc4 2025-09-07T09:36:20.2276886Z * [new tag] v2.3.0-rc5 -> v2.3.0-rc5 2025-09-07T09:36:20.2277880Z 
* [new tag] v2.3.0-rc6 -> v2.3.0-rc6 2025-09-07T09:36:20.2279130Z * [new tag] v2.3.0-rc7 -> v2.3.0-rc7 2025-09-07T09:36:20.2280352Z * [new tag] v2.3.0-rc8 -> v2.3.0-rc8 2025-09-07T09:36:20.2281361Z * [new tag] v2.3.0-rc9 -> v2.3.0-rc9 2025-09-07T09:36:20.2282378Z * [new tag] v2.3.1 -> v2.3.1 2025-09-07T09:36:20.2283589Z * [new tag] v2.3.1-rc1 -> v2.3.1-rc1 2025-09-07T09:36:20.2284766Z * [new tag] v2.3.1-rc2 -> v2.3.1-rc2 2025-09-07T09:36:20.2286907Z * [new tag] v2.3.1-rc3 -> v2.3.1-rc3 2025-09-07T09:36:20.2288089Z * [new tag] v2.4.0 -> v2.4.0 2025-09-07T09:36:20.2289213Z * [new tag] v2.4.0-rc1 -> v2.4.0-rc1 2025-09-07T09:36:20.2290583Z * [new tag] v2.4.0-rc2 -> v2.4.0-rc2 2025-09-07T09:36:20.2291651Z * [new tag] v2.4.0-rc3 -> v2.4.0-rc3 2025-09-07T09:36:20.2292735Z * [new tag] v2.4.0-rc4 -> v2.4.0-rc4 2025-09-07T09:36:20.2293985Z * [new tag] v2.4.0-rc5 -> v2.4.0-rc5 2025-09-07T09:36:20.2295434Z * [new tag] v2.4.0-rc6 -> v2.4.0-rc6 2025-09-07T09:36:20.2296741Z * [new tag] v2.4.0-rc7 -> v2.4.0-rc7 2025-09-07T09:36:20.2297877Z * [new tag] v2.4.0-rc8 -> v2.4.0-rc8 2025-09-07T09:36:20.2299201Z * [new tag] v2.4.0-rc9 -> v2.4.0-rc9 2025-09-07T09:36:20.2300200Z * [new tag] v2.4.1 -> v2.4.1 2025-09-07T09:36:20.2301411Z * [new tag] v2.4.1-rc1 -> v2.4.1-rc1 2025-09-07T09:36:20.2302785Z * [new tag] v2.4.1-rc2 -> v2.4.1-rc2 2025-09-07T09:36:20.2303991Z * [new tag] v2.4.1-rc3 -> v2.4.1-rc3 2025-09-07T09:36:20.2305388Z * [new tag] v2.5.0 -> v2.5.0 2025-09-07T09:36:20.2306688Z * [new tag] v2.5.0-rc1 -> v2.5.0-rc1 2025-09-07T09:36:20.2307679Z * [new tag] v2.5.0-rc10 -> v2.5.0-rc10 2025-09-07T09:36:20.2308868Z * [new tag] v2.5.0-rc2 -> v2.5.0-rc2 2025-09-07T09:36:20.2310002Z * [new tag] v2.5.0-rc3 -> v2.5.0-rc3 2025-09-07T09:36:20.2311155Z * [new tag] v2.5.0-rc4 -> v2.5.0-rc4 2025-09-07T09:36:20.2312367Z * [new tag] v2.5.0-rc5 -> v2.5.0-rc5 2025-09-07T09:36:20.2313596Z * [new tag] v2.5.0-rc6 -> v2.5.0-rc6 2025-09-07T09:36:20.2314840Z * [new tag] v2.5.0-rc7 -> v2.5.0-rc7 
2025-09-07T09:36:20.2316301Z * [new tag] v2.5.0-rc8 -> v2.5.0-rc8 2025-09-07T09:36:20.2317449Z * [new tag] v2.5.0-rc9 -> v2.5.0-rc9 2025-09-07T09:36:20.2318467Z * [new tag] v2.5.1 -> v2.5.1 2025-09-07T09:36:20.2319469Z * [new tag] v2.5.1-rc1 -> v2.5.1-rc1 2025-09-07T09:36:20.2320471Z * [new tag] v2.6.0 -> v2.6.0 2025-09-07T09:36:20.2321756Z * [new tag] v2.6.0-rc1 -> v2.6.0-rc1 2025-09-07T09:36:20.2323019Z * [new tag] v2.6.0-rc2 -> v2.6.0-rc2 2025-09-07T09:36:20.2324199Z * [new tag] v2.6.0-rc3 -> v2.6.0-rc3 2025-09-07T09:36:20.2325610Z * [new tag] v2.6.0-rc4 -> v2.6.0-rc4 2025-09-07T09:36:20.2327039Z * [new tag] v2.6.0-rc5 -> v2.6.0-rc5 2025-09-07T09:36:20.2328414Z * [new tag] v2.6.0-rc6 -> v2.6.0-rc6 2025-09-07T09:36:20.2329594Z * [new tag] v2.6.0-rc7 -> v2.6.0-rc7 2025-09-07T09:36:20.2330899Z * [new tag] v2.6.0-rc8 -> v2.6.0-rc8 2025-09-07T09:36:20.2332109Z * [new tag] v2.6.0-rc9 -> v2.6.0-rc9 2025-09-07T09:36:20.2333554Z * [new tag] v2.7.0 -> v2.7.0 2025-09-07T09:36:20.2334813Z * [new tag] v2.7.0-rc1 -> v2.7.0-rc1 2025-09-07T09:36:20.2336047Z * [new tag] v2.7.0-rc10 -> v2.7.0-rc10 2025-09-07T09:36:20.2337246Z * [new tag] v2.7.0-rc2 -> v2.7.0-rc2 2025-09-07T09:36:20.2338556Z * [new tag] v2.7.0-rc3 -> v2.7.0-rc3 2025-09-07T09:36:20.2339778Z * [new tag] v2.7.0-rc4 -> v2.7.0-rc4 2025-09-07T09:36:20.2341134Z * [new tag] v2.7.0-rc5 -> v2.7.0-rc5 2025-09-07T09:36:20.2342332Z * [new tag] v2.7.0-rc6 -> v2.7.0-rc6 2025-09-07T09:36:20.2343558Z * [new tag] v2.7.0-rc7 -> v2.7.0-rc7 2025-09-07T09:36:20.2344776Z * [new tag] v2.7.0-rc8 -> v2.7.0-rc8 2025-09-07T09:36:20.2346373Z * [new tag] v2.7.0-rc9 -> v2.7.0-rc9 2025-09-07T09:36:20.2347421Z * [new tag] v2.7.1 -> v2.7.1 2025-09-07T09:36:20.2348697Z * [new tag] v2.7.1-rc1 -> v2.7.1-rc1 2025-09-07T09:36:20.2350001Z * [new tag] v2.7.1-rc2 -> v2.7.1-rc2 2025-09-07T09:36:20.2351185Z * [new tag] v2.7.1-rc3 -> v2.7.1-rc3 2025-09-07T09:36:20.2352430Z * [new tag] v2.7.1-rc4 -> v2.7.1-rc4 2025-09-07T09:36:20.2353441Z * [new tag] 
v2.7.1-rc5 -> v2.7.1-rc5 2025-09-07T09:36:20.2354473Z * [new tag] v2.8.0 -> v2.8.0 2025-09-07T09:36:20.2355927Z * [new tag] v2.8.0-rc1 -> v2.8.0-rc1 2025-09-07T09:36:20.2357095Z * [new tag] v2.8.0-rc2 -> v2.8.0-rc2 2025-09-07T09:36:20.2358411Z * [new tag] v2.8.0-rc3 -> v2.8.0-rc3 2025-09-07T09:36:20.2359673Z * [new tag] v2.8.0-rc4 -> v2.8.0-rc4 2025-09-07T09:36:20.2360962Z * [new tag] v2.8.0-rc5 -> v2.8.0-rc5 2025-09-07T09:36:20.2362351Z * [new tag] v2.8.0-rc6 -> v2.8.0-rc6 2025-09-07T09:36:20.2363493Z * [new tag] v2.8.0-rc7 -> v2.8.0-rc7 2025-09-07T09:36:20.2364744Z * [new tag] v2.8.0-rc8 -> v2.8.0-rc8 2025-09-07T09:36:20.2366225Z * [new tag] whc_flight_1 -> whc_flight_1 2025-09-07T09:36:20.2367521Z * [new tag] whc_flight_2 -> whc_flight_2 2025-09-07T09:36:20.2368548Z * [new tag] whc_flight_4 -> whc_flight_4 2025-09-07T09:36:20.3275666Z [command]/usr/bin/git rev-parse --verify --quiet 93fb23d6fae7c4e82c4239a1033e522088742634^{object} 2025-09-07T09:36:20.3305867Z 93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T09:36:20.3310282Z ##[endgroup] 2025-09-07T09:36:20.3310643Z ##[group]Determining the checkout info 2025-09-07T09:36:20.3311568Z ##[endgroup] 2025-09-07T09:36:20.3315285Z [command]/usr/bin/git sparse-checkout disable 2025-09-07T09:36:20.3365390Z [command]/usr/bin/git config --local --unset-all extensions.worktreeConfig 2025-09-07T09:36:20.3394769Z ##[group]Checking out the ref 2025-09-07T09:36:20.3398011Z [command]/usr/bin/git checkout --progress --force 93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T09:36:21.3717504Z Updating files: 84% (16351/19405) 2025-09-07T09:36:21.3869007Z Updating files: 85% (16495/19405) 2025-09-07T09:36:21.4003753Z Updating files: 86% (16689/19405) 2025-09-07T09:36:21.4133054Z Updating files: 87% (16883/19405) 2025-09-07T09:36:21.4233927Z Updating files: 88% (17077/19405) 2025-09-07T09:36:21.4371904Z Updating files: 89% (17271/19405) 2025-09-07T09:36:21.4534395Z Updating files: 90% (17465/19405) 2025-09-07T09:36:21.4645560Z 
Updating files: 91% (17659/19405) 2025-09-07T09:36:21.4782702Z Updating files: 92% (17853/19405) 2025-09-07T09:36:21.4961301Z Updating files: 93% (18047/19405) 2025-09-07T09:36:21.5152406Z Updating files: 94% (18241/19405) 2025-09-07T09:36:21.5305616Z Updating files: 95% (18435/19405) 2025-09-07T09:36:21.5458780Z Updating files: 96% (18629/19405) 2025-09-07T09:36:21.5626990Z Updating files: 97% (18823/19405) 2025-09-07T09:36:21.5874347Z Updating files: 98% (19017/19405) 2025-09-07T09:36:21.6018175Z Updating files: 99% (19211/19405) 2025-09-07T09:36:21.6018470Z Updating files: 100% (19405/19405) 2025-09-07T09:36:21.6018740Z Updating files: 100% (19405/19405), done. 2025-09-07T09:36:21.6256519Z Note: switching to '93fb23d6fae7c4e82c4239a1033e522088742634'. 2025-09-07T09:36:21.6256835Z 2025-09-07T09:36:21.6257053Z You are in 'detached HEAD' state. You can look around, make experimental 2025-09-07T09:36:21.6257555Z changes and commit them, and you can discard any commits you make in this 2025-09-07T09:36:21.6258070Z state without impacting any branches by switching back to a branch. 2025-09-07T09:36:21.6258375Z 2025-09-07T09:36:21.6258570Z If you want to create a new branch to retain commits you create, you may 2025-09-07T09:36:21.6259020Z do so (now or later) by using -c with the switch command. 
Example: 2025-09-07T09:36:21.6259284Z 2025-09-07T09:36:21.6259409Z git switch -c <new-branch-name> 2025-09-07T09:36:21.6259647Z 2025-09-07T09:36:21.6259780Z Or undo this operation with: 2025-09-07T09:36:21.6259965Z 2025-09-07T09:36:21.6260058Z git switch - 2025-09-07T09:36:21.6260183Z 2025-09-07T09:36:21.6260414Z Turn off this advice by setting config variable advice.detachedHead to false 2025-09-07T09:36:21.6260741Z 2025-09-07T09:36:21.6260909Z HEAD is now at 93fb23d6fae Build vLLM nightly wheels (#162000) 2025-09-07T09:36:21.6398824Z ##[endgroup] 2025-09-07T09:36:21.6399198Z ##[group]Setting up auth for fetching submodules 2025-09-07T09:36:21.6404907Z [command]/usr/bin/git config --global http.https://github.com/.extraheader AUTHORIZATION: basic *** 2025-09-07T09:36:21.6456882Z [command]/usr/bin/git config --global --unset-all url.https://github.com/.insteadOf 2025-09-07T09:36:21.6492764Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf git@github.com: 2025-09-07T09:36:21.6524887Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf org-21003710@github.com: 2025-09-07T09:36:21.6555660Z ##[endgroup] 2025-09-07T09:36:21.6556066Z ##[group]Fetching submodules 2025-09-07T09:36:21.6558809Z [command]/usr/bin/git submodule sync --recursive 2025-09-07T09:36:21.6838073Z [command]/usr/bin/git -c protocol.version=2 submodule update --init --force --recursive 2025-09-07T09:36:21.7112526Z Submodule 'android/libs/fbjni' (https://github.com/facebookincubator/fbjni.git) registered for path 'android/libs/fbjni' 2025-09-07T09:36:21.7126172Z Submodule 'third_party/NNPACK_deps/FP16' (https://github.com/Maratyszcza/FP16.git) registered for path 'third_party/FP16' 2025-09-07T09:36:21.7139354Z Submodule 'third_party/NNPACK_deps/FXdiv' (https://github.com/Maratyszcza/FXdiv.git) registered for path 'third_party/FXdiv' 2025-09-07T09:36:21.7152106Z Submodule 'third_party/NNPACK' (https://github.com/Maratyszcza/NNPACK.git) registered for path 
'third_party/NNPACK' 2025-09-07T09:36:21.7165163Z Submodule 'third_party/NVTX' (https://github.com/NVIDIA/NVTX.git) registered for path 'third_party/NVTX' 2025-09-07T09:36:21.7178388Z Submodule 'third_party/VulkanMemoryAllocator' (https://github.com/GPUOpen-LibrariesAndSDKs/VulkanMemoryAllocator.git) registered for path 'third_party/VulkanMemoryAllocator' 2025-09-07T09:36:21.7191930Z Submodule 'third_party/XNNPACK' (https://github.com/google/XNNPACK.git) registered for path 'third_party/XNNPACK' 2025-09-07T09:36:21.7204561Z Submodule 'third_party/aiter' (https://github.com/ROCm/aiter.git) registered for path 'third_party/aiter' 2025-09-07T09:36:21.7217244Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/benchmark' 2025-09-07T09:36:21.7229603Z Submodule 'third_party/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/composable_kernel' 2025-09-07T09:36:21.7241584Z Submodule 'third_party/cpp-httplib' (https://github.com/yhirose/cpp-httplib.git) registered for path 'third_party/cpp-httplib' 2025-09-07T09:36:21.7253867Z Submodule 'third_party/cpuinfo' (https://github.com/pytorch/cpuinfo.git) registered for path 'third_party/cpuinfo' 2025-09-07T09:36:21.7266912Z Submodule 'third_party/cudnn_frontend' (https://github.com/NVIDIA/cudnn-frontend.git) registered for path 'third_party/cudnn_frontend' 2025-09-07T09:36:21.7279336Z Submodule 'third_party/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'third_party/cutlass' 2025-09-07T09:36:21.7291943Z Submodule 'third_party/fbgemm' (https://github.com/pytorch/fbgemm) registered for path 'third_party/fbgemm' 2025-09-07T09:36:21.7303566Z Submodule 'third_party/flash-attention' (https://github.com/Dao-AILab/flash-attention.git) registered for path 'third_party/flash-attention' 2025-09-07T09:36:21.7315232Z Submodule 'third_party/flatbuffers' (https://github.com/google/flatbuffers.git) registered for path 
'third_party/flatbuffers' 2025-09-07T09:36:21.7327730Z Submodule 'third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/fmt' 2025-09-07T09:36:21.7339615Z Submodule 'third_party/gemmlowp/gemmlowp' (https://github.com/google/gemmlowp.git) registered for path 'third_party/gemmlowp/gemmlowp' 2025-09-07T09:36:21.7351491Z Submodule 'third_party/gloo' (https://github.com/pytorch/gloo) registered for path 'third_party/gloo' 2025-09-07T09:36:21.7363217Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/googletest' 2025-09-07T09:36:21.7375050Z Submodule 'third_party/ideep' (https://github.com/intel/ideep) registered for path 'third_party/ideep' 2025-09-07T09:36:21.7387173Z Submodule 'third_party/ittapi' (https://github.com/intel/ittapi.git) registered for path 'third_party/ittapi' 2025-09-07T09:36:21.7398919Z Submodule 'third_party/kineto' (https://github.com/pytorch/kineto) registered for path 'third_party/kineto' 2025-09-07T09:36:21.7410366Z Submodule 'third_party/kleidiai' (https://github.com/ARM-software/kleidiai.git) registered for path 'third_party/kleidiai' 2025-09-07T09:36:21.7422662Z Submodule 'third_party/mimalloc' (https://github.com/microsoft/mimalloc.git) registered for path 'third_party/mimalloc' 2025-09-07T09:36:21.7434297Z Submodule 'third_party/nlohmann' (https://github.com/nlohmann/json.git) registered for path 'third_party/nlohmann' 2025-09-07T09:36:21.7446752Z Submodule 'third_party/onnx' (https://github.com/onnx/onnx.git) registered for path 'third_party/onnx' 2025-09-07T09:36:21.7458973Z Submodule 'third_party/opentelemetry-cpp' (https://github.com/open-telemetry/opentelemetry-cpp.git) registered for path 'third_party/opentelemetry-cpp' 2025-09-07T09:36:21.7470724Z Submodule 'third_party/pocketfft' (https://github.com/mreineck/pocketfft) registered for path 'third_party/pocketfft' 2025-09-07T09:36:21.7482823Z Submodule 'third_party/protobuf' 
(https://github.com/protocolbuffers/protobuf.git) registered for path 'third_party/protobuf' 2025-09-07T09:36:21.7494645Z Submodule 'third_party/NNPACK_deps/psimd' (https://github.com/Maratyszcza/psimd.git) registered for path 'third_party/psimd' 2025-09-07T09:36:21.7506613Z Submodule 'third_party/NNPACK_deps/pthreadpool' (https://github.com/Maratyszcza/pthreadpool.git) registered for path 'third_party/pthreadpool' 2025-09-07T09:36:21.7519053Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/pybind11' 2025-09-07T09:36:21.7530676Z Submodule 'third_party/python-peachpy' (https://github.com/malfet/PeachPy.git) registered for path 'third_party/python-peachpy' 2025-09-07T09:36:21.7552161Z Submodule 'third_party/sleef' (https://github.com/shibatch/sleef) registered for path 'third_party/sleef' 2025-09-07T09:36:21.7555407Z Submodule 'third_party/tensorpipe' (https://github.com/pytorch/tensorpipe.git) registered for path 'third_party/tensorpipe' 2025-09-07T09:36:21.7589723Z Cloning into '/home/eve/_work/pytorch/pytorch/android/libs/fbjni'... 2025-09-07T09:36:22.1702175Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/FP16'... 2025-09-07T09:36:22.4860659Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/FXdiv'... 2025-09-07T09:36:22.7566904Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/NNPACK'... 2025-09-07T09:36:23.1733093Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/NVTX'... 2025-09-07T09:36:23.9926778Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/VulkanMemoryAllocator'... 2025-09-07T09:36:25.1454290Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/XNNPACK'... 2025-09-07T09:36:35.1298494Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/aiter'... 2025-09-07T09:36:38.3475369Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/benchmark'... 
2025-09-07T09:36:38.9101101Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/composable_kernel'... 2025-09-07T09:36:41.9276838Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/cpp-httplib'... 2025-09-07T09:36:42.5449278Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/cpuinfo'... 2025-09-07T09:36:43.2657137Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/cudnn_frontend'... 2025-09-07T09:36:44.3473864Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/cutlass'... 2025-09-07T09:36:46.2302568Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/fbgemm'... 2025-09-07T09:36:48.4874845Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/flash-attention'... 2025-09-07T09:36:49.1902372Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/flatbuffers'... 2025-09-07T09:36:50.5532015Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/fmt'... 2025-09-07T09:36:51.5491357Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/gemmlowp/gemmlowp'... 2025-09-07T09:36:52.6023339Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/gloo'... 2025-09-07T09:36:53.0925142Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/googletest'... 2025-09-07T09:36:54.3524339Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/ideep'... 2025-09-07T09:36:55.1658762Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/ittapi'... 2025-09-07T09:36:55.7759860Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/kineto'... 2025-09-07T09:36:57.3996361Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/kleidiai'... 2025-09-07T09:36:57.9626994Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/mimalloc'... 2025-09-07T09:36:58.8737216Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/nlohmann'... 2025-09-07T09:37:04.3250183Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/onnx'... 
2025-09-07T09:37:09.3631056Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/opentelemetry-cpp'... 2025-09-07T09:37:14.2967494Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/pocketfft'... 2025-09-07T09:37:14.6729441Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/protobuf'... 2025-09-07T09:37:21.8802243Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/psimd'... 2025-09-07T09:37:22.1614513Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/pthreadpool'... 2025-09-07T09:37:22.5285892Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/pybind11'... 2025-09-07T09:37:23.4276888Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/python-peachpy'... 2025-09-07T09:37:23.8417651Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/sleef'... 2025-09-07T09:37:24.7618469Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/tensorpipe'... 2025-09-07T09:37:25.3568055Z Submodule path 'android/libs/fbjni': checked out '7e1e1fe3858c63c251c637ae41a20de425dde96f' 2025-09-07T09:37:25.3732537Z Submodule path 'third_party/FP16': checked out '4dfe081cf6bcd15db339cf2680b9281b8451eeb3' 2025-09-07T09:37:25.3849781Z Submodule path 'third_party/FXdiv': checked out 'b408327ac2a15ec3e43352421954f5b1967701d1' 2025-09-07T09:37:25.4129092Z Submodule path 'third_party/NNPACK': checked out 'c07e3a0400713d546e0dea2d5466dd22ea389c73' 2025-09-07T09:37:25.4964628Z Submodule path 'third_party/NVTX': checked out '2942f167cc30c5e3a44a2aecd5b0d9c07ff61a07' 2025-09-07T09:37:25.5560521Z Submodule path 'third_party/VulkanMemoryAllocator': checked out '1d8f600fd424278486eade7ed3e877c99f0846b1' 2025-09-07T09:37:26.3376111Z Submodule path 'third_party/XNNPACK': checked out '51a0103656eff6fc9bfd39a4597923c4b542c883' 2025-09-07T09:37:26.4979176Z Submodule path 'third_party/aiter': checked out '01aae101b9e5e94d6c16a9514c9fb8df99c93150' 2025-09-07T09:37:26.5015263Z Submodule '3rdparty/composable_kernel' 
(https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T09:37:26.5048844Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/aiter/3rdparty/composable_kernel'... 2025-09-07T09:37:29.8386002Z Submodule path 'third_party/aiter/3rdparty/composable_kernel': checked out 'cffe8fa2a442ac8e80dd236a1a5d24fe3d7e0cbf' 2025-09-07T09:37:29.8640474Z Submodule path 'third_party/benchmark': checked out '299e5928955cc62af9968370293b916f5130916f' 2025-09-07T09:37:30.2273075Z Submodule path 'third_party/composable_kernel': checked out '7fe50dc3da2069d6645d9deb8c017a876472a977' 2025-09-07T09:37:30.2789477Z Submodule path 'third_party/cpp-httplib': checked out '89c932f313c6437c38f2982869beacc89c2f2246' 2025-09-07T09:37:30.3781059Z Submodule path 'third_party/cpuinfo': checked out '5e3d2445e6a84d9599bee2bf78edbb4d80865e1d' 2025-09-07T09:37:30.4231520Z Submodule path 'third_party/cudnn_frontend': checked out 'f937055efc6d414d11f4c6577e3977fe74f35fb6' 2025-09-07T09:37:31.0863360Z Submodule path 'third_party/cutlass': checked out 'e51efbfe18fe4f4cbb66ab814c55bf4aa0185491' 2025-09-07T09:37:31.2392161Z Submodule path 'third_party/fbgemm': checked out '4b39c551efe15e6bbade20565b0ceb2d8ce3352d' 2025-09-07T09:37:31.2426488Z Submodule 'external/asmjit' (https://github.com/asmjit/asmjit.git) registered for path 'third_party/fbgemm/external/asmjit' 2025-09-07T09:37:31.2438115Z Submodule 'external/composable_kernel' (https://github.com/jwfromm/composable_kernel.git) registered for path 'third_party/fbgemm/external/composable_kernel' 2025-09-07T09:37:31.2450455Z Submodule 'external/cpuinfo' (https://github.com/pytorch/cpuinfo) registered for path 'third_party/fbgemm/external/cpuinfo' 2025-09-07T09:37:31.2463072Z Submodule 'external/cutlass' (https://github.com/jwfromm/cutlass) registered for path 'third_party/fbgemm/external/cutlass' 2025-09-07T09:37:31.2475780Z Submodule 'external/googletest' 
(https://github.com/google/googletest) registered for path 'third_party/fbgemm/external/googletest' 2025-09-07T09:37:31.2488261Z Submodule 'external/hipify_torch' (https://github.com/ROCmSoftwarePlatform/hipify_torch.git) registered for path 'third_party/fbgemm/external/hipify_torch' 2025-09-07T09:37:31.2500153Z Submodule 'external/json' (https://github.com/nlohmann/json.git) registered for path 'third_party/fbgemm/external/json' 2025-09-07T09:37:31.2538065Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/fbgemm/external/asmjit'... 2025-09-07T09:37:32.3702388Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/fbgemm/external/composable_kernel'... 2025-09-07T09:37:33.3992294Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/fbgemm/external/cpuinfo'... 2025-09-07T09:37:34.1145711Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/fbgemm/external/cutlass'... 2025-09-07T09:37:35.7860084Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/fbgemm/external/googletest'... 2025-09-07T09:37:36.6679326Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/fbgemm/external/hipify_torch'... 2025-09-07T09:37:37.0875124Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/fbgemm/external/json'... 
2025-09-07T09:37:43.6787794Z Submodule path 'third_party/fbgemm/external/asmjit': checked out 'a3199e8857792cd10b7589ff5d58343d2c9008ea' 2025-09-07T09:37:43.9473372Z Submodule path 'third_party/fbgemm/external/composable_kernel': checked out 'b1281b8b08d973a7064f864f47eeb30f3e2596e9' 2025-09-07T09:37:44.0494662Z Submodule path 'third_party/fbgemm/external/cpuinfo': checked out '6543fec09b2f04ac4a666882998b534afc9c1349' 2025-09-07T09:37:44.6963418Z Submodule path 'third_party/fbgemm/external/cutlass': checked out '311f3c8e51dc0eb56310cfc6980bf63d0fbd7917' 2025-09-07T09:37:44.7437171Z Submodule path 'third_party/fbgemm/external/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-09-07T09:37:44.7572322Z Submodule path 'third_party/fbgemm/external/hipify_torch': checked out '63b6a7b541fa7f08f8475ca7d74054db36ff2691' 2025-09-07T09:37:44.8634362Z Submodule path 'third_party/fbgemm/external/json': checked out '9cca280a4d0ccf0c08f47a99aa71d1b0e52f8d03' 2025-09-07T09:37:44.9619190Z Submodule path 'third_party/flash-attention': checked out '979702c87a8713a8e0a5e9fee122b90d2ef13be5' 2025-09-07T09:37:44.9644218Z Submodule 'csrc/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T09:37:44.9651097Z Submodule 'csrc/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'third_party/flash-attention/csrc/cutlass' 2025-09-07T09:37:44.9686360Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/flash-attention/csrc/composable_kernel'... 2025-09-07T09:37:47.9374789Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/flash-attention/csrc/cutlass'... 
2025-09-07T09:37:50.1458257Z Submodule path 'third_party/flash-attention/csrc/composable_kernel': checked out '888317e698e9803c62bd38568abc9e05d7709f33' 2025-09-07T09:37:50.7299698Z Submodule path 'third_party/flash-attention/csrc/cutlass': checked out 'c506e16788cb08416a4a57e11a9067beeee29420' 2025-09-07T09:37:50.8792135Z Submodule path 'third_party/flatbuffers': checked out 'a2cd1ea3b6d3fee220106b5fed3f7ce8da9eb757' 2025-09-07T09:37:50.9134760Z Submodule path 'third_party/fmt': checked out '40626af88bd7df9a5fb80be7b25ac85b122d6c21' 2025-09-07T09:37:50.9554731Z Submodule path 'third_party/gemmlowp/gemmlowp': checked out '3fb5c176c17c765a3492cd2f0321b0dab712f350' 2025-09-07T09:37:50.9817215Z Submodule path 'third_party/gloo': checked out 'c7b7b022c124d9643957d9bd55f57ac59fce8fa2' 2025-09-07T09:37:51.0269885Z Submodule path 'third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-09-07T09:37:51.0416126Z Submodule path 'third_party/ideep': checked out '719d8e6cd7f7a0e01b155657526d693acf97c2b3' 2025-09-07T09:37:51.0444799Z Submodule 'mkl-dnn' (https://github.com/intel/mkl-dnn.git) registered for path 'third_party/ideep/mkl-dnn' 2025-09-07T09:37:51.0474351Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/ideep/mkl-dnn'... 
2025-09-07T09:38:02.1729572Z Submodule path 'third_party/ideep/mkl-dnn': checked out '8d263e693366ef8db40acc569cc7d8edf644556d' 2025-09-07T09:38:02.1967255Z Submodule path 'third_party/ittapi': checked out 'dec1d23ca65ab069d225dfe40dea14f455170959' 2025-09-07T09:38:02.2869147Z Submodule path 'third_party/kineto': checked out '5e7501833f1021ce6f618572d3baf657b6319658' 2025-09-07T09:38:02.2908917Z Submodule 'libkineto/third_party/dynolog' (https://github.com/facebookincubator/dynolog.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T09:38:02.2922986Z Submodule 'libkineto/third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T09:38:02.2931610Z Submodule 'libkineto/third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T09:38:02.2965093Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog'... 2025-09-07T09:38:03.2520668Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/fmt'... 2025-09-07T09:38:04.3604505Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/googletest'... 
2025-09-07T09:38:05.7194198Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog': checked out '7d04a0053a845370ae06ce317a22a48e9edcc74e' 2025-09-07T09:38:05.7509511Z Submodule 'third_party/DCGM' (https://github.com/NVIDIA/DCGM.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T09:38:05.7605235Z Submodule 'third_party/cpr' (https://github.com/libcpr/cpr.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T09:38:05.8436569Z Submodule 'third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T09:38:05.8913554Z Submodule 'third_party/gflags' (https://github.com/gflags/gflags.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T09:38:05.9364282Z Submodule 'third_party/glog' (https://github.com/google/glog.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T09:38:05.9609635Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T09:38:05.9991894Z Submodule 'third_party/json' (https://github.com/nlohmann/json.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T09:38:06.0364381Z Submodule 'third_party/pfs' (https://github.com/dtrugman/pfs.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T09:38:06.0400581Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'... 2025-09-07T09:38:07.7440944Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'... 
2025-09-07T09:38:08.3529248Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'... 2025-09-07T09:38:11.2832107Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'... 2025-09-07T09:38:13.3753596Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/glog'... 2025-09-07T09:38:16.8309230Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'... 2025-09-07T09:38:20.6241560Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/json'... 2025-09-07T09:38:34.7349605Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'... 2025-09-07T09:38:37.4618768Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM': checked out 'ffde4e54bc7249a6039a5e6b45b395141e1217f9' 2025-09-07T09:38:37.5538983Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr': checked out '871ed52d350214a034f6ef8a3b8f51c5ce1bd400' 2025-09-07T09:38:37.6466541Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt': checked out 'cd4af11efc9c622896a3e4cb599fa28668ca3d05' 2025-09-07T09:38:37.7334796Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags': checked out 'e171aa2d15ed9eb17054558e0b3a6a413bb01067' 2025-09-07T09:38:37.9281488Z Submodule 'doc' (https://github.com/gflags/gflags.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T09:38:37.9318569Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'... 
2025-09-07T09:38:39.7223711Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc': checked out '8411df715cf522606e3b1aca386ddfc0b63d34b4' 2025-09-07T09:38:39.7702224Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog': checked out 'b33e3bad4c46c8a6345525fd822af355e5ef9446' 2025-09-07T09:38:39.8522445Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest': checked out '58d77fa8070e8cec2dc1ed015d66b454c8d78850' 2025-09-07T09:38:40.0202377Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/json': checked out '4f8fba14066156b73f1189a2b8bd568bde5284c5' 2025-09-07T09:38:40.0466967Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs': checked out 'f68a2fa8ea36c783bdd760371411fcb495aa3150' 2025-09-07T09:38:40.0900027Z Submodule path 'third_party/kineto/libkineto/third_party/fmt': checked out '0041a40c1350ba702d475b9c4ad62da77caea164' 2025-09-07T09:38:40.1507951Z Submodule path 'third_party/kineto/libkineto/third_party/googletest': checked out '7aca84427f224eeed3144123d5230d5871e93347' 2025-09-07T09:38:40.2194011Z Submodule path 'third_party/kleidiai': checked out 'cca02c2f69dd18e1f12647c1c0bdc8cf90e680c7' 2025-09-07T09:38:40.2759936Z Submodule path 'third_party/mimalloc': checked out 'fbd8b99c2b828428947d70fdc046bb55609be93e' 2025-09-07T09:38:40.3952659Z Submodule path 'third_party/nlohmann': checked out '55f93686c01528224f448c19128836e7df245f72' 2025-09-07T09:38:41.4007936Z Submodule path 'third_party/onnx': checked out 'e709452ef2bbc1d113faf678c24e6d3467696e83' 2025-09-07T09:38:41.4931697Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/onnx/third_party/pybind11' 2025-09-07T09:38:41.4964845Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/onnx/third_party/pybind11'... 
2025-09-07T09:38:44.5075297Z Submodule path 'third_party/onnx/third_party/pybind11': checked out 'a2e59f0e7065404b44dfe92a28aca47ba1378dc4' 2025-09-07T09:38:44.6830160Z Submodule path 'third_party/opentelemetry-cpp': checked out 'a799f4aed9c94b765dcdaabaeab7d5e7e2310878' 2025-09-07T09:38:44.7760830Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark) registered for path 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T09:38:44.8452737Z Submodule 'third_party/googletest' (https://github.com/google/googletest) registered for path 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T09:38:45.1847808Z Submodule 'third_party/ms-gsl' (https://github.com/microsoft/GSL) registered for path 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T09:38:45.2791409Z Submodule 'third_party/nlohmann-json' (https://github.com/nlohmann/json) registered for path 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T09:38:45.2911721Z Submodule 'third_party/opentelemetry-proto' (https://github.com/open-telemetry/opentelemetry-proto) registered for path 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T09:38:45.3551736Z Submodule 'third_party/opentracing-cpp' (https://github.com/opentracing/opentracing-cpp.git) registered for path 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T09:38:45.4553414Z Submodule 'third_party/prometheus-cpp' (https://github.com/jupp0r/prometheus-cpp) registered for path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T09:38:45.6309543Z Submodule 'tools/vcpkg' (https://github.com/Microsoft/vcpkg) registered for path 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T09:38:45.6380022Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/benchmark'... 2025-09-07T09:38:46.7082015Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/googletest'... 
2025-09-07T09:38:49.6222672Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/ms-gsl'... 2025-09-07T09:38:50.7008116Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/nlohmann-json'... 2025-09-07T09:39:03.2916047Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/opentelemetry-proto'... 2025-09-07T09:39:06.4842215Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/opentracing-cpp'... 2025-09-07T09:39:09.4236625Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp'... 2025-09-07T09:39:12.4239569Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/opentelemetry-cpp/tools/vcpkg'... 2025-09-07T09:39:24.3206558Z Submodule path 'third_party/opentelemetry-cpp/third_party/benchmark': checked out 'd572f4777349d43653b21d6c2fc63020ab326db2' 2025-09-07T09:39:24.4597560Z Submodule path 'third_party/opentelemetry-cpp/third_party/googletest': checked out 'b796f7d44681514f58a683a3a71ff17c94edb0c1' 2025-09-07T09:39:24.5489948Z Submodule path 'third_party/opentelemetry-cpp/third_party/ms-gsl': checked out '6f4529395c5b7c2d661812257cd6780c67e54afa' 2025-09-07T09:39:25.1654041Z Submodule path 'third_party/opentelemetry-cpp/third_party/nlohmann-json': checked out 'bc889afb4c5bf1c0d8ee29ef35eaaf4c8bef8a5d' 2025-09-07T09:39:25.3707466Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto': checked out '4ca4f0335c63cda7ab31ea7ed70d6553aee14dce' 2025-09-07T09:39:25.8008437Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentracing-cpp': checked out '06b57f48ded1fa3bdd3d4346f6ef29e40e08eaf5' 2025-09-07T09:39:25.8401854Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp': checked out 'c9ffcdda9086ffd9e1283ea7a0276d831f3c8a8d' 2025-09-07T09:39:25.9251833Z Submodule 'civetweb' (https://github.com/civetweb/civetweb.git) 
registered for path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T09:39:26.1481308Z Submodule 'googletest' (https://github.com/google/googletest.git) registered for path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T09:39:26.1514507Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'... 2025-09-07T09:39:28.8412260Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'... 2025-09-07T09:39:32.4344879Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb': checked out 'eefb26f82b233268fc98577d265352720d477ba4' 2025-09-07T09:39:32.5232013Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929' 2025-09-07T09:39:33.6338834Z Submodule path 'third_party/opentelemetry-cpp/tools/vcpkg': checked out '8eb57355a4ffb410a2e94c07b4dca2dffbee8e50' 2025-09-07T09:39:33.8230081Z Submodule path 'third_party/pocketfft': checked out '0fa0ef591e38c2758e3184c6c23e497b9f732ffa' 2025-09-07T09:39:34.6494132Z Submodule path 'third_party/protobuf': checked out 'd1eca4e4b421cd2997495c4b4e65cea6be4e9b8a' 2025-09-07T09:39:34.8357669Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/protobuf/third_party/benchmark' 2025-09-07T09:39:35.0149122Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/protobuf/third_party/googletest' 2025-09-07T09:39:35.0189066Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/protobuf/third_party/benchmark'... 2025-09-07T09:39:38.7392879Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/protobuf/third_party/googletest'... 
2025-09-07T09:39:43.0915720Z Submodule path 'third_party/protobuf/third_party/benchmark': checked out '5b7683f49e1e9223cf9927b24f6fd3d6bd82e3f8' 2025-09-07T09:39:43.6998729Z Submodule path 'third_party/protobuf/third_party/googletest': checked out '5ec7f0c4a113e2f18ac2c6cc7df51ad6afc24081' 2025-09-07T09:39:43.7853709Z Submodule path 'third_party/psimd': checked out '072586a71b55b7f8c584153d223e95687148a900' 2025-09-07T09:39:44.3355493Z Submodule path 'third_party/pthreadpool': checked out '4fe0e1e183925bf8cfa6aae24237e724a96479b8' 2025-09-07T09:39:44.5504902Z Submodule path 'third_party/pybind11': checked out 'f5fbe867d2d26e4a0a9177a51f6e568868ad3dc8' 2025-09-07T09:39:44.6421827Z Submodule path 'third_party/python-peachpy': checked out 'f45429b087dd7d5bc78bb40dc7cf06425c252d67' 2025-09-07T09:39:44.7787776Z Submodule path 'third_party/sleef': checked out '5a1d179df9cf652951b59010a2d2075372d67f68' 2025-09-07T09:39:45.0134184Z Submodule path 'third_party/tensorpipe': checked out 'af0118d13e52f5a08841464a768e01a0bf3e3075' 2025-09-07T09:39:45.1688942Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/tensorpipe/third_party/googletest' 2025-09-07T09:39:45.3679504Z Submodule 'third_party/libnop' (https://github.com/google/libnop.git) registered for path 'third_party/tensorpipe/third_party/libnop' 2025-09-07T09:39:45.5563071Z Submodule 'third_party/libuv' (https://github.com/libuv/libuv.git) registered for path 'third_party/tensorpipe/third_party/libuv' 2025-09-07T09:39:45.6007621Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T09:39:45.6045240Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/tensorpipe/third_party/googletest'... 2025-09-07T09:39:49.2297353Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/tensorpipe/third_party/libnop'... 
2025-09-07T09:39:51.8558531Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/tensorpipe/third_party/libuv'... 2025-09-07T09:39:56.6453580Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/tensorpipe/third_party/pybind11'... 2025-09-07T09:39:59.3862881Z Submodule path 'third_party/tensorpipe/third_party/googletest': checked out 'aee0f9d9b5b87796ee8a0ab26b7587ec30e8858e' 2025-09-07T09:39:59.4080726Z Submodule path 'third_party/tensorpipe/third_party/libnop': checked out '910b55815be16109f04f4180e9adee14fb4ce281' 2025-09-07T09:39:59.5009994Z Submodule path 'third_party/tensorpipe/third_party/libuv': checked out '5152db2cbfeb5582e9c27c5ea1dba2cd9e10759b' 2025-09-07T09:39:59.5354714Z Submodule path 'third_party/tensorpipe/third_party/pybind11': checked out 'a23996fce38ff6ccfbcdc09f1e63f2c4be5ea2ef' 2025-09-07T09:39:59.5464200Z Submodule 'tools/clang' (https://github.com/wjakob/clang-cindex-python3) registered for path 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T09:39:59.5497066Z Cloning into '/home/eve/_work/pytorch/pytorch/third_party/tensorpipe/third_party/pybind11/tools/clang'... 
2025-09-07T09:40:00.7658686Z Submodule path 'third_party/tensorpipe/third_party/pybind11/tools/clang': checked out '6a00cbc4a9b8e68b71caf7f774b3f9c753ae84d5' 2025-09-07T09:40:00.7707572Z [command]/usr/bin/git submodule foreach --recursive git config --local gc.auto 0 2025-09-07T09:40:00.7992921Z Entering 'android/libs/fbjni' 2025-09-07T09:40:00.8054802Z Entering 'third_party/FP16' 2025-09-07T09:40:00.8103652Z Entering 'third_party/FXdiv' 2025-09-07T09:40:00.8154527Z Entering 'third_party/NNPACK' 2025-09-07T09:40:00.8205906Z Entering 'third_party/NVTX' 2025-09-07T09:40:00.8261453Z Entering 'third_party/VulkanMemoryAllocator' 2025-09-07T09:40:00.8313205Z Entering 'third_party/XNNPACK' 2025-09-07T09:40:00.8399161Z Entering 'third_party/aiter' 2025-09-07T09:40:00.8462367Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T09:40:00.8528766Z Entering 'third_party/benchmark' 2025-09-07T09:40:00.8585154Z Entering 'third_party/composable_kernel' 2025-09-07T09:40:00.8643833Z Entering 'third_party/cpp-httplib' 2025-09-07T09:40:00.8713804Z Entering 'third_party/cpuinfo' 2025-09-07T09:40:00.8787803Z Entering 'third_party/cudnn_frontend' 2025-09-07T09:40:00.8851693Z Entering 'third_party/cutlass' 2025-09-07T09:40:00.8915891Z Entering 'third_party/fbgemm' 2025-09-07T09:40:00.8966992Z Entering 'third_party/fbgemm/external/asmjit' 2025-09-07T09:40:00.9013501Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-09-07T09:40:00.9080896Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T09:40:00.9128193Z Entering 'third_party/fbgemm/external/cutlass' 2025-09-07T09:40:00.9178711Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T09:40:00.9222371Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T09:40:00.9274044Z Entering 'third_party/fbgemm/external/json' 2025-09-07T09:40:00.9326385Z Entering 'third_party/flash-attention' 2025-09-07T09:40:00.9426465Z Entering 'third_party/flash-attention/csrc/composable_kernel' 
2025-09-07T09:40:00.9485920Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T09:40:00.9552058Z Entering 'third_party/flatbuffers' 2025-09-07T09:40:00.9613704Z Entering 'third_party/fmt' 2025-09-07T09:40:00.9672367Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T09:40:00.9882672Z Entering 'third_party/gloo' 2025-09-07T09:40:01.0348673Z Entering 'third_party/googletest' 2025-09-07T09:40:01.0693623Z Entering 'third_party/ideep' 2025-09-07T09:40:01.0823134Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T09:40:01.1269047Z Entering 'third_party/ittapi' 2025-09-07T09:40:01.1561438Z Entering 'third_party/kineto' 2025-09-07T09:40:01.1892421Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T09:40:01.2234657Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T09:40:01.2552704Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T09:40:01.2926498Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T09:40:01.3427298Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T09:40:01.3878009Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T09:40:01.4360788Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T09:40:01.4782558Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T09:40:01.5260117Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T09:40:01.5750825Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T09:40:01.6200489Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T09:40:01.6623286Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T09:40:01.6972995Z Entering 'third_party/kleidiai' 2025-09-07T09:40:01.7379005Z Entering 'third_party/mimalloc' 
2025-09-07T09:40:01.7822068Z Entering 'third_party/nlohmann' 2025-09-07T09:40:01.8206287Z Entering 'third_party/onnx' 2025-09-07T09:40:01.8722839Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T09:40:01.9096663Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T09:40:01.9350465Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T09:40:01.9494907Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T09:40:01.9555519Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T09:40:01.9622069Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T09:40:01.9679666Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T09:40:02.0005726Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T09:40:02.0105499Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T09:40:02.0210776Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T09:40:02.0282183Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T09:40:02.0330974Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T09:40:02.0532801Z Entering 'third_party/pocketfft' 2025-09-07T09:40:02.0594281Z Entering 'third_party/protobuf' 2025-09-07T09:40:02.0641275Z Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T09:40:02.0695382Z Entering 'third_party/protobuf/third_party/googletest' 2025-09-07T09:40:02.0748985Z Entering 'third_party/psimd' 2025-09-07T09:40:02.0844709Z Entering 'third_party/pthreadpool' 2025-09-07T09:40:02.0903951Z Entering 'third_party/pybind11' 2025-09-07T09:40:02.1001805Z Entering 'third_party/python-peachpy' 2025-09-07T09:40:02.1069209Z Entering 'third_party/sleef' 2025-09-07T09:40:02.1141852Z Entering 'third_party/tensorpipe' 2025-09-07T09:40:02.1212305Z Entering 'third_party/tensorpipe/third_party/googletest' 
2025-09-07T09:40:02.1265720Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-09-07T09:40:02.1309935Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-09-07T09:40:02.1364845Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T09:40:02.1408194Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T09:40:02.1499048Z ##[endgroup] 2025-09-07T09:40:02.1499480Z ##[group]Persisting credentials for submodules 2025-09-07T09:40:02.1506032Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'url\.https\:\/\/github\.com\/\.insteadOf' && git config --local --unset-all 'url.https://github.com/.insteadOf' || :" 2025-09-07T09:40:02.1790393Z Entering 'android/libs/fbjni' 2025-09-07T09:40:02.1847280Z Entering 'third_party/FP16' 2025-09-07T09:40:02.1901564Z Entering 'third_party/FXdiv' 2025-09-07T09:40:02.1953759Z Entering 'third_party/NNPACK' 2025-09-07T09:40:02.2008911Z Entering 'third_party/NVTX' 2025-09-07T09:40:02.2067185Z Entering 'third_party/VulkanMemoryAllocator' 2025-09-07T09:40:02.2121953Z Entering 'third_party/XNNPACK' 2025-09-07T09:40:02.2192415Z Entering 'third_party/aiter' 2025-09-07T09:40:02.2245841Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T09:40:02.2308995Z Entering 'third_party/benchmark' 2025-09-07T09:40:02.2362868Z Entering 'third_party/composable_kernel' 2025-09-07T09:40:02.2424485Z Entering 'third_party/cpp-httplib' 2025-09-07T09:40:02.2478015Z Entering 'third_party/cpuinfo' 2025-09-07T09:40:02.2530820Z Entering 'third_party/cudnn_frontend' 2025-09-07T09:40:02.2585171Z Entering 'third_party/cutlass' 2025-09-07T09:40:02.2646920Z Entering 'third_party/fbgemm' 2025-09-07T09:40:02.2702706Z Entering 'third_party/fbgemm/external/asmjit' 2025-09-07T09:40:02.2753188Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-09-07T09:40:02.2816809Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T09:40:02.2871262Z Entering 
'third_party/fbgemm/external/cutlass' 2025-09-07T09:40:02.2934262Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T09:40:02.2986297Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T09:40:02.3041897Z Entering 'third_party/fbgemm/external/json' 2025-09-07T09:40:02.3095953Z Entering 'third_party/flash-attention' 2025-09-07T09:40:02.3148827Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T09:40:02.3206002Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T09:40:02.3266685Z Entering 'third_party/flatbuffers' 2025-09-07T09:40:02.3318998Z Entering 'third_party/fmt' 2025-09-07T09:40:02.3369937Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T09:40:02.3423403Z Entering 'third_party/gloo' 2025-09-07T09:40:02.3475832Z Entering 'third_party/googletest' 2025-09-07T09:40:02.3525552Z Entering 'third_party/ideep' 2025-09-07T09:40:02.3581281Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T09:40:02.3643644Z Entering 'third_party/ittapi' 2025-09-07T09:40:02.3694504Z Entering 'third_party/kineto' 2025-09-07T09:40:02.3746503Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T09:40:02.3792262Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T09:40:02.3858716Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T09:40:02.3913617Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T09:40:02.3962223Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T09:40:02.4011576Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T09:40:02.4067102Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T09:40:02.4118787Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T09:40:02.4176271Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T09:40:02.4230555Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T09:40:02.4289267Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T09:40:02.4346865Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T09:40:02.4403998Z Entering 'third_party/kleidiai' 2025-09-07T09:40:02.4453501Z Entering 'third_party/mimalloc' 2025-09-07T09:40:02.4508135Z Entering 'third_party/nlohmann' 2025-09-07T09:40:02.4562242Z Entering 'third_party/onnx' 2025-09-07T09:40:02.4629073Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T09:40:02.4686440Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T09:40:02.4738503Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T09:40:02.4788788Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T09:40:02.4836020Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T09:40:02.4891087Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T09:40:02.4941065Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T09:40:02.4987810Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T09:40:02.5035307Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T09:40:02.5082024Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T09:40:02.5131704Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T09:40:02.5186517Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T09:40:02.5251763Z Entering 'third_party/pocketfft' 2025-09-07T09:40:02.5303089Z Entering 'third_party/protobuf' 2025-09-07T09:40:02.5356310Z Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T09:40:02.5409253Z Entering 
'third_party/protobuf/third_party/googletest' 2025-09-07T09:40:02.5469545Z Entering 'third_party/psimd' 2025-09-07T09:40:02.5525807Z Entering 'third_party/pthreadpool' 2025-09-07T09:40:02.5582990Z Entering 'third_party/pybind11' 2025-09-07T09:40:02.5639371Z Entering 'third_party/python-peachpy' 2025-09-07T09:40:02.5695295Z Entering 'third_party/sleef' 2025-09-07T09:40:02.5751034Z Entering 'third_party/tensorpipe' 2025-09-07T09:40:02.5801221Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-09-07T09:40:02.5855663Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-09-07T09:40:02.5910924Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-09-07T09:40:02.5960349Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T09:40:02.6005560Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T09:40:02.6082341Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local 'http.https://github.com/.extraheader' 'AUTHORIZATION: basic ***' && git config --local --show-origin --name-only --get-regexp remote.origin.url" 2025-09-07T09:40:02.6354356Z Entering 'android/libs/fbjni' 2025-09-07T09:40:02.6746498Z file:/home/eve/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config remote.origin.url 2025-09-07T09:40:02.6768920Z Entering 'third_party/FP16' 2025-09-07T09:40:02.8597550Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config remote.origin.url 2025-09-07T09:40:02.8621263Z Entering 'third_party/FXdiv' 2025-09-07T09:40:02.8791545Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config remote.origin.url 2025-09-07T09:40:02.8824415Z Entering 'third_party/NNPACK' 2025-09-07T09:40:02.8874302Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config remote.origin.url 2025-09-07T09:40:02.8896866Z Entering 'third_party/NVTX' 2025-09-07T09:40:02.8941577Z 
file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config remote.origin.url 2025-09-07T09:40:02.8965633Z Entering 'third_party/VulkanMemoryAllocator' 2025-09-07T09:40:02.9883951Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config remote.origin.url 2025-09-07T09:40:02.9907683Z Entering 'third_party/XNNPACK' 2025-09-07T09:40:02.9957648Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config remote.origin.url 2025-09-07T09:40:02.9994420Z Entering 'third_party/aiter' 2025-09-07T09:40:03.0061589Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/aiter/config remote.origin.url 2025-09-07T09:40:03.0087057Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T09:40:03.0146290Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config remote.origin.url 2025-09-07T09:40:03.0177545Z Entering 'third_party/benchmark' 2025-09-07T09:40:03.0380466Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config remote.origin.url 2025-09-07T09:40:03.0407493Z Entering 'third_party/composable_kernel' 2025-09-07T09:40:03.0485898Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config remote.origin.url 2025-09-07T09:40:03.0526200Z Entering 'third_party/cpp-httplib' 2025-09-07T09:40:03.0824147Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config remote.origin.url 2025-09-07T09:40:03.0847362Z Entering 'third_party/cpuinfo' 2025-09-07T09:40:03.0913588Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config remote.origin.url 2025-09-07T09:40:03.0938891Z Entering 'third_party/cudnn_frontend' 2025-09-07T09:40:03.1068023Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config remote.origin.url 2025-09-07T09:40:03.1090860Z Entering 'third_party/cutlass' 2025-09-07T09:40:03.1431355Z 
file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config remote.origin.url 2025-09-07T09:40:03.1462347Z Entering 'third_party/fbgemm' 2025-09-07T09:40:03.1741705Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config remote.origin.url 2025-09-07T09:40:03.1765887Z Entering 'third_party/fbgemm/external/asmjit' 2025-09-07T09:40:03.2223548Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config remote.origin.url 2025-09-07T09:40:03.2244743Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-09-07T09:40:03.2609510Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config remote.origin.url 2025-09-07T09:40:03.2638079Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T09:40:03.3033978Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config remote.origin.url 2025-09-07T09:40:03.3053180Z Entering 'third_party/fbgemm/external/cutlass' 2025-09-07T09:40:03.3484308Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config remote.origin.url 2025-09-07T09:40:03.3512983Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T09:40:03.3884166Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config remote.origin.url 2025-09-07T09:40:03.3907465Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T09:40:03.4338747Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config remote.origin.url 2025-09-07T09:40:03.4359425Z Entering 'third_party/fbgemm/external/json' 2025-09-07T09:40:03.4804160Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config remote.origin.url 2025-09-07T09:40:03.4828518Z Entering 'third_party/flash-attention' 2025-09-07T09:40:03.6231023Z 
file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config remote.origin.url 2025-09-07T09:40:03.6258190Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T09:40:03.7028577Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config remote.origin.url 2025-09-07T09:40:03.7065844Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T09:40:03.7119343Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config remote.origin.url 2025-09-07T09:40:03.7149908Z Entering 'third_party/flatbuffers' 2025-09-07T09:40:03.7223354Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config remote.origin.url 2025-09-07T09:40:03.7254263Z Entering 'third_party/fmt' 2025-09-07T09:40:03.7408895Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/fmt/config remote.origin.url 2025-09-07T09:40:03.7430492Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T09:40:03.8169886Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config remote.origin.url 2025-09-07T09:40:03.8200608Z Entering 'third_party/gloo' 2025-09-07T09:40:03.9518950Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/gloo/config remote.origin.url 2025-09-07T09:40:03.9548187Z Entering 'third_party/googletest' 2025-09-07T09:40:03.9901623Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/googletest/config remote.origin.url 2025-09-07T09:40:03.9924335Z Entering 'third_party/ideep' 2025-09-07T09:40:04.1529316Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/ideep/config remote.origin.url 2025-09-07T09:40:04.1553296Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T09:40:04.1639988Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config remote.origin.url 2025-09-07T09:40:04.1673405Z Entering 'third_party/ittapi' 
2025-09-07T09:40:04.1736875Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config remote.origin.url 2025-09-07T09:40:04.1760130Z Entering 'third_party/kineto' 2025-09-07T09:40:04.1805114Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/kineto/config remote.origin.url 2025-09-07T09:40:04.1825316Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T09:40:04.2036805Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config remote.origin.url 2025-09-07T09:40:04.2058412Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T09:40:04.2170259Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config remote.origin.url 2025-09-07T09:40:04.2192370Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T09:40:04.2276206Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config remote.origin.url 2025-09-07T09:40:04.2299635Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T09:40:04.2435846Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config remote.origin.url 2025-09-07T09:40:04.2457755Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T09:40:04.2550104Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config remote.origin.url 2025-09-07T09:40:04.2572999Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T09:40:04.2657509Z 
file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config remote.origin.url 2025-09-07T09:40:04.2684930Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T09:40:04.2967616Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config remote.origin.url 2025-09-07T09:40:04.2990384Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T09:40:04.3545323Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config remote.origin.url 2025-09-07T09:40:04.3569776Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T09:40:04.3661570Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config remote.origin.url 2025-09-07T09:40:04.3685959Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T09:40:04.3772082Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config remote.origin.url 2025-09-07T09:40:04.3797013Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T09:40:04.4191876Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config remote.origin.url 2025-09-07T09:40:04.4213106Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T09:40:04.4392930Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config remote.origin.url 2025-09-07T09:40:04.4418444Z Entering 'third_party/kleidiai' 2025-09-07T09:40:04.4554592Z 
file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config remote.origin.url 2025-09-07T09:40:04.4580082Z Entering 'third_party/mimalloc' 2025-09-07T09:40:04.4630624Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config remote.origin.url 2025-09-07T09:40:04.4651139Z Entering 'third_party/nlohmann' 2025-09-07T09:40:04.4726438Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config remote.origin.url 2025-09-07T09:40:04.4751632Z Entering 'third_party/onnx' 2025-09-07T09:40:04.4812113Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/onnx/config remote.origin.url 2025-09-07T09:40:04.4847764Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T09:40:04.4906940Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url 2025-09-07T09:40:04.4931543Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T09:40:04.4979207Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config remote.origin.url 2025-09-07T09:40:04.5003687Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T09:40:04.5077130Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config remote.origin.url 2025-09-07T09:40:04.5101655Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T09:40:04.5145113Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config remote.origin.url 2025-09-07T09:40:04.5164740Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T09:40:04.5204532Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config remote.origin.url 2025-09-07T09:40:04.5223885Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T09:40:04.5435980Z 
file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config remote.origin.url 2025-09-07T09:40:04.5462701Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T09:40:04.5527706Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config remote.origin.url 2025-09-07T09:40:04.5551800Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T09:40:04.5595518Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config remote.origin.url 2025-09-07T09:40:04.5616461Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T09:40:04.5663647Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config remote.origin.url 2025-09-07T09:40:04.5683594Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T09:40:04.5735373Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-09-07T09:40:04.5762253Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T09:40:04.5810289Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-09-07T09:40:04.5836886Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T09:40:04.5886762Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config remote.origin.url 2025-09-07T09:40:04.5930626Z Entering 'third_party/pocketfft' 2025-09-07T09:40:04.5979305Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config remote.origin.url 
2025-09-07T09:40:04.6002177Z Entering 'third_party/protobuf' 2025-09-07T09:40:04.6077308Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config remote.origin.url 2025-09-07T09:40:04.6103655Z Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T09:40:04.6151274Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config remote.origin.url 2025-09-07T09:40:04.6171408Z Entering 'third_party/protobuf/third_party/googletest' 2025-09-07T09:40:04.6217339Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config remote.origin.url 2025-09-07T09:40:04.6243617Z Entering 'third_party/psimd' 2025-09-07T09:40:04.6286515Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config remote.origin.url 2025-09-07T09:40:04.6307611Z Entering 'third_party/pthreadpool' 2025-09-07T09:40:04.6363294Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config remote.origin.url 2025-09-07T09:40:04.6388035Z Entering 'third_party/pybind11' 2025-09-07T09:40:04.6438965Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config remote.origin.url 2025-09-07T09:40:04.6459777Z Entering 'third_party/python-peachpy' 2025-09-07T09:40:04.6562699Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config remote.origin.url 2025-09-07T09:40:04.6585977Z Entering 'third_party/sleef' 2025-09-07T09:40:04.6636077Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/sleef/config remote.origin.url 2025-09-07T09:40:04.6661704Z Entering 'third_party/tensorpipe' 2025-09-07T09:40:04.6716821Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config remote.origin.url 2025-09-07T09:40:04.6738211Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-09-07T09:40:04.6874557Z 
file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config remote.origin.url 2025-09-07T09:40:04.6901042Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-09-07T09:40:04.6948983Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config remote.origin.url 2025-09-07T09:40:04.6969832Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-09-07T09:40:04.7016705Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config remote.origin.url 2025-09-07T09:40:04.7039336Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T09:40:04.7095136Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config remote.origin.url 2025-09-07T09:40:04.7116840Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T09:40:04.7162217Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url 2025-09-07T09:40:04.8542548Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'git@github.com:' 2025-09-07T09:40:04.8838362Z Entering 'android/libs/fbjni' 2025-09-07T09:40:04.8885121Z Entering 'third_party/FP16' 2025-09-07T09:40:04.8940424Z Entering 'third_party/FXdiv' 2025-09-07T09:40:04.8992930Z Entering 'third_party/NNPACK' 2025-09-07T09:40:04.9048023Z Entering 'third_party/NVTX' 2025-09-07T09:40:04.9097273Z Entering 'third_party/VulkanMemoryAllocator' 2025-09-07T09:40:04.9163605Z Entering 'third_party/XNNPACK' 2025-09-07T09:40:04.9262236Z Entering 'third_party/aiter' 2025-09-07T09:40:04.9327592Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T09:40:04.9379656Z Entering 'third_party/benchmark' 2025-09-07T09:40:04.9427894Z Entering 'third_party/composable_kernel' 2025-09-07T09:40:04.9482891Z 
Entering 'third_party/cpp-httplib' 2025-09-07T09:40:04.9529738Z Entering 'third_party/cpuinfo' 2025-09-07T09:40:04.9584407Z Entering 'third_party/cudnn_frontend' 2025-09-07T09:40:04.9627648Z Entering 'third_party/cutlass' 2025-09-07T09:40:04.9682194Z Entering 'third_party/fbgemm' 2025-09-07T09:40:04.9729179Z Entering 'third_party/fbgemm/external/asmjit' 2025-09-07T09:40:04.9778221Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-09-07T09:40:04.9833605Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T09:40:04.9890435Z Entering 'third_party/fbgemm/external/cutlass' 2025-09-07T09:40:04.9949095Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T09:40:04.9995499Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T09:40:05.0067199Z Entering 'third_party/fbgemm/external/json' 2025-09-07T09:40:05.0120045Z Entering 'third_party/flash-attention' 2025-09-07T09:40:05.0177631Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T09:40:05.0559881Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T09:40:05.0616765Z Entering 'third_party/flatbuffers' 2025-09-07T09:40:05.0952613Z Entering 'third_party/fmt' 2025-09-07T09:40:05.1074327Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T09:40:05.1458735Z Entering 'third_party/gloo' 2025-09-07T09:40:05.1503163Z Entering 'third_party/googletest' 2025-09-07T09:40:05.1573961Z Entering 'third_party/ideep' 2025-09-07T09:40:05.1624652Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T09:40:05.1697382Z Entering 'third_party/ittapi' 2025-09-07T09:40:05.1748208Z Entering 'third_party/kineto' 2025-09-07T09:40:05.1798258Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T09:40:05.1871919Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T09:40:05.1911788Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T09:40:05.1958391Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T09:40:05.2004636Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T09:40:05.2042392Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T09:40:05.2091583Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T09:40:05.2150992Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T09:40:05.2199032Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T09:40:05.2260628Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T09:40:05.2310729Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T09:40:05.2375622Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T09:40:05.2437090Z Entering 'third_party/kleidiai' 2025-09-07T09:40:05.2496287Z Entering 'third_party/mimalloc' 2025-09-07T09:40:05.2551634Z Entering 'third_party/nlohmann' 2025-09-07T09:40:05.2600328Z Entering 'third_party/onnx' 2025-09-07T09:40:05.2667970Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T09:40:05.2722272Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T09:40:05.2777225Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T09:40:05.2824659Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T09:40:05.2919142Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T09:40:05.2979926Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T09:40:05.3025616Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T09:40:05.3069740Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T09:40:05.3126498Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T09:40:05.3198745Z Entering 
'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T09:40:05.3249117Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T09:40:05.3298821Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T09:40:05.3361698Z Entering 'third_party/pocketfft' 2025-09-07T09:40:05.3407802Z Entering 'third_party/protobuf' 2025-09-07T09:40:05.3453306Z Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T09:40:05.3491030Z Entering 'third_party/protobuf/third_party/googletest' 2025-09-07T09:40:05.3568726Z Entering 'third_party/psimd' 2025-09-07T09:40:05.3845194Z Entering 'third_party/pthreadpool' 2025-09-07T09:40:05.4236049Z Entering 'third_party/pybind11' 2025-09-07T09:40:05.4482940Z Entering 'third_party/python-peachpy' 2025-09-07T09:40:05.4938664Z Entering 'third_party/sleef' 2025-09-07T09:40:05.5063702Z Entering 'third_party/tensorpipe' 2025-09-07T09:40:05.5368186Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-09-07T09:40:05.5538196Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-09-07T09:40:05.5834802Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-09-07T09:40:05.6264211Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T09:40:05.6446541Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T09:40:05.6515683Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'org-21003710@github.com:' 2025-09-07T09:40:05.6776016Z Entering 'android/libs/fbjni' 2025-09-07T09:40:05.6814695Z Entering 'third_party/FP16' 2025-09-07T09:40:05.6881909Z Entering 'third_party/FXdiv' 2025-09-07T09:40:05.6921130Z Entering 'third_party/NNPACK' 2025-09-07T09:40:05.6961207Z Entering 'third_party/NVTX' 2025-09-07T09:40:05.7003848Z Entering 'third_party/VulkanMemoryAllocator' 2025-09-07T09:40:05.7050087Z Entering 'third_party/XNNPACK' 2025-09-07T09:40:05.7105363Z Entering 
'third_party/aiter' 2025-09-07T09:40:05.7156304Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T09:40:05.7219348Z Entering 'third_party/benchmark' 2025-09-07T09:40:05.7263290Z Entering 'third_party/composable_kernel' 2025-09-07T09:40:05.7315464Z Entering 'third_party/cpp-httplib' 2025-09-07T09:40:05.7364547Z Entering 'third_party/cpuinfo' 2025-09-07T09:40:05.7405977Z Entering 'third_party/cudnn_frontend' 2025-09-07T09:40:05.7446510Z Entering 'third_party/cutlass' 2025-09-07T09:40:05.7502901Z Entering 'third_party/fbgemm' 2025-09-07T09:40:05.7562872Z Entering 'third_party/fbgemm/external/asmjit' 2025-09-07T09:40:05.7608043Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-09-07T09:40:05.7654793Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T09:40:05.7696599Z Entering 'third_party/fbgemm/external/cutlass' 2025-09-07T09:40:05.7744093Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T09:40:05.7787406Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T09:40:05.7829782Z Entering 'third_party/fbgemm/external/json' 2025-09-07T09:40:05.7916446Z Entering 'third_party/flash-attention' 2025-09-07T09:40:05.7959763Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T09:40:05.8009217Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T09:40:05.8062829Z Entering 'third_party/flatbuffers' 2025-09-07T09:40:05.8108738Z Entering 'third_party/fmt' 2025-09-07T09:40:05.8150413Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T09:40:05.8207274Z Entering 'third_party/gloo' 2025-09-07T09:40:05.8248044Z Entering 'third_party/googletest' 2025-09-07T09:40:05.8290963Z Entering 'third_party/ideep' 2025-09-07T09:40:05.8336566Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T09:40:05.8388112Z Entering 'third_party/ittapi' 2025-09-07T09:40:05.8480624Z Entering 'third_party/kineto' 2025-09-07T09:40:05.8527911Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 
2025-09-07T09:40:05.8565600Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T09:40:05.8606447Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T09:40:05.8646784Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T09:40:05.8688526Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T09:40:05.8730110Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T09:40:05.8776419Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T09:40:05.8820142Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T09:40:05.8876742Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T09:40:05.8920309Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T09:40:05.8974774Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T09:40:05.9015867Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T09:40:05.9060424Z Entering 'third_party/kleidiai' 2025-09-07T09:40:05.9111985Z Entering 'third_party/mimalloc' 2025-09-07T09:40:05.9152396Z Entering 'third_party/nlohmann' 2025-09-07T09:40:05.9199517Z Entering 'third_party/onnx' 2025-09-07T09:40:05.9265633Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T09:40:05.9317898Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T09:40:05.9360311Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T09:40:05.9399673Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T09:40:05.9441951Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T09:40:05.9498469Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T09:40:05.9543214Z Entering 
'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T09:40:05.9589240Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T09:40:05.9661337Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T09:40:05.9701317Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T09:40:05.9747575Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T09:40:05.9790386Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T09:40:05.9855670Z Entering 'third_party/pocketfft' 2025-09-07T09:40:05.9907128Z Entering 'third_party/protobuf' 2025-09-07T09:40:05.9968053Z Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T09:40:06.0068382Z Entering 'third_party/protobuf/third_party/googletest' 2025-09-07T09:40:06.0118102Z Entering 'third_party/psimd' 2025-09-07T09:40:06.0165818Z Entering 'third_party/pthreadpool' 2025-09-07T09:40:06.0263520Z Entering 'third_party/pybind11' 2025-09-07T09:40:06.0333785Z Entering 'third_party/python-peachpy' 2025-09-07T09:40:06.0385954Z Entering 'third_party/sleef' 2025-09-07T09:40:06.0434311Z Entering 'third_party/tensorpipe' 2025-09-07T09:40:06.0485107Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-09-07T09:40:06.0570080Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-09-07T09:40:06.0618673Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-09-07T09:40:06.0725654Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T09:40:06.0794293Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T09:40:06.0861643Z ##[endgroup] 2025-09-07T09:40:06.0904742Z [command]/usr/bin/git log -1 --format=%H 2025-09-07T09:40:06.0932866Z 93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T09:40:06.1114618Z ##[group]Run actions/checkout@v4 2025-09-07T09:40:06.1114841Z with: 2025-09-07T09:40:06.1115440Z ref: 
93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T09:40:06.1115696Z fetch-depth: 0 2025-09-07T09:40:06.1115875Z submodules: recursive 2025-09-07T09:40:06.1116078Z show-progress: false 2025-09-07T09:40:06.1116284Z repository: pytorch/pytorch 2025-09-07T09:40:06.1116596Z token: *** 2025-09-07T09:40:06.1116772Z ssh-strict: true 2025-09-07T09:40:06.1116958Z ssh-user: git 2025-09-07T09:40:06.1117139Z persist-credentials: true 2025-09-07T09:40:06.1117334Z clean: true 2025-09-07T09:40:06.1117519Z sparse-checkout-cone-mode: true 2025-09-07T09:40:06.1117737Z fetch-tags: false 2025-09-07T09:40:06.1117911Z lfs: false 2025-09-07T09:40:06.1118071Z set-safe-directory: true 2025-09-07T09:40:06.1118263Z env: 2025-09-07T09:40:06.1118422Z GIT_DEFAULT_BRANCH: main 2025-09-07T09:40:06.1118616Z ##[endgroup] 2025-09-07T09:40:06.2248166Z Syncing repository: pytorch/pytorch 2025-09-07T09:40:06.2249695Z ##[group]Getting Git version info 2025-09-07T09:40:06.2250278Z Working directory is '/home/eve/_work/pytorch/pytorch' 2025-09-07T09:40:06.2283348Z [command]/usr/bin/git version 2025-09-07T09:40:06.2322506Z git version 2.50.1 2025-09-07T09:40:06.2346376Z ##[endgroup] 2025-09-07T09:40:06.2358449Z Temporarily overriding HOME='/home/eve/_work/_temp/132f6341-961f-4a95-ad45-e814cfa315c7' before making global git config changes 2025-09-07T09:40:06.2359516Z Adding repository directory to the temporary git global config as a safe directory 2025-09-07T09:40:06.2363317Z [command]/usr/bin/git config --global --add safe.directory /home/eve/_work/pytorch/pytorch 2025-09-07T09:40:06.2412278Z [command]/usr/bin/git config --local --get remote.origin.url 2025-09-07T09:40:06.2437449Z https://github.com/pytorch/pytorch 2025-09-07T09:40:06.2452373Z ##[group]Removing previously created refs, to avoid conflicts 2025-09-07T09:40:06.2456054Z [command]/usr/bin/git rev-parse --symbolic-full-name --verify --quiet HEAD 2025-09-07T09:40:06.2481876Z HEAD 2025-09-07T09:40:06.2519975Z ##[endgroup] 
2025-09-07T09:40:06.2523338Z [command]/usr/bin/git submodule status 2025-09-07T09:40:06.2848107Z 7e1e1fe3858c63c251c637ae41a20de425dde96f android/libs/fbjni (v0.1.0-12-g7e1e1fe) 2025-09-07T09:40:06.3007247Z 4dfe081cf6bcd15db339cf2680b9281b8451eeb3 third_party/FP16 (4dfe081) 2025-09-07T09:40:06.3094335Z b408327ac2a15ec3e43352421954f5b1967701d1 third_party/FXdiv (b408327) 2025-09-07T09:40:06.3192635Z c07e3a0400713d546e0dea2d5466dd22ea389c73 third_party/NNPACK (c07e3a0) 2025-09-07T09:40:06.3247027Z 2942f167cc30c5e3a44a2aecd5b0d9c07ff61a07 third_party/NVTX (v3.1.0-263-g2942f16) 2025-09-07T09:40:06.3328441Z 1d8f600fd424278486eade7ed3e877c99f0846b1 third_party/VulkanMemoryAllocator (v2.1.0-982-g1d8f600) 2025-09-07T09:40:06.3809376Z 51a0103656eff6fc9bfd39a4597923c4b542c883 third_party/XNNPACK (remotes/origin/ds/ndk-1243-g51a0103656) 2025-09-07T09:40:06.3853082Z 01aae101b9e5e94d6c16a9514c9fb8df99c93150 third_party/aiter (v0.1.1-92-g01aae101) 2025-09-07T09:40:06.3880920Z 299e5928955cc62af9968370293b916f5130916f third_party/benchmark (v1.9.3) 2025-09-07T09:40:06.3969576Z 7fe50dc3da2069d6645d9deb8c017a876472a977 third_party/composable_kernel (rocm-6.4.3-459-g7fe50dc3d) 2025-09-07T09:40:06.4104263Z 89c932f313c6437c38f2982869beacc89c2f2246 third_party/cpp-httplib (v0.26.0) 2025-09-07T09:40:06.4240760Z 5e3d2445e6a84d9599bee2bf78edbb4d80865e1d third_party/cpuinfo (5e3d244) 2025-09-07T09:40:06.4288559Z f937055efc6d414d11f4c6577e3977fe74f35fb6 third_party/cudnn_frontend (v0.5-52-gf937055) 2025-09-07T09:40:06.4386121Z e51efbfe18fe4f4cbb66ab814c55bf4aa0185491 third_party/cutlass (v4.1.0) 2025-09-07T09:40:06.4446049Z 4b39c551efe15e6bbade20565b0ceb2d8ce3352d third_party/fbgemm (v1.3.0-rc1-342-g4b39c551) 2025-09-07T09:40:06.4536324Z 979702c87a8713a8e0a5e9fee122b90d2ef13be5 third_party/flash-attention (v2.7.4) 2025-09-07T09:40:06.4564812Z a2cd1ea3b6d3fee220106b5fed3f7ce8da9eb757 third_party/flatbuffers (v24.12.23) 2025-09-07T09:40:06.4966009Z 40626af88bd7df9a5fb80be7b25ac85b122d6c21 
third_party/fmt (11.2.0) 2025-09-07T09:40:06.5098311Z 3fb5c176c17c765a3492cd2f0321b0dab712f350 third_party/gemmlowp/gemmlowp (remotes/origin/revert-87-master-135-g3fb5c17) 2025-09-07T09:40:06.5247354Z c7b7b022c124d9643957d9bd55f57ac59fce8fa2 third_party/gloo (remotes/origin/gh/c-p-i-o/1/base-33-gc7b7b02) 2025-09-07T09:40:06.5480247Z 52eb8108c5bdec04579160ae17225d66034bd723 third_party/googletest (release-1.8.0-3544-g52eb8108) 2025-09-07T09:40:06.5566224Z 719d8e6cd7f7a0e01b155657526d693acf97c2b3 third_party/ideep (pytorch-rls-v3.7.1) 2025-09-07T09:40:06.5631815Z dec1d23ca65ab069d225dfe40dea14f455170959 third_party/ittapi (v3.25.5) 2025-09-07T09:40:06.5891460Z 5e7501833f1021ce6f618572d3baf657b6319658 third_party/kineto (remotes/origin/sraikund/test-98-g5e75018) 2025-09-07T09:40:06.5924487Z cca02c2f69dd18e1f12647c1c0bdc8cf90e680c7 third_party/kleidiai (v1.8.0) 2025-09-07T09:40:06.5953562Z fbd8b99c2b828428947d70fdc046bb55609be93e third_party/mimalloc (v2.2.4) 2025-09-07T09:40:06.5984740Z 55f93686c01528224f448c19128836e7df245f72 third_party/nlohmann (v3.12.0) 2025-09-07T09:40:06.6340137Z e709452ef2bbc1d113faf678c24e6d3467696e83 third_party/onnx (v1.18.0) 2025-09-07T09:40:06.6373362Z a799f4aed9c94b765dcdaabaeab7d5e7e2310878 third_party/opentelemetry-cpp (v1.14.2) 2025-09-07T09:40:06.6405579Z 0fa0ef591e38c2758e3184c6c23e497b9f732ffa third_party/pocketfft (release_for_eigen-40-g0fa0ef5) 2025-09-07T09:40:06.6746766Z d1eca4e4b421cd2997495c4b4e65cea6be4e9b8a third_party/protobuf (v3.7.0-rc.2-1279-gd1eca4e4b) 2025-09-07T09:40:06.6838967Z 072586a71b55b7f8c584153d223e95687148a900 third_party/psimd (heads/master) 2025-09-07T09:40:06.6903328Z 4fe0e1e183925bf8cfa6aae24237e724a96479b8 third_party/pthreadpool (0.1-144-g4fe0e1e) 2025-09-07T09:40:06.6930635Z f5fbe867d2d26e4a0a9177a51f6e568868ad3dc8 third_party/pybind11 (v3.0.1) 2025-09-07T09:40:06.7017684Z f45429b087dd7d5bc78bb40dc7cf06425c252d67 third_party/python-peachpy (remotes/origin/pre-generated) 2025-09-07T09:40:06.7117120Z 
5a1d179df9cf652951b59010a2d2075372d67f68 third_party/sleef (3.8) 2025-09-07T09:40:06.7213056Z af0118d13e52f5a08841464a768e01a0bf3e3075 third_party/tensorpipe (heads/main) 2025-09-07T09:40:06.7231881Z ##[group]Cleaning the repository 2025-09-07T09:40:06.7236357Z [command]/usr/bin/git clean -ffdx 2025-09-07T09:40:06.7606044Z [command]/usr/bin/git reset --hard HEAD 2025-09-07T09:40:07.3901957Z HEAD is now at 93fb23d6fae Build vLLM nightly wheels (#162000) 2025-09-07T09:40:07.3938312Z ##[endgroup] 2025-09-07T09:40:07.3940475Z ##[group]Disabling automatic garbage collection 2025-09-07T09:40:07.3948140Z [command]/usr/bin/git config --local gc.auto 0 2025-09-07T09:40:07.3995142Z ##[endgroup] 2025-09-07T09:40:07.3995506Z ##[group]Setting up auth 2025-09-07T09:40:07.4002160Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2025-09-07T09:40:07.4035278Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2025-09-07T09:40:07.4320710Z Entering 'android/libs/fbjni' 2025-09-07T09:40:07.4370079Z Entering 'third_party/FP16' 2025-09-07T09:40:07.4422251Z Entering 'third_party/FXdiv' 2025-09-07T09:40:07.4471092Z Entering 'third_party/NNPACK' 2025-09-07T09:40:07.4530807Z Entering 'third_party/NVTX' 2025-09-07T09:40:07.4582041Z Entering 'third_party/VulkanMemoryAllocator' 2025-09-07T09:40:07.4631508Z Entering 'third_party/XNNPACK' 2025-09-07T09:40:07.4696738Z Entering 'third_party/aiter' 2025-09-07T09:40:07.4743939Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T09:40:07.4803128Z Entering 'third_party/benchmark' 2025-09-07T09:40:07.4855411Z Entering 'third_party/composable_kernel' 2025-09-07T09:40:07.4912574Z Entering 'third_party/cpp-httplib' 2025-09-07T09:40:07.4960600Z Entering 'third_party/cpuinfo' 2025-09-07T09:40:07.5009734Z Entering 'third_party/cudnn_frontend' 2025-09-07T09:40:07.5058839Z Entering 
'third_party/cutlass' 2025-09-07T09:40:07.5116879Z Entering 'third_party/fbgemm' 2025-09-07T09:40:07.5169155Z Entering 'third_party/fbgemm/external/asmjit' 2025-09-07T09:40:07.5217158Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-09-07T09:40:07.5277315Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T09:40:07.5326568Z Entering 'third_party/fbgemm/external/cutlass' 2025-09-07T09:40:07.5384725Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T09:40:07.5434214Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T09:40:07.5484745Z Entering 'third_party/fbgemm/external/json' 2025-09-07T09:40:07.5538351Z Entering 'third_party/flash-attention' 2025-09-07T09:40:07.5590375Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T09:40:07.5648190Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T09:40:07.5714765Z Entering 'third_party/flatbuffers' 2025-09-07T09:40:07.5766258Z Entering 'third_party/fmt' 2025-09-07T09:40:07.5814781Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T09:40:07.5866449Z Entering 'third_party/gloo' 2025-09-07T09:40:07.5914364Z Entering 'third_party/googletest' 2025-09-07T09:40:07.5965284Z Entering 'third_party/ideep' 2025-09-07T09:40:07.6013710Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T09:40:07.6072309Z Entering 'third_party/ittapi' 2025-09-07T09:40:07.6120993Z Entering 'third_party/kineto' 2025-09-07T09:40:07.6174174Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T09:40:07.6223142Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T09:40:07.6273736Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T09:40:07.6327125Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T09:40:07.6379168Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T09:40:07.6424037Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T09:40:07.6476950Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T09:40:07.6533146Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T09:40:07.6589183Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T09:40:07.6640981Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T09:40:07.6692873Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T09:40:07.6741014Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T09:40:07.6792996Z Entering 'third_party/kleidiai' 2025-09-07T09:40:07.6841329Z Entering 'third_party/mimalloc' 2025-09-07T09:40:07.6894041Z Entering 'third_party/nlohmann' 2025-09-07T09:40:07.6942226Z Entering 'third_party/onnx' 2025-09-07T09:40:07.7008008Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T09:40:07.7060988Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T09:40:07.7112299Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T09:40:07.7160580Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T09:40:07.7210749Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T09:40:07.7266478Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T09:40:07.7319763Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T09:40:07.7375776Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T09:40:07.7429238Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T09:40:07.7480920Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T09:40:07.7535815Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T09:40:07.7591090Z 
Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T09:40:07.7666395Z Entering 'third_party/pocketfft' 2025-09-07T09:40:07.7722410Z Entering 'third_party/protobuf' 2025-09-07T09:40:07.7778943Z Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T09:40:07.7828158Z Entering 'third_party/protobuf/third_party/googletest' 2025-09-07T09:40:07.7880449Z Entering 'third_party/psimd' 2025-09-07T09:40:07.7931610Z Entering 'third_party/pthreadpool' 2025-09-07T09:40:07.7983445Z Entering 'third_party/pybind11' 2025-09-07T09:40:07.8037428Z Entering 'third_party/python-peachpy' 2025-09-07T09:40:07.8089008Z Entering 'third_party/sleef' 2025-09-07T09:40:07.8139819Z Entering 'third_party/tensorpipe' 2025-09-07T09:40:07.8191264Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-09-07T09:40:07.8239604Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-09-07T09:40:07.8285885Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-09-07T09:40:07.8336613Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T09:40:07.8381978Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T09:40:07.8459238Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2025-09-07T09:40:07.8487543Z http.https://github.com/.extraheader 2025-09-07T09:40:07.8496913Z [command]/usr/bin/git config --local --unset-all http.https://github.com/.extraheader 2025-09-07T09:40:07.8551826Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :" 2025-09-07T09:40:07.8839259Z Entering 'android/libs/fbjni' 2025-09-07T09:40:07.8868304Z http.https://github.com/.extraheader 2025-09-07T09:40:07.9642655Z Entering 'third_party/FP16' 2025-09-07T09:40:07.9671124Z http.https://github.com/.extraheader 2025-09-07T09:40:07.9707751Z 
Entering 'third_party/FXdiv' 2025-09-07T09:40:07.9738524Z http.https://github.com/.extraheader 2025-09-07T09:40:08.0347565Z Entering 'third_party/NNPACK' 2025-09-07T09:40:08.0376404Z http.https://github.com/.extraheader 2025-09-07T09:40:08.0520833Z Entering 'third_party/NVTX' 2025-09-07T09:40:08.0551687Z http.https://github.com/.extraheader 2025-09-07T09:40:08.0603442Z Entering 'third_party/VulkanMemoryAllocator' 2025-09-07T09:40:08.0632811Z http.https://github.com/.extraheader 2025-09-07T09:40:08.0749794Z Entering 'third_party/XNNPACK' 2025-09-07T09:40:08.0778671Z http.https://github.com/.extraheader 2025-09-07T09:40:08.1013081Z Entering 'third_party/aiter' 2025-09-07T09:40:08.1046894Z http.https://github.com/.extraheader 2025-09-07T09:40:08.1089851Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T09:40:08.1119591Z http.https://github.com/.extraheader 2025-09-07T09:40:08.1174066Z Entering 'third_party/benchmark' 2025-09-07T09:40:08.1204048Z http.https://github.com/.extraheader 2025-09-07T09:40:08.1241573Z Entering 'third_party/composable_kernel' 2025-09-07T09:40:08.1269685Z http.https://github.com/.extraheader 2025-09-07T09:40:08.1318751Z Entering 'third_party/cpp-httplib' 2025-09-07T09:40:08.1345951Z http.https://github.com/.extraheader 2025-09-07T09:40:08.1383512Z Entering 'third_party/cpuinfo' 2025-09-07T09:40:08.1413944Z http.https://github.com/.extraheader 2025-09-07T09:40:08.1454830Z Entering 'third_party/cudnn_frontend' 2025-09-07T09:40:08.1481991Z http.https://github.com/.extraheader 2025-09-07T09:40:08.1520576Z Entering 'third_party/cutlass' 2025-09-07T09:40:08.1547972Z http.https://github.com/.extraheader 2025-09-07T09:40:08.1619411Z Entering 'third_party/fbgemm' 2025-09-07T09:40:08.1647611Z http.https://github.com/.extraheader 2025-09-07T09:40:08.1687600Z Entering 'third_party/fbgemm/external/asmjit' 2025-09-07T09:40:08.1715311Z http.https://github.com/.extraheader 2025-09-07T09:40:08.1792036Z Entering 
'third_party/fbgemm/external/composable_kernel' 2025-09-07T09:40:08.1817561Z http.https://github.com/.extraheader 2025-09-07T09:40:08.1877976Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T09:40:08.1908888Z http.https://github.com/.extraheader 2025-09-07T09:40:08.1946444Z Entering 'third_party/fbgemm/external/cutlass' 2025-09-07T09:40:08.1977652Z http.https://github.com/.extraheader 2025-09-07T09:40:08.2055373Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T09:40:08.2082453Z http.https://github.com/.extraheader 2025-09-07T09:40:08.2122568Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T09:40:08.2152229Z http.https://github.com/.extraheader 2025-09-07T09:40:08.2192739Z Entering 'third_party/fbgemm/external/json' 2025-09-07T09:40:08.2219769Z http.https://github.com/.extraheader 2025-09-07T09:40:08.2267869Z Entering 'third_party/flash-attention' 2025-09-07T09:40:08.2297964Z http.https://github.com/.extraheader 2025-09-07T09:40:08.2345408Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T09:40:08.2376267Z http.https://github.com/.extraheader 2025-09-07T09:40:08.2451809Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T09:40:08.2478229Z http.https://github.com/.extraheader 2025-09-07T09:40:08.2526626Z Entering 'third_party/flatbuffers' 2025-09-07T09:40:08.2552817Z http.https://github.com/.extraheader 2025-09-07T09:40:08.2889612Z Entering 'third_party/fmt' 2025-09-07T09:40:08.2916146Z http.https://github.com/.extraheader 2025-09-07T09:40:08.2954835Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T09:40:08.2984828Z http.https://github.com/.extraheader 2025-09-07T09:40:08.3036975Z Entering 'third_party/gloo' 2025-09-07T09:40:08.3066260Z http.https://github.com/.extraheader 2025-09-07T09:40:08.3161252Z Entering 'third_party/googletest' 2025-09-07T09:40:08.3190697Z http.https://github.com/.extraheader 2025-09-07T09:40:08.3227563Z Entering 'third_party/ideep' 2025-09-07T09:40:08.3252735Z 
http.https://github.com/.extraheader 2025-09-07T09:40:08.3504174Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T09:40:08.3531453Z http.https://github.com/.extraheader 2025-09-07T09:40:08.3578448Z Entering 'third_party/ittapi' 2025-09-07T09:40:08.3607506Z http.https://github.com/.extraheader 2025-09-07T09:40:08.3644242Z Entering 'third_party/kineto' 2025-09-07T09:40:08.3673648Z http.https://github.com/.extraheader 2025-09-07T09:40:08.3712171Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T09:40:08.3752029Z http.https://github.com/.extraheader 2025-09-07T09:40:08.3787001Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T09:40:08.3826763Z http.https://github.com/.extraheader 2025-09-07T09:40:08.4091688Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T09:40:08.4117448Z http.https://github.com/.extraheader 2025-09-07T09:40:08.4157187Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T09:40:08.4185984Z http.https://github.com/.extraheader 2025-09-07T09:40:08.4237627Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T09:40:08.4264111Z http.https://github.com/.extraheader 2025-09-07T09:40:08.4315305Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T09:40:08.4345272Z http.https://github.com/.extraheader 2025-09-07T09:40:08.4466470Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T09:40:08.4497204Z http.https://github.com/.extraheader 2025-09-07T09:40:08.4546898Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T09:40:08.4571658Z http.https://github.com/.extraheader 2025-09-07T09:40:08.4642859Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T09:40:08.4671632Z http.https://github.com/.extraheader 2025-09-07T09:40:08.4719769Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T09:40:08.4745672Z http.https://github.com/.extraheader 2025-09-07T09:40:08.4864912Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T09:40:08.4890136Z http.https://github.com/.extraheader 2025-09-07T09:40:08.5289420Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T09:40:08.5317345Z http.https://github.com/.extraheader 2025-09-07T09:40:08.5529792Z Entering 'third_party/kleidiai' 2025-09-07T09:40:08.5559712Z http.https://github.com/.extraheader 2025-09-07T09:40:08.5622672Z Entering 'third_party/mimalloc' 2025-09-07T09:40:08.5648770Z http.https://github.com/.extraheader 2025-09-07T09:40:08.5912084Z Entering 'third_party/nlohmann' 2025-09-07T09:40:08.5940364Z http.https://github.com/.extraheader 2025-09-07T09:40:08.5977117Z Entering 'third_party/onnx' 2025-09-07T09:40:08.6004920Z http.https://github.com/.extraheader 2025-09-07T09:40:08.6057497Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T09:40:08.6082643Z http.https://github.com/.extraheader 2025-09-07T09:40:08.7278689Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T09:40:08.7307961Z http.https://github.com/.extraheader 2025-09-07T09:40:08.8439419Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T09:40:08.8469398Z http.https://github.com/.extraheader 2025-09-07T09:40:08.8866863Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T09:40:08.8897140Z http.https://github.com/.extraheader 2025-09-07T09:40:08.9348672Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T09:40:08.9378935Z http.https://github.com/.extraheader 2025-09-07T09:40:08.9794914Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T09:40:08.9821995Z http.https://github.com/.extraheader 2025-09-07T09:40:09.0897980Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T09:40:09.0928748Z 
http.https://github.com/.extraheader 2025-09-07T09:40:09.1273247Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T09:40:09.1300770Z http.https://github.com/.extraheader 2025-09-07T09:40:09.1335358Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T09:40:09.1367869Z http.https://github.com/.extraheader 2025-09-07T09:40:09.1410463Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T09:40:09.1437412Z http.https://github.com/.extraheader 2025-09-07T09:40:09.1504605Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T09:40:09.1530368Z http.https://github.com/.extraheader 2025-09-07T09:40:09.1573845Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T09:40:09.1601419Z http.https://github.com/.extraheader 2025-09-07T09:40:09.1663137Z Entering 'third_party/pocketfft' 2025-09-07T09:40:09.1691581Z http.https://github.com/.extraheader 2025-09-07T09:40:09.1730130Z Entering 'third_party/protobuf' 2025-09-07T09:40:09.1759550Z http.https://github.com/.extraheader 2025-09-07T09:40:09.1816617Z Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T09:40:09.1844372Z http.https://github.com/.extraheader 2025-09-07T09:40:09.1884314Z Entering 'third_party/protobuf/third_party/googletest' 2025-09-07T09:40:09.1912011Z http.https://github.com/.extraheader 2025-09-07T09:40:09.2133426Z Entering 'third_party/psimd' 2025-09-07T09:40:09.2162790Z http.https://github.com/.extraheader 2025-09-07T09:40:09.2212183Z Entering 'third_party/pthreadpool' 2025-09-07T09:40:09.2239482Z http.https://github.com/.extraheader 2025-09-07T09:40:09.2302256Z Entering 'third_party/pybind11' 2025-09-07T09:40:09.2329377Z http.https://github.com/.extraheader 2025-09-07T09:40:09.2437464Z Entering 'third_party/python-peachpy' 2025-09-07T09:40:09.2464698Z http.https://github.com/.extraheader 2025-09-07T09:40:09.2557325Z Entering 
'third_party/sleef' 2025-09-07T09:40:09.2588296Z http.https://github.com/.extraheader 2025-09-07T09:40:09.2635324Z Entering 'third_party/tensorpipe' 2025-09-07T09:40:09.2662167Z http.https://github.com/.extraheader 2025-09-07T09:40:09.2898066Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-09-07T09:40:09.2924381Z http.https://github.com/.extraheader 2025-09-07T09:40:09.3183188Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-09-07T09:40:09.3207305Z http.https://github.com/.extraheader 2025-09-07T09:40:09.3371104Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-09-07T09:40:09.3398790Z http.https://github.com/.extraheader 2025-09-07T09:40:09.3820693Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T09:40:09.3848211Z http.https://github.com/.extraheader 2025-09-07T09:40:09.4211899Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T09:40:09.4239730Z http.https://github.com/.extraheader 2025-09-07T09:40:09.4619996Z [command]/usr/bin/git config --local http.https://github.com/.extraheader AUTHORIZATION: basic *** 2025-09-07T09:40:09.5359958Z ##[endgroup] 2025-09-07T09:40:09.5360375Z ##[group]Fetching the repository 2025-09-07T09:40:09.5372664Z [command]/usr/bin/git -c protocol.version=2 fetch --prune --no-recurse-submodules origin +refs/heads/*:refs/remotes/origin/* +refs/tags/*:refs/tags/* 2025-09-07T09:40:10.1358152Z [command]/usr/bin/git rev-parse --verify --quiet 93fb23d6fae7c4e82c4239a1033e522088742634^{object} 2025-09-07T09:40:10.1391227Z 93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T09:40:10.1396379Z ##[endgroup] 2025-09-07T09:40:10.1396776Z ##[group]Determining the checkout info 2025-09-07T09:40:10.1397648Z ##[endgroup] 2025-09-07T09:40:10.1401409Z [command]/usr/bin/git sparse-checkout disable 2025-09-07T09:40:10.4327660Z [command]/usr/bin/git config --local --unset-all extensions.worktreeConfig 2025-09-07T09:40:10.4591780Z ##[group]Checking out the ref 
2025-09-07T09:40:10.4602004Z [command]/usr/bin/git checkout --progress --force 93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T09:40:10.6634450Z HEAD is now at 93fb23d6fae Build vLLM nightly wheels (#162000) 2025-09-07T09:40:10.6644008Z ##[endgroup] 2025-09-07T09:40:10.6644474Z ##[group]Setting up auth for fetching submodules 2025-09-07T09:40:10.6659425Z [command]/usr/bin/git config --global http.https://github.com/.extraheader AUTHORIZATION: basic *** 2025-09-07T09:40:10.8128315Z [command]/usr/bin/git config --global --unset-all url.https://github.com/.insteadOf 2025-09-07T09:40:10.9562053Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf git@github.com: 2025-09-07T09:40:10.9933468Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf org-21003710@github.com: 2025-09-07T09:40:11.0683473Z ##[endgroup] 2025-09-07T09:40:11.0686514Z ##[group]Fetching submodules 2025-09-07T09:40:11.0688030Z [command]/usr/bin/git submodule sync --recursive 2025-09-07T09:40:11.1321404Z Synchronizing submodule url for 'android/libs/fbjni' 2025-09-07T09:40:11.2490715Z Synchronizing submodule url for 'third_party/FP16' 2025-09-07T09:40:11.2536482Z Synchronizing submodule url for 'third_party/FXdiv' 2025-09-07T09:40:11.2716617Z Synchronizing submodule url for 'third_party/NNPACK' 2025-09-07T09:40:11.3107432Z Synchronizing submodule url for 'third_party/NVTX' 2025-09-07T09:40:11.3672799Z Synchronizing submodule url for 'third_party/VulkanMemoryAllocator' 2025-09-07T09:40:11.3698986Z Synchronizing submodule url for 'third_party/XNNPACK' 2025-09-07T09:40:11.3759699Z Synchronizing submodule url for 'third_party/aiter' 2025-09-07T09:40:11.4209171Z Synchronizing submodule url for 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T09:40:11.4251746Z Synchronizing submodule url for 'third_party/benchmark' 2025-09-07T09:40:11.4393602Z Synchronizing submodule url for 'third_party/composable_kernel' 2025-09-07T09:40:11.4578590Z 
Synchronizing submodule url for 'third_party/cpp-httplib' 2025-09-07T09:40:11.4617523Z Synchronizing submodule url for 'third_party/cpuinfo' 2025-09-07T09:40:11.4789925Z Synchronizing submodule url for 'third_party/cudnn_frontend' 2025-09-07T09:40:11.4893321Z Synchronizing submodule url for 'third_party/cutlass' 2025-09-07T09:40:11.4944641Z Synchronizing submodule url for 'third_party/fbgemm' 2025-09-07T09:40:11.5200726Z Synchronizing submodule url for 'third_party/fbgemm/external/asmjit' 2025-09-07T09:40:11.5242274Z Synchronizing submodule url for 'third_party/fbgemm/external/composable_kernel' 2025-09-07T09:40:11.5334387Z Synchronizing submodule url for 'third_party/fbgemm/external/cpuinfo' 2025-09-07T09:40:11.5507468Z Synchronizing submodule url for 'third_party/fbgemm/external/cutlass' 2025-09-07T09:40:11.5546695Z Synchronizing submodule url for 'third_party/fbgemm/external/googletest' 2025-09-07T09:40:11.5810864Z Synchronizing submodule url for 'third_party/fbgemm/external/hipify_torch' 2025-09-07T09:40:11.6004354Z Synchronizing submodule url for 'third_party/fbgemm/external/json' 2025-09-07T09:40:11.6055294Z Synchronizing submodule url for 'third_party/flash-attention' 2025-09-07T09:40:11.6244634Z Synchronizing submodule url for 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T09:40:11.6287526Z Synchronizing submodule url for 'third_party/flash-attention/csrc/cutlass' 2025-09-07T09:40:11.6348789Z Synchronizing submodule url for 'third_party/flatbuffers' 2025-09-07T09:40:11.6624355Z Synchronizing submodule url for 'third_party/fmt' 2025-09-07T09:40:11.6675930Z Synchronizing submodule url for 'third_party/gemmlowp/gemmlowp' 2025-09-07T09:40:11.6959472Z Synchronizing submodule url for 'third_party/gloo' 2025-09-07T09:40:11.6999121Z Synchronizing submodule url for 'third_party/googletest' 2025-09-07T09:40:11.7230514Z Synchronizing submodule url for 'third_party/ideep' 2025-09-07T09:40:11.7471234Z Synchronizing submodule url for 
'third_party/ideep/mkl-dnn' 2025-09-07T09:40:11.7518794Z Synchronizing submodule url for 'third_party/ittapi' 2025-09-07T09:40:11.7717452Z Synchronizing submodule url for 'third_party/kineto' 2025-09-07T09:40:11.7959891Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T09:40:11.8201910Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T09:40:11.8244527Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T09:40:11.8473639Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T09:40:11.8516514Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T09:40:11.8698057Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T09:40:11.8743329Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T09:40:11.8896811Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T09:40:11.8929320Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T09:40:11.9086850Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T09:40:11.9259956Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T09:40:11.9358145Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T09:40:11.9433077Z Synchronizing submodule url for 'third_party/kleidiai' 2025-09-07T09:40:11.9683334Z Synchronizing submodule url for 'third_party/mimalloc' 2025-09-07T09:40:11.9713483Z Synchronizing submodule url for 'third_party/nlohmann' 2025-09-07T09:40:11.9843674Z Synchronizing 
submodule url for 'third_party/onnx' 2025-09-07T09:40:11.9998211Z Synchronizing submodule url for 'third_party/onnx/third_party/pybind11' 2025-09-07T09:40:12.0042979Z Synchronizing submodule url for 'third_party/opentelemetry-cpp' 2025-09-07T09:40:12.0166223Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T09:40:12.0370832Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T09:40:12.0425587Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T09:40:12.0643259Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T09:40:12.0687637Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T09:40:12.0809350Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T09:40:12.0991712Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T09:40:12.1152345Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T09:40:12.1188606Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T09:40:12.1225707Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T09:40:12.1285147Z Synchronizing submodule url for 'third_party/pocketfft' 2025-09-07T09:40:12.1510975Z Synchronizing submodule url for 'third_party/protobuf' 2025-09-07T09:40:12.1725815Z Synchronizing submodule url for 'third_party/protobuf/third_party/benchmark' 2025-09-07T09:40:12.2238833Z Synchronizing submodule url for 'third_party/protobuf/third_party/googletest' 2025-09-07T09:40:12.2589154Z Synchronizing submodule url for 'third_party/psimd' 2025-09-07T09:40:12.3415723Z Synchronizing submodule url for 'third_party/pthreadpool' 
2025-09-07T09:40:12.4693247Z Synchronizing submodule url for 'third_party/pybind11' 2025-09-07T09:40:12.5177051Z Synchronizing submodule url for 'third_party/python-peachpy' 2025-09-07T09:40:12.5543084Z Synchronizing submodule url for 'third_party/sleef' 2025-09-07T09:40:12.6899502Z Synchronizing submodule url for 'third_party/tensorpipe' 2025-09-07T09:40:12.7765132Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/googletest' 2025-09-07T09:40:12.8697623Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/libnop' 2025-09-07T09:40:12.9074318Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/libuv' 2025-09-07T09:40:12.9910439Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T09:40:13.1102428Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T09:40:13.1148961Z [command]/usr/bin/git -c protocol.version=2 submodule update --init --force --recursive 2025-09-07T09:40:13.2050503Z Submodule path 'android/libs/fbjni': checked out '7e1e1fe3858c63c251c637ae41a20de425dde96f' 2025-09-07T09:40:13.2932646Z Submodule path 'third_party/FP16': checked out '4dfe081cf6bcd15db339cf2680b9281b8451eeb3' 2025-09-07T09:40:13.3447361Z Submodule path 'third_party/FXdiv': checked out 'b408327ac2a15ec3e43352421954f5b1967701d1' 2025-09-07T09:40:13.4381439Z Submodule path 'third_party/NNPACK': checked out 'c07e3a0400713d546e0dea2d5466dd22ea389c73' 2025-09-07T09:40:13.7647442Z Submodule path 'third_party/NVTX': checked out '2942f167cc30c5e3a44a2aecd5b0d9c07ff61a07' 2025-09-07T09:40:14.0897951Z Submodule path 'third_party/VulkanMemoryAllocator': checked out '1d8f600fd424278486eade7ed3e877c99f0846b1' 2025-09-07T09:40:14.6358543Z Submodule path 'third_party/XNNPACK': checked out '51a0103656eff6fc9bfd39a4597923c4b542c883' 2025-09-07T09:40:14.8908207Z Submodule path 'third_party/aiter': checked out '01aae101b9e5e94d6c16a9514c9fb8df99c93150' 
2025-09-07T09:40:15.2436871Z Submodule path 'third_party/aiter/3rdparty/composable_kernel': checked out 'cffe8fa2a442ac8e80dd236a1a5d24fe3d7e0cbf' 2025-09-07T09:40:15.3253452Z Submodule path 'third_party/benchmark': checked out '299e5928955cc62af9968370293b916f5130916f' 2025-09-07T09:40:15.6157932Z Submodule path 'third_party/composable_kernel': checked out '7fe50dc3da2069d6645d9deb8c017a876472a977' 2025-09-07T09:40:15.7313244Z Submodule path 'third_party/cpp-httplib': checked out '89c932f313c6437c38f2982869beacc89c2f2246' 2025-09-07T09:40:16.2558163Z Submodule path 'third_party/cpuinfo': checked out '5e3d2445e6a84d9599bee2bf78edbb4d80865e1d' 2025-09-07T09:40:16.3337265Z Submodule path 'third_party/cudnn_frontend': checked out 'f937055efc6d414d11f4c6577e3977fe74f35fb6' 2025-09-07T09:40:16.5068580Z Submodule path 'third_party/cutlass': checked out 'e51efbfe18fe4f4cbb66ab814c55bf4aa0185491' 2025-09-07T09:40:16.7083599Z Submodule path 'third_party/fbgemm': checked out '4b39c551efe15e6bbade20565b0ceb2d8ce3352d' 2025-09-07T09:40:16.8830361Z Submodule path 'third_party/fbgemm/external/asmjit': checked out 'a3199e8857792cd10b7589ff5d58343d2c9008ea' 2025-09-07T09:40:17.2431284Z Submodule path 'third_party/fbgemm/external/composable_kernel': checked out 'b1281b8b08d973a7064f864f47eeb30f3e2596e9' 2025-09-07T09:40:17.3623484Z Submodule path 'third_party/fbgemm/external/cpuinfo': checked out '6543fec09b2f04ac4a666882998b534afc9c1349' 2025-09-07T09:40:18.3063581Z Submodule path 'third_party/fbgemm/external/cutlass': checked out '311f3c8e51dc0eb56310cfc6980bf63d0fbd7917' 2025-09-07T09:40:18.3917534Z Submodule path 'third_party/fbgemm/external/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-09-07T09:40:18.4517566Z Submodule path 'third_party/fbgemm/external/hipify_torch': checked out '63b6a7b541fa7f08f8475ca7d74054db36ff2691' 2025-09-07T09:40:18.5979217Z Submodule path 'third_party/fbgemm/external/json': checked out 
'9cca280a4d0ccf0c08f47a99aa71d1b0e52f8d03' 2025-09-07T09:40:18.7131943Z Submodule path 'third_party/flash-attention': checked out '979702c87a8713a8e0a5e9fee122b90d2ef13be5' 2025-09-07T09:40:19.1867004Z Submodule path 'third_party/flash-attention/csrc/composable_kernel': checked out '888317e698e9803c62bd38568abc9e05d7709f33' 2025-09-07T09:40:19.9812528Z Submodule path 'third_party/flash-attention/csrc/cutlass': checked out 'c506e16788cb08416a4a57e11a9067beeee29420' 2025-09-07T09:40:20.3687305Z Submodule path 'third_party/flatbuffers': checked out 'a2cd1ea3b6d3fee220106b5fed3f7ce8da9eb757' 2025-09-07T09:40:20.4523048Z Submodule path 'third_party/fmt': checked out '40626af88bd7df9a5fb80be7b25ac85b122d6c21' 2025-09-07T09:40:20.5668949Z Submodule path 'third_party/gemmlowp/gemmlowp': checked out '3fb5c176c17c765a3492cd2f0321b0dab712f350' 2025-09-07T09:40:20.6410687Z Submodule path 'third_party/gloo': checked out 'c7b7b022c124d9643957d9bd55f57ac59fce8fa2' 2025-09-07T09:40:20.7462778Z Submodule path 'third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-09-07T09:40:20.7981711Z Submodule path 'third_party/ideep': checked out '719d8e6cd7f7a0e01b155657526d693acf97c2b3' 2025-09-07T09:40:21.1004645Z Submodule path 'third_party/ideep/mkl-dnn': checked out '8d263e693366ef8db40acc569cc7d8edf644556d' 2025-09-07T09:40:21.1994592Z Submodule path 'third_party/ittapi': checked out 'dec1d23ca65ab069d225dfe40dea14f455170959' 2025-09-07T09:40:21.3405467Z Submodule path 'third_party/kineto': checked out '5e7501833f1021ce6f618572d3baf657b6319658' 2025-09-07T09:40:21.5598995Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog': checked out '7d04a0053a845370ae06ce317a22a48e9edcc74e' 2025-09-07T09:40:21.7939450Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM': checked out 'ffde4e54bc7249a6039a5e6b45b395141e1217f9' 2025-09-07T09:40:21.8767504Z Submodule path 
'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr': checked out '871ed52d350214a034f6ef8a3b8f51c5ce1bd400' 2025-09-07T09:40:21.9578351Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt': checked out 'cd4af11efc9c622896a3e4cb599fa28668ca3d05' 2025-09-07T09:40:22.0453508Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags': checked out 'e171aa2d15ed9eb17054558e0b3a6a413bb01067' 2025-09-07T09:40:22.4310121Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc': checked out '8411df715cf522606e3b1aca386ddfc0b63d34b4' 2025-09-07T09:40:22.6034885Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog': checked out 'b33e3bad4c46c8a6345525fd822af355e5ef9446' 2025-09-07T09:40:22.6871081Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest': checked out '58d77fa8070e8cec2dc1ed015d66b454c8d78850' 2025-09-07T09:40:23.2565765Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/json': checked out '4f8fba14066156b73f1189a2b8bd568bde5284c5' 2025-09-07T09:40:23.3452604Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs': checked out 'f68a2fa8ea36c783bdd760371411fcb495aa3150' 2025-09-07T09:40:23.4352527Z Submodule path 'third_party/kineto/libkineto/third_party/fmt': checked out '0041a40c1350ba702d475b9c4ad62da77caea164' 2025-09-07T09:40:23.4928777Z Submodule path 'third_party/kineto/libkineto/third_party/googletest': checked out '7aca84427f224eeed3144123d5230d5871e93347' 2025-09-07T09:40:23.5453770Z Submodule path 'third_party/kleidiai': checked out 'cca02c2f69dd18e1f12647c1c0bdc8cf90e680c7' 2025-09-07T09:40:23.6027743Z Submodule path 'third_party/mimalloc': checked out 'fbd8b99c2b828428947d70fdc046bb55609be93e' 2025-09-07T09:40:23.7753209Z Submodule path 'third_party/nlohmann': checked out '55f93686c01528224f448c19128836e7df245f72' 
2025-09-07T09:40:23.9942353Z Submodule path 'third_party/onnx': checked out 'e709452ef2bbc1d113faf678c24e6d3467696e83' 2025-09-07T09:40:24.0782847Z Submodule path 'third_party/onnx/third_party/pybind11': checked out 'a2e59f0e7065404b44dfe92a28aca47ba1378dc4' 2025-09-07T09:40:24.1766757Z Submodule path 'third_party/opentelemetry-cpp': checked out 'a799f4aed9c94b765dcdaabaeab7d5e7e2310878' 2025-09-07T09:40:24.2041606Z Submodule path 'third_party/opentelemetry-cpp/third_party/benchmark': checked out 'd572f4777349d43653b21d6c2fc63020ab326db2' 2025-09-07T09:40:24.2475604Z Submodule path 'third_party/opentelemetry-cpp/third_party/googletest': checked out 'b796f7d44681514f58a683a3a71ff17c94edb0c1' 2025-09-07T09:40:24.2617562Z Submodule path 'third_party/opentelemetry-cpp/third_party/ms-gsl': checked out '6f4529395c5b7c2d661812257cd6780c67e54afa' 2025-09-07T09:40:24.3075968Z Submodule path 'third_party/opentelemetry-cpp/third_party/nlohmann-json': checked out 'bc889afb4c5bf1c0d8ee29ef35eaaf4c8bef8a5d' 2025-09-07T09:40:24.3320367Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto': checked out '4ca4f0335c63cda7ab31ea7ed70d6553aee14dce' 2025-09-07T09:40:24.3514564Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentracing-cpp': checked out '06b57f48ded1fa3bdd3d4346f6ef29e40e08eaf5' 2025-09-07T09:40:24.3687320Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp': checked out 'c9ffcdda9086ffd9e1283ea7a0276d831f3c8a8d' 2025-09-07T09:40:24.5715725Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb': checked out 'eefb26f82b233268fc98577d265352720d477ba4' 2025-09-07T09:40:24.6275306Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929' 2025-09-07T09:40:24.7762068Z Submodule path 'third_party/opentelemetry-cpp/tools/vcpkg': checked out '8eb57355a4ffb410a2e94c07b4dca2dffbee8e50' 
2025-09-07T09:40:24.8695768Z Submodule path 'third_party/pocketfft': checked out '0fa0ef591e38c2758e3184c6c23e497b9f732ffa' 2025-09-07T09:40:25.3364785Z Submodule path 'third_party/protobuf': checked out 'd1eca4e4b421cd2997495c4b4e65cea6be4e9b8a' 2025-09-07T09:40:25.4584168Z Submodule path 'third_party/protobuf/third_party/benchmark': checked out '5b7683f49e1e9223cf9927b24f6fd3d6bd82e3f8' 2025-09-07T09:40:25.5177542Z Submodule path 'third_party/protobuf/third_party/googletest': checked out '5ec7f0c4a113e2f18ac2c6cc7df51ad6afc24081' 2025-09-07T09:40:25.5700750Z Submodule path 'third_party/psimd': checked out '072586a71b55b7f8c584153d223e95687148a900' 2025-09-07T09:40:25.6258255Z Submodule path 'third_party/pthreadpool': checked out '4fe0e1e183925bf8cfa6aae24237e724a96479b8' 2025-09-07T09:40:25.6945247Z Submodule path 'third_party/pybind11': checked out 'f5fbe867d2d26e4a0a9177a51f6e568868ad3dc8' 2025-09-07T09:40:25.7446773Z Submodule path 'third_party/python-peachpy': checked out 'f45429b087dd7d5bc78bb40dc7cf06425c252d67' 2025-09-07T09:40:25.8014900Z Submodule path 'third_party/sleef': checked out '5a1d179df9cf652951b59010a2d2075372d67f68' 2025-09-07T09:40:25.8777971Z Submodule path 'third_party/tensorpipe': checked out 'af0118d13e52f5a08841464a768e01a0bf3e3075' 2025-09-07T09:40:25.9698343Z Submodule path 'third_party/tensorpipe/third_party/googletest': checked out 'aee0f9d9b5b87796ee8a0ab26b7587ec30e8858e' 2025-09-07T09:40:26.0612165Z Submodule path 'third_party/tensorpipe/third_party/libnop': checked out '910b55815be16109f04f4180e9adee14fb4ce281' 2025-09-07T09:40:26.1580691Z Submodule path 'third_party/tensorpipe/third_party/libuv': checked out '5152db2cbfeb5582e9c27c5ea1dba2cd9e10759b' 2025-09-07T09:40:26.2490011Z Submodule path 'third_party/tensorpipe/third_party/pybind11': checked out 'a23996fce38ff6ccfbcdc09f1e63f2c4be5ea2ef' 2025-09-07T09:40:26.3367752Z Submodule path 'third_party/tensorpipe/third_party/pybind11/tools/clang': checked out 
'6a00cbc4a9b8e68b71caf7f774b3f9c753ae84d5' 2025-09-07T09:40:26.3423555Z [command]/usr/bin/git submodule foreach --recursive git config --local gc.auto 0 2025-09-07T09:40:26.3707402Z Entering 'android/libs/fbjni' 2025-09-07T09:40:26.3872800Z Entering 'third_party/FP16' 2025-09-07T09:40:26.4014013Z Entering 'third_party/FXdiv' 2025-09-07T09:40:26.4187873Z Entering 'third_party/NNPACK' 2025-09-07T09:40:26.4344411Z Entering 'third_party/NVTX' 2025-09-07T09:40:26.4394458Z Entering 'third_party/VulkanMemoryAllocator' 2025-09-07T09:40:26.4440725Z Entering 'third_party/XNNPACK' 2025-09-07T09:40:26.4506147Z Entering 'third_party/aiter' 2025-09-07T09:40:26.4572618Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T09:40:26.4625432Z Entering 'third_party/benchmark' 2025-09-07T09:40:26.4675636Z Entering 'third_party/composable_kernel' 2025-09-07T09:40:26.4737607Z Entering 'third_party/cpp-httplib' 2025-09-07T09:40:26.5017644Z Entering 'third_party/cpuinfo' 2025-09-07T09:40:26.5512753Z Entering 'third_party/cudnn_frontend' 2025-09-07T09:40:26.5686262Z Entering 'third_party/cutlass' 2025-09-07T09:40:26.6122863Z Entering 'third_party/fbgemm' 2025-09-07T09:40:26.6542062Z Entering 'third_party/fbgemm/external/asmjit' 2025-09-07T09:40:26.7026954Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-09-07T09:40:26.7192786Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T09:40:26.7720907Z Entering 'third_party/fbgemm/external/cutlass' 2025-09-07T09:40:26.7832034Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T09:40:26.8143466Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T09:40:26.8218585Z Entering 'third_party/fbgemm/external/json' 2025-09-07T09:40:26.8337178Z Entering 'third_party/flash-attention' 2025-09-07T09:40:26.8390340Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T09:40:26.8755448Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T09:40:26.9120895Z Entering 
'third_party/flatbuffers' 2025-09-07T09:40:26.9593404Z Entering 'third_party/fmt' 2025-09-07T09:40:27.0029398Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T09:40:27.0297710Z Entering 'third_party/gloo' 2025-09-07T09:40:27.0673167Z Entering 'third_party/googletest' 2025-09-07T09:40:27.1118967Z Entering 'third_party/ideep' 2025-09-07T09:40:27.1600640Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T09:40:27.2042704Z Entering 'third_party/ittapi' 2025-09-07T09:40:27.2395916Z Entering 'third_party/kineto' 2025-09-07T09:40:27.2888817Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T09:40:27.3303603Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T09:40:27.3730548Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T09:40:27.4199709Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T09:40:27.4592718Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T09:40:27.5067737Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T09:40:27.5570246Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T09:40:27.5977410Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T09:40:27.6460935Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T09:40:27.6949112Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T09:40:27.7464681Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T09:40:27.7839459Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T09:40:27.9021896Z Entering 'third_party/kleidiai' 2025-09-07T09:40:27.9517364Z Entering 'third_party/mimalloc' 2025-09-07T09:40:27.9975425Z Entering 'third_party/nlohmann' 2025-09-07T09:40:28.0398440Z Entering 'third_party/onnx' 
2025-09-07T09:40:28.0878317Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T09:40:28.1358267Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T09:40:28.1852253Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T09:40:28.2232847Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T09:40:28.2716953Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T09:40:28.3007425Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T09:40:28.3477077Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T09:40:28.5533530Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T09:40:28.5579692Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T09:40:28.6049987Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T09:40:28.6511236Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T09:40:28.6958142Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T09:40:28.7464856Z Entering 'third_party/pocketfft' 2025-09-07T09:40:28.7916970Z Entering 'third_party/protobuf' 2025-09-07T09:40:28.8298263Z Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T09:40:28.8783068Z Entering 'third_party/protobuf/third_party/googletest' 2025-09-07T09:40:28.9153994Z Entering 'third_party/psimd' 2025-09-07T09:40:28.9521924Z Entering 'third_party/pthreadpool' 2025-09-07T09:40:28.9929307Z Entering 'third_party/pybind11' 2025-09-07T09:40:29.0318354Z Entering 'third_party/python-peachpy' 2025-09-07T09:40:29.0736326Z Entering 'third_party/sleef' 2025-09-07T09:40:29.1205479Z Entering 'third_party/tensorpipe' 2025-09-07T09:40:29.1622752Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-09-07T09:40:29.2106364Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-09-07T09:40:29.2487387Z Entering 
'third_party/tensorpipe/third_party/libuv' 2025-09-07T09:40:29.2710891Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T09:40:29.3090095Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T09:40:29.3503156Z ##[endgroup] 2025-09-07T09:40:29.3503616Z ##[group]Persisting credentials for submodules 2025-09-07T09:40:29.3512568Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'url\.https\:\/\/github\.com\/\.insteadOf' && git config --local --unset-all 'url.https://github.com/.insteadOf' || :" 2025-09-07T09:40:29.3884846Z Entering 'android/libs/fbjni' 2025-09-07T09:40:29.3917314Z url.https://github.com/.insteadof 2025-09-07T09:40:29.3917666Z url.https://github.com/.insteadof 2025-09-07T09:40:29.4405460Z Entering 'third_party/FP16' 2025-09-07T09:40:29.4433626Z url.https://github.com/.insteadof 2025-09-07T09:40:29.4434062Z url.https://github.com/.insteadof 2025-09-07T09:40:29.4802183Z Entering 'third_party/FXdiv' 2025-09-07T09:40:29.4835191Z url.https://github.com/.insteadof 2025-09-07T09:40:29.4835581Z url.https://github.com/.insteadof 2025-09-07T09:40:29.5290102Z Entering 'third_party/NNPACK' 2025-09-07T09:40:29.5320315Z url.https://github.com/.insteadof 2025-09-07T09:40:29.5320677Z url.https://github.com/.insteadof 2025-09-07T09:40:29.5737782Z Entering 'third_party/NVTX' 2025-09-07T09:40:29.5769760Z url.https://github.com/.insteadof 2025-09-07T09:40:29.5770123Z url.https://github.com/.insteadof 2025-09-07T09:40:29.6203715Z Entering 'third_party/VulkanMemoryAllocator' 2025-09-07T09:40:29.6235670Z url.https://github.com/.insteadof 2025-09-07T09:40:29.6236026Z url.https://github.com/.insteadof 2025-09-07T09:40:29.6394181Z Entering 'third_party/XNNPACK' 2025-09-07T09:40:29.6428019Z url.https://github.com/.insteadof 2025-09-07T09:40:29.6428374Z url.https://github.com/.insteadof 2025-09-07T09:40:29.6819366Z Entering 'third_party/aiter' 2025-09-07T09:40:29.6851136Z 
url.https://github.com/.insteadof 2025-09-07T09:40:29.6851499Z url.https://github.com/.insteadof 2025-09-07T09:40:29.7186470Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T09:40:29.7216600Z url.https://github.com/.insteadof 2025-09-07T09:40:29.7216956Z url.https://github.com/.insteadof 2025-09-07T09:40:29.7654558Z Entering 'third_party/benchmark' 2025-09-07T09:40:29.7687704Z url.https://github.com/.insteadof 2025-09-07T09:40:29.7688052Z url.https://github.com/.insteadof 2025-09-07T09:40:29.8067707Z Entering 'third_party/composable_kernel' 2025-09-07T09:40:29.8099902Z url.https://github.com/.insteadof 2025-09-07T09:40:29.8100299Z url.https://github.com/.insteadof 2025-09-07T09:40:29.8552214Z Entering 'third_party/cpp-httplib' 2025-09-07T09:40:29.8583685Z url.https://github.com/.insteadof 2025-09-07T09:40:29.8584024Z url.https://github.com/.insteadof 2025-09-07T09:40:29.8971710Z Entering 'third_party/cpuinfo' 2025-09-07T09:40:29.9002870Z url.https://github.com/.insteadof 2025-09-07T09:40:29.9003537Z url.https://github.com/.insteadof 2025-09-07T09:40:29.9409263Z Entering 'third_party/cudnn_frontend' 2025-09-07T09:40:29.9439714Z url.https://github.com/.insteadof 2025-09-07T09:40:29.9440032Z url.https://github.com/.insteadof 2025-09-07T09:40:29.9819593Z Entering 'third_party/cutlass' 2025-09-07T09:40:29.9849043Z url.https://github.com/.insteadof 2025-09-07T09:40:29.9849378Z url.https://github.com/.insteadof 2025-09-07T09:40:30.0194799Z Entering 'third_party/fbgemm' 2025-09-07T09:40:30.0226025Z url.https://github.com/.insteadof 2025-09-07T09:40:30.0226331Z url.https://github.com/.insteadof 2025-09-07T09:40:30.0658933Z Entering 'third_party/fbgemm/external/asmjit' 2025-09-07T09:40:30.0688279Z url.https://github.com/.insteadof 2025-09-07T09:40:30.0688653Z url.https://github.com/.insteadof 2025-09-07T09:40:30.1013859Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-09-07T09:40:30.1044141Z url.https://github.com/.insteadof 
2025-09-07T09:40:30.1044601Z url.https://github.com/.insteadof 2025-09-07T09:40:30.1447622Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T09:40:30.1475530Z url.https://github.com/.insteadof 2025-09-07T09:40:30.1475850Z url.https://github.com/.insteadof 2025-09-07T09:40:30.1856651Z Entering 'third_party/fbgemm/external/cutlass' 2025-09-07T09:40:30.1888236Z url.https://github.com/.insteadof 2025-09-07T09:40:30.1888543Z url.https://github.com/.insteadof 2025-09-07T09:40:30.2316771Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T09:40:30.2347050Z url.https://github.com/.insteadof 2025-09-07T09:40:30.2347365Z url.https://github.com/.insteadof 2025-09-07T09:40:30.2636890Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T09:40:30.2666611Z url.https://github.com/.insteadof 2025-09-07T09:40:30.2666905Z url.https://github.com/.insteadof 2025-09-07T09:40:30.3082367Z Entering 'third_party/fbgemm/external/json' 2025-09-07T09:40:30.3111503Z url.https://github.com/.insteadof 2025-09-07T09:40:30.3111810Z url.https://github.com/.insteadof 2025-09-07T09:40:30.3461598Z Entering 'third_party/flash-attention' 2025-09-07T09:40:30.3493507Z url.https://github.com/.insteadof 2025-09-07T09:40:30.3493830Z url.https://github.com/.insteadof 2025-09-07T09:40:30.3838635Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T09:40:30.3879869Z url.https://github.com/.insteadof 2025-09-07T09:40:30.3880194Z url.https://github.com/.insteadof 2025-09-07T09:40:30.4234147Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T09:40:30.4263275Z url.https://github.com/.insteadof 2025-09-07T09:40:30.4263604Z url.https://github.com/.insteadof 2025-09-07T09:40:30.4641395Z Entering 'third_party/flatbuffers' 2025-09-07T09:40:30.4673937Z url.https://github.com/.insteadof 2025-09-07T09:40:30.4674249Z url.https://github.com/.insteadof 2025-09-07T09:40:30.5086320Z Entering 'third_party/fmt' 2025-09-07T09:40:30.5116620Z 
url.https://github.com/.insteadof 2025-09-07T09:40:30.5116960Z url.https://github.com/.insteadof 2025-09-07T09:40:30.5448572Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T09:40:30.5479028Z url.https://github.com/.insteadof 2025-09-07T09:40:30.5479357Z url.https://github.com/.insteadof 2025-09-07T09:40:30.5859048Z Entering 'third_party/gloo' 2025-09-07T09:40:30.5890398Z url.https://github.com/.insteadof 2025-09-07T09:40:30.5890668Z url.https://github.com/.insteadof 2025-09-07T09:40:30.6306802Z Entering 'third_party/googletest' 2025-09-07T09:40:30.6337677Z url.https://github.com/.insteadof 2025-09-07T09:40:30.6337964Z url.https://github.com/.insteadof 2025-09-07T09:40:30.6686065Z Entering 'third_party/ideep' 2025-09-07T09:40:30.6714119Z url.https://github.com/.insteadof 2025-09-07T09:40:30.6714433Z url.https://github.com/.insteadof 2025-09-07T09:40:30.7144633Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T09:40:30.7173023Z url.https://github.com/.insteadof 2025-09-07T09:40:30.7173335Z url.https://github.com/.insteadof 2025-09-07T09:40:30.7577259Z Entering 'third_party/ittapi' 2025-09-07T09:40:30.7606190Z url.https://github.com/.insteadof 2025-09-07T09:40:30.7606502Z url.https://github.com/.insteadof 2025-09-07T09:40:30.8051945Z Entering 'third_party/kineto' 2025-09-07T09:40:30.8080745Z url.https://github.com/.insteadof 2025-09-07T09:40:30.8081049Z url.https://github.com/.insteadof 2025-09-07T09:40:30.8445330Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T09:40:30.8473267Z url.https://github.com/.insteadof 2025-09-07T09:40:30.8473586Z url.https://github.com/.insteadof 2025-09-07T09:40:30.8848762Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T09:40:30.8879308Z url.https://github.com/.insteadof 2025-09-07T09:40:30.8879588Z url.https://github.com/.insteadof 2025-09-07T09:40:30.9274816Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T09:40:30.9305091Z 
url.https://github.com/.insteadof 2025-09-07T09:40:30.9305402Z url.https://github.com/.insteadof 2025-09-07T09:40:30.9729364Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T09:40:30.9760336Z url.https://github.com/.insteadof 2025-09-07T09:40:30.9760658Z url.https://github.com/.insteadof 2025-09-07T09:40:31.0069708Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T09:40:31.0097320Z url.https://github.com/.insteadof 2025-09-07T09:40:31.0097603Z url.https://github.com/.insteadof 2025-09-07T09:40:31.0551305Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T09:40:31.0583389Z url.https://github.com/.insteadof 2025-09-07T09:40:31.0583685Z url.https://github.com/.insteadof 2025-09-07T09:40:31.0857115Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T09:40:31.0886469Z url.https://github.com/.insteadof 2025-09-07T09:40:31.0886821Z url.https://github.com/.insteadof 2025-09-07T09:40:31.0966011Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T09:40:31.0994136Z url.https://github.com/.insteadof 2025-09-07T09:40:31.0994494Z url.https://github.com/.insteadof 2025-09-07T09:40:31.1458805Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T09:40:31.1488184Z url.https://github.com/.insteadof 2025-09-07T09:40:31.1488475Z url.https://github.com/.insteadof 2025-09-07T09:40:31.1731192Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T09:40:31.1759735Z url.https://github.com/.insteadof 2025-09-07T09:40:31.1760019Z url.https://github.com/.insteadof 2025-09-07T09:40:31.2224823Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T09:40:31.2255430Z url.https://github.com/.insteadof 2025-09-07T09:40:31.2255756Z url.https://github.com/.insteadof 2025-09-07T09:40:31.2674154Z Entering 
'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T09:40:31.2704162Z url.https://github.com/.insteadof 2025-09-07T09:40:31.2704447Z url.https://github.com/.insteadof 2025-09-07T09:40:31.2795248Z Entering 'third_party/kleidiai' 2025-09-07T09:40:31.2822666Z url.https://github.com/.insteadof 2025-09-07T09:40:31.2822939Z url.https://github.com/.insteadof 2025-09-07T09:40:31.3172887Z Entering 'third_party/mimalloc' 2025-09-07T09:40:31.3200814Z url.https://github.com/.insteadof 2025-09-07T09:40:31.3201080Z url.https://github.com/.insteadof 2025-09-07T09:40:31.3574609Z Entering 'third_party/nlohmann' 2025-09-07T09:40:31.3606019Z url.https://github.com/.insteadof 2025-09-07T09:40:31.3606306Z url.https://github.com/.insteadof 2025-09-07T09:40:31.3996288Z Entering 'third_party/onnx' 2025-09-07T09:40:31.4025636Z url.https://github.com/.insteadof 2025-09-07T09:40:31.4025973Z url.https://github.com/.insteadof 2025-09-07T09:40:31.4388502Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T09:40:31.4419281Z url.https://github.com/.insteadof 2025-09-07T09:40:31.4419563Z url.https://github.com/.insteadof 2025-09-07T09:40:31.4793449Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T09:40:31.4827317Z url.https://github.com/.insteadof 2025-09-07T09:40:31.4828142Z url.https://github.com/.insteadof 2025-09-07T09:40:31.5173164Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T09:40:31.5200745Z url.https://github.com/.insteadof 2025-09-07T09:40:31.5201046Z url.https://github.com/.insteadof 2025-09-07T09:40:31.5586770Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T09:40:31.5613738Z url.https://github.com/.insteadof 2025-09-07T09:40:31.5614036Z url.https://github.com/.insteadof 2025-09-07T09:40:31.6000578Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T09:40:31.6029878Z url.https://github.com/.insteadof 2025-09-07T09:40:31.6030199Z url.https://github.com/.insteadof 
2025-09-07T09:40:31.6254682Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T09:40:31.6282851Z url.https://github.com/.insteadof 2025-09-07T09:40:31.6283150Z url.https://github.com/.insteadof 2025-09-07T09:40:31.6754573Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T09:40:31.6784325Z url.https://github.com/.insteadof 2025-09-07T09:40:31.6784627Z url.https://github.com/.insteadof 2025-09-07T09:40:31.6842933Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T09:40:31.6873679Z url.https://github.com/.insteadof 2025-09-07T09:40:31.6873989Z url.https://github.com/.insteadof 2025-09-07T09:40:31.6985815Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T09:40:31.7013846Z url.https://github.com/.insteadof 2025-09-07T09:40:31.7014138Z url.https://github.com/.insteadof 2025-09-07T09:40:31.7351604Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T09:40:31.7379326Z url.https://github.com/.insteadof 2025-09-07T09:40:31.7379633Z url.https://github.com/.insteadof 2025-09-07T09:40:31.7770002Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T09:40:31.7799631Z url.https://github.com/.insteadof 2025-09-07T09:40:31.7799967Z url.https://github.com/.insteadof 2025-09-07T09:40:31.8039582Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T09:40:31.8067953Z url.https://github.com/.insteadof 2025-09-07T09:40:31.8068250Z url.https://github.com/.insteadof 2025-09-07T09:40:31.8465158Z Entering 'third_party/pocketfft' 2025-09-07T09:40:31.8494312Z url.https://github.com/.insteadof 2025-09-07T09:40:31.8494617Z url.https://github.com/.insteadof 2025-09-07T09:40:31.8821565Z Entering 'third_party/protobuf' 2025-09-07T09:40:31.8855826Z url.https://github.com/.insteadof 2025-09-07T09:40:31.8856117Z url.https://github.com/.insteadof 2025-09-07T09:40:31.9156077Z 
Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T09:40:31.9184884Z url.https://github.com/.insteadof 2025-09-07T09:40:31.9185327Z url.https://github.com/.insteadof 2025-09-07T09:40:31.9624102Z Entering 'third_party/protobuf/third_party/googletest' 2025-09-07T09:40:31.9652836Z url.https://github.com/.insteadof 2025-09-07T09:40:31.9653152Z url.https://github.com/.insteadof 2025-09-07T09:40:32.0110234Z Entering 'third_party/psimd' 2025-09-07T09:40:32.0142088Z url.https://github.com/.insteadof 2025-09-07T09:40:32.0142388Z url.https://github.com/.insteadof 2025-09-07T09:40:32.0490847Z Entering 'third_party/pthreadpool' 2025-09-07T09:40:32.0520711Z url.https://github.com/.insteadof 2025-09-07T09:40:32.0521000Z url.https://github.com/.insteadof 2025-09-07T09:40:32.0929925Z Entering 'third_party/pybind11' 2025-09-07T09:40:32.0963027Z url.https://github.com/.insteadof 2025-09-07T09:40:32.0963322Z url.https://github.com/.insteadof 2025-09-07T09:40:32.1261683Z Entering 'third_party/python-peachpy' 2025-09-07T09:40:32.1291546Z url.https://github.com/.insteadof 2025-09-07T09:40:32.1291851Z url.https://github.com/.insteadof 2025-09-07T09:40:32.1727269Z Entering 'third_party/sleef' 2025-09-07T09:40:32.1757721Z url.https://github.com/.insteadof 2025-09-07T09:40:32.1758035Z url.https://github.com/.insteadof 2025-09-07T09:40:32.2205503Z Entering 'third_party/tensorpipe' 2025-09-07T09:40:32.2233616Z url.https://github.com/.insteadof 2025-09-07T09:40:32.2233916Z url.https://github.com/.insteadof 2025-09-07T09:40:32.2625443Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-09-07T09:40:32.2650782Z url.https://github.com/.insteadof 2025-09-07T09:40:32.2651074Z url.https://github.com/.insteadof 2025-09-07T09:40:32.3118112Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-09-07T09:40:32.3148981Z url.https://github.com/.insteadof 2025-09-07T09:40:32.3151288Z url.https://github.com/.insteadof 2025-09-07T09:40:32.3568004Z Entering 
'third_party/tensorpipe/third_party/libuv' 2025-09-07T09:40:32.3598672Z url.https://github.com/.insteadof 2025-09-07T09:40:32.3599020Z url.https://github.com/.insteadof 2025-09-07T09:40:32.3974322Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T09:40:32.4003363Z url.https://github.com/.insteadof 2025-09-07T09:40:32.4003666Z url.https://github.com/.insteadof 2025-09-07T09:40:32.4457576Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T09:40:32.4485195Z url.https://github.com/.insteadof 2025-09-07T09:40:32.4485494Z url.https://github.com/.insteadof 2025-09-07T09:40:32.4920152Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local 'http.https://github.com/.extraheader' 'AUTHORIZATION: basic ***' && git config --local --show-origin --name-only --get-regexp remote.origin.url" 2025-09-07T09:40:32.5208596Z Entering 'android/libs/fbjni' 2025-09-07T09:40:32.5346418Z file:/home/eve/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config remote.origin.url 2025-09-07T09:40:32.5369106Z Entering 'third_party/FP16' 2025-09-07T09:40:32.5817581Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config remote.origin.url 2025-09-07T09:40:32.5839232Z Entering 'third_party/FXdiv' 2025-09-07T09:40:32.6228164Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config remote.origin.url 2025-09-07T09:40:32.6249830Z Entering 'third_party/NNPACK' 2025-09-07T09:40:32.6702321Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config remote.origin.url 2025-09-07T09:40:32.6723720Z Entering 'third_party/NVTX' 2025-09-07T09:40:32.7181888Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config remote.origin.url 2025-09-07T09:40:32.7205910Z Entering 'third_party/VulkanMemoryAllocator' 2025-09-07T09:40:32.7602141Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config remote.origin.url 
2025-09-07T09:40:32.7626393Z Entering 'third_party/XNNPACK' 2025-09-07T09:40:32.8082290Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config remote.origin.url 2025-09-07T09:40:32.8121881Z Entering 'third_party/aiter' 2025-09-07T09:40:32.8422887Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/aiter/config remote.origin.url 2025-09-07T09:40:32.8446466Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T09:40:32.8866559Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config remote.origin.url 2025-09-07T09:40:32.8898254Z Entering 'third_party/benchmark' 2025-09-07T09:40:32.9338785Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config remote.origin.url 2025-09-07T09:40:32.9362496Z Entering 'third_party/composable_kernel' 2025-09-07T09:40:32.9735415Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config remote.origin.url 2025-09-07T09:40:32.9765325Z Entering 'third_party/cpp-httplib' 2025-09-07T09:40:33.0215332Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config remote.origin.url 2025-09-07T09:40:33.0242724Z Entering 'third_party/cpuinfo' 2025-09-07T09:40:33.0679630Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config remote.origin.url 2025-09-07T09:40:33.0703737Z Entering 'third_party/cudnn_frontend' 2025-09-07T09:40:33.1101657Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config remote.origin.url 2025-09-07T09:40:33.1126503Z Entering 'third_party/cutlass' 2025-09-07T09:40:33.1577143Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config remote.origin.url 2025-09-07T09:40:33.1612309Z Entering 'third_party/fbgemm' 2025-09-07T09:40:33.2034578Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config remote.origin.url 2025-09-07T09:40:33.2060237Z Entering 
'third_party/fbgemm/external/asmjit' 2025-09-07T09:40:33.2500017Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config remote.origin.url 2025-09-07T09:40:33.2523014Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-09-07T09:40:33.2968628Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config remote.origin.url 2025-09-07T09:40:33.2997787Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T09:40:33.3386442Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config remote.origin.url 2025-09-07T09:40:33.3408233Z Entering 'third_party/fbgemm/external/cutlass' 2025-09-07T09:40:33.4405207Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config remote.origin.url 2025-09-07T09:40:33.4436450Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T09:40:33.4884724Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config remote.origin.url 2025-09-07T09:40:33.4909109Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T09:40:33.6938663Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config remote.origin.url 2025-09-07T09:40:33.6964734Z Entering 'third_party/fbgemm/external/json' 2025-09-07T09:40:33.7409635Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config remote.origin.url 2025-09-07T09:40:33.7435936Z Entering 'third_party/flash-attention' 2025-09-07T09:40:33.7842903Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config remote.origin.url 2025-09-07T09:40:33.7868920Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T09:40:33.8321219Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config 
remote.origin.url 2025-09-07T09:40:33.8356914Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T09:40:33.8776646Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config remote.origin.url 2025-09-07T09:40:33.8810719Z Entering 'third_party/flatbuffers' 2025-09-07T09:40:33.9253949Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config remote.origin.url 2025-09-07T09:40:33.9278953Z Entering 'third_party/fmt' 2025-09-07T09:40:33.9685570Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/fmt/config remote.origin.url 2025-09-07T09:40:33.9707405Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T09:40:34.0117729Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config remote.origin.url 2025-09-07T09:40:34.0140632Z Entering 'third_party/gloo' 2025-09-07T09:40:34.0599019Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/gloo/config remote.origin.url 2025-09-07T09:40:34.0621834Z Entering 'third_party/googletest' 2025-09-07T09:40:34.1030042Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/googletest/config remote.origin.url 2025-09-07T09:40:34.1053351Z Entering 'third_party/ideep' 2025-09-07T09:40:34.1514375Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/ideep/config remote.origin.url 2025-09-07T09:40:34.1535372Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T09:40:34.1975431Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config remote.origin.url 2025-09-07T09:40:34.2003599Z Entering 'third_party/ittapi' 2025-09-07T09:40:34.2414021Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config remote.origin.url 2025-09-07T09:40:34.2435239Z Entering 'third_party/kineto' 2025-09-07T09:40:34.2901754Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/kineto/config remote.origin.url 2025-09-07T09:40:34.2922434Z Entering 
'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T09:40:34.3321291Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config remote.origin.url 2025-09-07T09:40:34.3342246Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T09:40:34.3798984Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config remote.origin.url 2025-09-07T09:40:34.3822244Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T09:40:34.4279985Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config remote.origin.url 2025-09-07T09:40:34.4300897Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T09:40:34.4696320Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config remote.origin.url 2025-09-07T09:40:34.4718161Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T09:40:34.5190155Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config remote.origin.url 2025-09-07T09:40:34.5209194Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T09:40:34.5633514Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config remote.origin.url 2025-09-07T09:40:34.5659064Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T09:40:34.6084913Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config remote.origin.url 
2025-09-07T09:40:34.6106208Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T09:40:34.6566805Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config remote.origin.url 2025-09-07T09:40:34.6588274Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T09:40:34.6966984Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config remote.origin.url 2025-09-07T09:40:34.6990852Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T09:40:34.7407212Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config remote.origin.url 2025-09-07T09:40:34.7429940Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T09:40:34.7867858Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config remote.origin.url 2025-09-07T09:40:34.7891058Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T09:40:34.8315904Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config remote.origin.url 2025-09-07T09:40:34.8340817Z Entering 'third_party/kleidiai' 2025-09-07T09:40:34.8789439Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config remote.origin.url 2025-09-07T09:40:34.8817980Z Entering 'third_party/mimalloc' 2025-09-07T09:40:34.9242337Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config remote.origin.url 2025-09-07T09:40:34.9266325Z Entering 'third_party/nlohmann' 2025-09-07T09:40:34.9662071Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config remote.origin.url 2025-09-07T09:40:34.9689988Z Entering 'third_party/onnx' 
2025-09-07T09:40:35.0128682Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/onnx/config remote.origin.url 2025-09-07T09:40:35.0169788Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T09:40:35.0530750Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url 2025-09-07T09:40:35.0557893Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T09:40:35.1022798Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config remote.origin.url 2025-09-07T09:40:35.1046787Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T09:40:35.1449673Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config remote.origin.url 2025-09-07T09:40:35.1471098Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T09:40:35.1904273Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config remote.origin.url 2025-09-07T09:40:35.1926151Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T09:40:35.2351032Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config remote.origin.url 2025-09-07T09:40:35.2372243Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T09:40:35.2776790Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config remote.origin.url 2025-09-07T09:40:35.2797685Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T09:40:35.3215868Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config remote.origin.url 2025-09-07T09:40:35.3238534Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T09:40:35.3683743Z 
file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config remote.origin.url 2025-09-07T09:40:35.3706093Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T09:40:35.4018046Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config remote.origin.url 2025-09-07T09:40:35.4038593Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T09:40:35.4438256Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-09-07T09:40:35.4463131Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T09:40:35.4893223Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-09-07T09:40:35.4918889Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T09:40:35.5288150Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config remote.origin.url 2025-09-07T09:40:35.5331387Z Entering 'third_party/pocketfft' 2025-09-07T09:40:35.5759551Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config remote.origin.url 2025-09-07T09:40:35.5782611Z Entering 'third_party/protobuf' 2025-09-07T09:40:35.6206324Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config remote.origin.url 2025-09-07T09:40:35.6229822Z Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T09:40:35.6634815Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config remote.origin.url 2025-09-07T09:40:35.6656836Z Entering 'third_party/protobuf/third_party/googletest' 2025-09-07T09:40:35.7097456Z 
file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config remote.origin.url 2025-09-07T09:40:35.7122179Z Entering 'third_party/psimd' 2025-09-07T09:40:35.7499602Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config remote.origin.url 2025-09-07T09:40:35.7522831Z Entering 'third_party/pthreadpool' 2025-09-07T09:40:35.7961797Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config remote.origin.url 2025-09-07T09:40:35.7984645Z Entering 'third_party/pybind11' 2025-09-07T09:40:35.8428500Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config remote.origin.url 2025-09-07T09:40:35.8450301Z Entering 'third_party/python-peachpy' 2025-09-07T09:40:35.8818993Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config remote.origin.url 2025-09-07T09:40:35.8842914Z Entering 'third_party/sleef' 2025-09-07T09:40:35.9306466Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/sleef/config remote.origin.url 2025-09-07T09:40:35.9328820Z Entering 'third_party/tensorpipe' 2025-09-07T09:40:35.9723513Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config remote.origin.url 2025-09-07T09:40:35.9745628Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-09-07T09:40:36.0178803Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config remote.origin.url 2025-09-07T09:40:36.0200219Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-09-07T09:40:36.0661187Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config remote.origin.url 2025-09-07T09:40:36.0684774Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-09-07T09:40:36.1082323Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config remote.origin.url 
2025-09-07T09:40:36.1109376Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T09:40:36.1561787Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config remote.origin.url 2025-09-07T09:40:36.1581830Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T09:40:36.2026129Z file:/home/eve/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url 2025-09-07T09:40:36.9935508Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'git@github.com:' 2025-09-07T09:40:37.0224640Z Entering 'android/libs/fbjni' 2025-09-07T09:40:37.0446030Z Entering 'third_party/FP16' 2025-09-07T09:40:37.0661993Z Entering 'third_party/FXdiv' 2025-09-07T09:40:37.1138353Z Entering 'third_party/NNPACK' 2025-09-07T09:40:37.1544442Z Entering 'third_party/NVTX' 2025-09-07T09:40:37.2016091Z Entering 'third_party/VulkanMemoryAllocator' 2025-09-07T09:40:37.2488308Z Entering 'third_party/XNNPACK' 2025-09-07T09:40:37.2919636Z Entering 'third_party/aiter' 2025-09-07T09:40:37.3389546Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T09:40:37.3869197Z Entering 'third_party/benchmark' 2025-09-07T09:40:37.4307104Z Entering 'third_party/composable_kernel' 2025-09-07T09:40:37.4803319Z Entering 'third_party/cpp-httplib' 2025-09-07T09:40:37.5279095Z Entering 'third_party/cpuinfo' 2025-09-07T09:40:37.5722956Z Entering 'third_party/cudnn_frontend' 2025-09-07T09:40:37.6142953Z Entering 'third_party/cutlass' 2025-09-07T09:40:37.6615416Z Entering 'third_party/fbgemm' 2025-09-07T09:40:37.7089207Z Entering 'third_party/fbgemm/external/asmjit' 2025-09-07T09:40:37.7536012Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-09-07T09:40:37.8023268Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T09:40:37.8403481Z Entering 'third_party/fbgemm/external/cutlass' 
2025-09-07T09:40:37.8894072Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T09:40:37.9373687Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T09:40:37.9703598Z Entering 'third_party/fbgemm/external/json' 2025-09-07T09:40:38.0186292Z Entering 'third_party/flash-attention' 2025-09-07T09:40:38.0676182Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T09:40:38.1158523Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T09:40:38.1659862Z Entering 'third_party/flatbuffers' 2025-09-07T09:40:38.2053503Z Entering 'third_party/fmt' 2025-09-07T09:40:38.2498461Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T09:40:38.2968059Z Entering 'third_party/gloo' 2025-09-07T09:40:38.3412142Z Entering 'third_party/googletest' 2025-09-07T09:40:38.3889301Z Entering 'third_party/ideep' 2025-09-07T09:40:38.4260517Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T09:40:38.4781120Z Entering 'third_party/ittapi' 2025-09-07T09:40:38.6440457Z Entering 'third_party/kineto' 2025-09-07T09:40:38.6488357Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T09:40:38.6662098Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T09:40:38.6710185Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T09:40:38.6755276Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T09:40:38.6798652Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T09:40:38.6842345Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T09:40:38.6887261Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T09:40:38.6930295Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T09:40:38.6974434Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 
2025-09-07T09:40:38.7018949Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T09:40:38.7494207Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T09:40:38.7867409Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T09:40:38.8348633Z Entering 'third_party/kleidiai' 2025-09-07T09:40:38.8818935Z Entering 'third_party/mimalloc' 2025-09-07T09:40:38.9234357Z Entering 'third_party/nlohmann' 2025-09-07T09:40:39.0741840Z Entering 'third_party/onnx' 2025-09-07T09:40:39.0956876Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T09:40:39.1148813Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T09:40:39.1350388Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T09:40:39.1552038Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T09:40:39.1726393Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T09:40:39.1769667Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T09:40:39.2103665Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T09:40:39.2534523Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T09:40:39.2869824Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T09:40:39.3121322Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T09:40:39.3572121Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T09:40:39.3824343Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T09:40:39.4229794Z Entering 'third_party/pocketfft' 2025-09-07T09:40:39.4636590Z Entering 'third_party/protobuf' 2025-09-07T09:40:39.4968568Z Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T09:40:39.5382619Z Entering 'third_party/protobuf/third_party/googletest' 2025-09-07T09:40:39.5731411Z Entering 'third_party/psimd' 
2025-09-07T09:40:39.6108880Z Entering 'third_party/pthreadpool' 2025-09-07T09:40:39.6479132Z Entering 'third_party/pybind11' 2025-09-07T09:40:39.6804725Z Entering 'third_party/python-peachpy' 2025-09-07T09:40:39.7229227Z Entering 'third_party/sleef' 2025-09-07T09:40:39.7546589Z Entering 'third_party/tensorpipe' 2025-09-07T09:40:39.7990013Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-09-07T09:40:39.8427128Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-09-07T09:40:39.8816937Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-09-07T09:40:40.1568601Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T09:40:40.2040592Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T09:40:40.2559743Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'org-21003710@github.com:' 2025-09-07T09:40:40.2880066Z Entering 'android/libs/fbjni' 2025-09-07T09:40:40.2998258Z Entering 'third_party/FP16' 2025-09-07T09:40:40.3468012Z Entering 'third_party/FXdiv' 2025-09-07T09:40:40.3868678Z Entering 'third_party/NNPACK' 2025-09-07T09:40:40.4209003Z Entering 'third_party/NVTX' 2025-09-07T09:40:40.4678403Z Entering 'third_party/VulkanMemoryAllocator' 2025-09-07T09:40:40.6170546Z Entering 'third_party/XNNPACK' 2025-09-07T09:40:40.6602504Z Entering 'third_party/aiter' 2025-09-07T09:40:40.7068517Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T09:40:40.7483743Z Entering 'third_party/benchmark' 2025-09-07T09:40:41.0090487Z Entering 'third_party/composable_kernel' 2025-09-07T09:40:41.0672045Z Entering 'third_party/cpp-httplib' 2025-09-07T09:40:41.1115916Z Entering 'third_party/cpuinfo' 2025-09-07T09:40:41.1538223Z Entering 'third_party/cudnn_frontend' 2025-09-07T09:40:41.4205485Z Entering 'third_party/cutlass' 2025-09-07T09:40:41.4361409Z Entering 'third_party/fbgemm' 2025-09-07T09:40:41.4771402Z Entering 'third_party/fbgemm/external/asmjit' 
2025-09-07T09:40:41.5252867Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-09-07T09:40:41.7992907Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T09:40:41.8036810Z Entering 'third_party/fbgemm/external/cutlass' 2025-09-07T09:40:41.8284665Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T09:40:41.8509849Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T09:40:41.8813471Z Entering 'third_party/fbgemm/external/json' 2025-09-07T09:40:41.8881429Z Entering 'third_party/flash-attention' 2025-09-07T09:40:41.9047302Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T09:40:41.9115425Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T09:40:41.9233224Z Entering 'third_party/flatbuffers' 2025-09-07T09:40:42.0769650Z Entering 'third_party/fmt' 2025-09-07T09:40:42.0832872Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T09:40:42.0957724Z Entering 'third_party/gloo' 2025-09-07T09:40:42.1190665Z Entering 'third_party/googletest' 2025-09-07T09:40:42.1249160Z Entering 'third_party/ideep' 2025-09-07T09:40:42.1293341Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T09:40:42.1373949Z Entering 'third_party/ittapi' 2025-09-07T09:40:42.1439988Z Entering 'third_party/kineto' 2025-09-07T09:40:42.1493784Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T09:40:42.1820893Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T09:40:42.4996529Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T09:40:42.5047789Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T09:40:42.5338721Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T09:40:42.5694728Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T09:40:42.6175599Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 
2025-09-07T09:40:42.6264352Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T09:40:42.6314736Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T09:40:42.6373127Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T09:40:42.6788461Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T09:40:42.7236228Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T09:40:42.7327222Z Entering 'third_party/kleidiai' 2025-09-07T09:40:42.7472071Z Entering 'third_party/mimalloc' 2025-09-07T09:40:42.7941447Z Entering 'third_party/nlohmann' 2025-09-07T09:40:42.8209560Z Entering 'third_party/onnx' 2025-09-07T09:40:42.8564098Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T09:40:42.8969690Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T09:40:42.9046223Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T09:40:42.9533890Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T09:40:43.0052504Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T09:40:43.0310361Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T09:40:43.0792084Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T09:40:43.1084628Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T09:40:43.1174795Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T09:40:43.1627707Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T09:40:43.3439767Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T09:40:43.4110701Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T09:40:43.4264201Z Entering 'third_party/pocketfft' 2025-09-07T09:40:43.4732552Z Entering 'third_party/protobuf' 
2025-09-07T09:40:43.5179155Z Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T09:40:43.5586942Z Entering 'third_party/protobuf/third_party/googletest' 2025-09-07T09:40:43.5810162Z Entering 'third_party/psimd' 2025-09-07T09:40:43.6291356Z Entering 'third_party/pthreadpool' 2025-09-07T09:40:43.6653187Z Entering 'third_party/pybind11' 2025-09-07T09:40:43.7076421Z Entering 'third_party/python-peachpy' 2025-09-07T09:40:43.7166667Z Entering 'third_party/sleef' 2025-09-07T09:40:43.7847112Z Entering 'third_party/tensorpipe' 2025-09-07T09:40:43.8297734Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-09-07T09:40:43.8462722Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-09-07T09:40:43.8659236Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-09-07T09:40:43.8885148Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T09:40:43.9457571Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T09:40:43.9518721Z ##[endgroup] 2025-09-07T09:40:43.9558020Z [command]/usr/bin/git log -1 --format=%H 2025-09-07T09:40:43.9588447Z 93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T09:40:43.9767510Z Prepare all required actions 2025-09-07T09:40:43.9768085Z Getting action download info 2025-09-07T09:40:44.3308606Z ##[group]Run ./.github/actions/setup-linux 2025-09-07T09:40:44.3308909Z env: 2025-09-07T09:40:44.3309090Z GIT_DEFAULT_BRANCH: main 2025-09-07T09:40:44.3309295Z ##[endgroup] 2025-09-07T09:40:44.6296463Z ##[group]Run set -euo pipefail 2025-09-07T09:40:44.6296760Z set -euo pipefail 2025-09-07T09:40:44.6296991Z function get_ec2_metadata() { 2025-09-07T09:40:44.6297272Z  # Pulled from instance metadata endpoint for EC2 2025-09-07T09:40:44.6297735Z  # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html 2025-09-07T09:40:44.6298139Z  category=$1 2025-09-07T09:40:44.6298410Z  # If it is GCP runner (runner name contains gcp), do not run this 2025-09-07T09:40:44.6298734Z 
 runner_name_str=i-0d73070610f53945f-1005 2025-09-07T09:40:44.6299015Z  if [[ -f /.inarc ]]; then 2025-09-07T09:40:44.6299282Z  echo "ARC Runner, no info on ec2 metadata" 2025-09-07T09:40:44.6299551Z  elif [[ $runner_name_str == *"gcp"* ]]; then 2025-09-07T09:40:44.6300113Z  echo "Runner is from Google Cloud Platform, No info on ec2 metadata" 2025-09-07T09:40:44.6300418Z  else 2025-09-07T09:40:44.6301028Z  curl -H "X-aws-ec2-metadata-token: $(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 30")" -fsSL "http://169.254.169.254/latest/meta-data/${category}" 2025-09-07T09:40:44.6301808Z  fi 2025-09-07T09:40:44.6301969Z } 2025-09-07T09:40:44.6302170Z echo "ami-id: $(get_ec2_metadata ami-id)" 2025-09-07T09:40:44.6302482Z echo "instance-id: $(get_ec2_metadata instance-id)" 2025-09-07T09:40:44.6302830Z echo "instance-type: $(get_ec2_metadata instance-type)" 2025-09-07T09:40:44.6303136Z echo "system info $(uname -a)" 2025-09-07T09:40:44.6317867Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T09:40:44.6318169Z env: 2025-09-07T09:40:44.6318331Z GIT_DEFAULT_BRANCH: main 2025-09-07T09:40:44.6318559Z ##[endgroup] 2025-09-07T09:40:44.6648896Z ami-id: ARC Runner, no info on ec2 metadata 2025-09-07T09:40:44.6655486Z instance-id: ARC Runner, no info on ec2 metadata 2025-09-07T09:40:44.6661115Z instance-type: ARC Runner, no info on ec2 metadata 2025-09-07T09:40:44.6675962Z system info Linux c9e10662379e 6.8.0-1017-aws #18~22.04.1-Ubuntu SMP Thu Oct 3 19:57:42 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux 2025-09-07T09:40:44.6956474Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-09-07T09:40:44.6957279Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-09-07T09:40:44.6971065Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 
2025-09-07T09:40:44.6971354Z env: 2025-09-07T09:40:44.6971523Z GIT_DEFAULT_BRANCH: main 2025-09-07T09:40:44.6971719Z ##[endgroup] 2025-09-07T09:40:44.7780961Z ##[group]Run nick-fields/retry@v3.0.0 2025-09-07T09:40:44.7782383Z with: 2025-09-07T09:40:44.7782708Z shell: bash 2025-09-07T09:40:44.7782921Z timeout_minutes: 5 2025-09-07T09:40:44.7783315Z max_attempts: 3 2025-09-07T09:40:44.7783716Z retry_wait_seconds: 30 2025-09-07T09:40:44.7788339Z command: AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" # For LF Runners we need to make sure we also login to Meta's ECR docker registry too. META_AWS_ACCOUNT_ID=308535385114 if [ "$AWS_ACCOUNT_ID" != "$META_AWS_ACCOUNT_ID" ] ; then aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ --password-stdin "$META_AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" fi 2025-09-07T09:40:44.7790286Z polling_interval_seconds: 1 2025-09-07T09:40:44.7790538Z warning_on_retry: true 2025-09-07T09:40:44.7790881Z continue_on_error: false 2025-09-07T09:40:44.7791135Z env: 2025-09-07T09:40:44.7791492Z GIT_DEFAULT_BRANCH: main 2025-09-07T09:40:44.7791706Z AWS_RETRY_MODE: standard 2025-09-07T09:40:44.7792508Z AWS_MAX_ATTEMPTS: 5 2025-09-07T09:40:44.7792813Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T09:40:44.7793020Z ##[endgroup] 2025-09-07T09:40:46.4330258Z 2025-09-07T09:40:46.4330824Z WARNING! Your credentials are stored unencrypted in '/home/eve/.docker/config.json'. 2025-09-07T09:40:46.4331402Z Configure a credential helper to remove this warning. See 2025-09-07T09:40:46.4331802Z https://docs.docker.com/go/credential-store/ 2025-09-07T09:40:46.4332022Z 2025-09-07T09:40:46.4332116Z Login Succeeded 2025-09-07T09:40:46.8923604Z Command completed after 1 attempt(s). 
2025-09-07T09:40:47.0235552Z ##[group]Run env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-09-07T09:40:47.0236027Z env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-09-07T09:40:47.0236635Z env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-09-07T09:40:47.0255541Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T09:40:47.0255852Z env: 2025-09-07T09:40:47.0256016Z GIT_DEFAULT_BRANCH: main 2025-09-07T09:40:47.0256216Z ##[endgroup] 2025-09-07T09:40:47.1613582Z ##[group]Run set +e 2025-09-07T09:40:47.1613859Z set +e 2025-09-07T09:40:47.1614075Z set -x 2025-09-07T09:40:47.1614273Z  2025-09-07T09:40:47.1614518Z PT_DOMAIN=download.pytorch.org 2025-09-07T09:40:47.1615218Z # TODO: Flaky access to download.pytorch.org https://github.com/pytorch/pytorch/issues/100400, 2025-09-07T09:40:47.1615827Z # cleaning this up once the issue is fixed. There are more than one resolved IP here, the last 2025-09-07T09:40:47.1616254Z # one is returned at random 2025-09-07T09:40:47.1616578Z RESOLVED_IP=$(dig -4 +short "${PT_DOMAIN}" | tail -n1) 2025-09-07T09:40:47.1616881Z  2025-09-07T09:40:47.1617083Z if [ -z "${RESOLVED_IP}" ]; then 2025-09-07T09:40:47.1617427Z  echo "Couldn't resolve ${PT_DOMAIN}, retrying with Google DNS..." 2025-09-07T09:40:47.1617841Z  RESOLVED_IP=$(dig -4 +short "${PT_DOMAIN}" @8.8.8.8 | tail -n1) 2025-09-07T09:40:47.1618152Z  2025-09-07T09:40:47.1618339Z  if [ -z "${RESOLVED_IP}" ]; then 2025-09-07T09:40:47.1618642Z  echo "Couldn't resolve ${PT_DOMAIN}, exiting..." 
2025-09-07T09:40:47.1618927Z  exit 1 2025-09-07T09:40:47.1619129Z  fi 2025-09-07T09:40:47.1619300Z fi 2025-09-07T09:40:47.1619465Z  2025-09-07T09:40:47.1619659Z if grep -r "${PT_DOMAIN}" /etc/hosts; then 2025-09-07T09:40:47.1619951Z  # Clean up any old records first 2025-09-07T09:40:47.1620240Z  sudo sed -i "/${PT_DOMAIN}/d" /etc/hosts 2025-09-07T09:40:47.1620493Z fi 2025-09-07T09:40:47.1620649Z  2025-09-07T09:40:47.1620899Z echo "${RESOLVED_IP} ${PT_DOMAIN}" | sudo tee -a /etc/hosts 2025-09-07T09:40:47.1621218Z cat /etc/hosts 2025-09-07T09:40:47.1635896Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T09:40:47.1636180Z env: 2025-09-07T09:40:47.1636346Z GIT_DEFAULT_BRANCH: main 2025-09-07T09:40:47.1636543Z ##[endgroup] 2025-09-07T09:40:47.2062472Z + PT_DOMAIN=download.pytorch.org 2025-09-07T09:40:47.2068347Z ++ dig -4 +short download.pytorch.org 2025-09-07T09:40:47.2069537Z ++ tail -n1 2025-09-07T09:40:47.2483982Z + RESOLVED_IP=3.170.131.13 2025-09-07T09:40:47.2484304Z + '[' -z 3.170.131.13 ']' 2025-09-07T09:40:47.2484622Z + grep -r download.pytorch.org /etc/hosts 2025-09-07T09:40:47.2502007Z + echo '3.170.131.13 download.pytorch.org' 2025-09-07T09:40:47.2502535Z + sudo tee -a /etc/hosts 2025-09-07T09:40:47.4242872Z 3.170.131.13 download.pytorch.org 2025-09-07T09:40:47.4243350Z + cat /etc/hosts 2025-09-07T09:40:47.4243632Z 127.0.0.1 localhost 2025-09-07T09:40:47.4248883Z ::1 localhost ip6-localhost ip6-loopback 2025-09-07T09:40:47.4249259Z fe00:: ip6-localnet 2025-09-07T09:40:47.4249511Z ff00:: ip6-mcastprefix 2025-09-07T09:40:47.4249717Z ff02::1 ip6-allnodes 2025-09-07T09:40:47.4249919Z ff02::2 ip6-allrouters 2025-09-07T09:40:47.4250115Z 172.17.0.2 c9e10662379e 2025-09-07T09:40:47.4250324Z 3.170.131.13 download.pytorch.org 2025-09-07T09:40:47.4713000Z ##[group]Run set +x 2025-09-07T09:40:47.4713271Z set +x 2025-09-07T09:40:47.4713542Z  2025-09-07T09:40:47.4713791Z max_attempts=30 2025-09-07T09:40:47.4714122Z delay=10 
2025-09-07T09:40:47.4714382Z attempt=1 2025-09-07T09:40:47.4714610Z  2025-09-07T09:40:47.4714891Z for attempt in $(seq 1 $max_attempts); do 2025-09-07T09:40:47.4715609Z  echo "Attempt $attempt of $max_attempts: Checking if Docker daemon is running..." 2025-09-07T09:40:47.4716147Z  if docker info > /dev/null 2>&1; then 2025-09-07T09:40:47.4716555Z  echo "Docker is running. Proceeding with the next steps" 2025-09-07T09:40:47.4717090Z  exit 0 2025-09-07T09:40:47.4717352Z  else 2025-09-07T09:40:47.4717641Z  echo "Docker is not running yet." 2025-09-07T09:40:47.4717952Z  echo "Retrying in $delay seconds..." 2025-09-07T09:40:47.4718205Z  sleep $delay 2025-09-07T09:40:47.4718390Z  fi 2025-09-07T09:40:47.4718551Z done 2025-09-07T09:40:47.4718802Z echo "Reached maximum attempts to connect to Docker. Exiting." 2025-09-07T09:40:47.4719096Z exit 1 2025-09-07T09:40:47.4734763Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T09:40:47.4735307Z env: 2025-09-07T09:40:47.4735554Z GIT_DEFAULT_BRANCH: main 2025-09-07T09:40:47.4735755Z ##[endgroup] 2025-09-07T09:40:47.5238545Z Attempt 1 of 30: Checking if Docker daemon is running... 2025-09-07T09:40:47.5706808Z Docker is running. Proceeding with the next steps 2025-09-07T09:40:47.6128487Z ##[group]Run pytorch/test-infra/.github/actions/calculate-docker-image@main 2025-09-07T09:40:47.6128885Z with: 2025-09-07T09:40:47.6129661Z docker-image-name: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T09:40:47.6130508Z use-custom-docker-registry: true 2025-09-07T09:40:47.6130762Z docker-build-dir: .ci/docker 2025-09-07T09:40:47.6131002Z docker-build-script: ./build.sh 2025-09-07T09:40:47.6131244Z working-directory: . 
2025-09-07T09:40:47.6131528Z docker-registry: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T09:40:47.6131849Z force-push: false 2025-09-07T09:40:47.6132037Z env: 2025-09-07T09:40:47.6132209Z GIT_DEFAULT_BRANCH: main 2025-09-07T09:40:47.6132423Z ##[endgroup] 2025-09-07T09:40:47.8897675Z ##[group]Run set -ex 2025-09-07T09:40:47.8897953Z set -ex 2025-09-07T09:40:47.8898132Z  2025-09-07T09:40:47.8898503Z # If the docker build directory or the build script doesn't exist, the action will 2025-09-07T09:40:47.8899054Z # gracefully return the docker image name as it is. Pulling docker image in Linux 2025-09-07T09:40:47.8899504Z # job could then download the pre-built image as usual 2025-09-07T09:40:47.8900736Z if [[ -d "${DOCKER_BUILD_DIR}" ]] && [[ -f "${DOCKER_BUILD_DIR}/${DOCKER_BUILD_SCRIPT}" ]] && [[ "${USE_CUSTOM_DOCKER_REGISTRY}" == "true" ]]; then 2025-09-07T09:40:47.8901803Z  echo "skip=false" >> "${GITHUB_OUTPUT}" 2025-09-07T09:40:47.8902076Z else 2025-09-07T09:40:47.8902292Z  echo "skip=true" >> "${GITHUB_OUTPUT}" 2025-09-07T09:40:47.8902974Z  echo "docker-image=${DOCKER_IMAGE_NAME}" >> "${GITHUB_OUTPUT}" 2025-09-07T09:40:47.8904224Z  2025-09-07T09:40:47.8904699Z  echo "Not using custom ECR registry. Either it was not requested or there is no Docker build script in the ${REPO_NAME} repo..." 
2025-09-07T09:40:47.8905406Z  exit 0 2025-09-07T09:40:47.8905569Z fi 2025-09-07T09:40:47.8905725Z  2025-09-07T09:40:47.8905974Z if [[ "${DOCKER_IMAGE_NAME}" == *"${DOCKER_REGISTRY}/${REPO_NAME}"* ]]; then 2025-09-07T09:40:47.8906410Z  # The docker image name already includes the ECR prefix and tag, so we can just 2025-09-07T09:40:47.8906795Z  # use it as it is, but first let's extract the tag 2025-09-07T09:40:47.8907136Z  DOCKER_TAG=$(echo "${DOCKER_IMAGE_NAME}" | awk -F '[:,]' '{print $2}') 2025-09-07T09:40:47.8907499Z  echo "docker-tag=${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" 2025-09-07T09:40:47.8908358Z  echo "docker-image=${DOCKER_IMAGE_NAME}" >> "${GITHUB_OUTPUT}" 2025-09-07T09:40:47.8909750Z else 2025-09-07T09:40:47.8910606Z  if [[ "${DOCKER_IMAGE_NAME}" == *:* ]]; then 2025-09-07T09:40:47.8911014Z  CUSTOM_TAG_PREFIX=${DOCKER_IMAGE_NAME#*:} 2025-09-07T09:40:47.8911313Z  DOCKER_IMAGE_NAME=${DOCKER_IMAGE_NAME%%:*} 2025-09-07T09:40:47.8911812Z  fi 2025-09-07T09:40:47.8912153Z  DOCKER_TAG=${CUSTOM_TAG_PREFIX:+${CUSTOM_TAG_PREFIX}-}$(git rev-parse HEAD:"${DOCKER_BUILD_DIR}") 2025-09-07T09:40:47.8912586Z  echo "docker-tag=${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" 2025-09-07T09:40:47.8913032Z  echo "docker-image=${DOCKER_REGISTRY}/${REPO_NAME}/${DOCKER_IMAGE_NAME}:${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" 2025-09-07T09:40:47.8913525Z  echo "custom-tag-prefix=${CUSTOM_TAG_PREFIX}" >> "${GITHUB_OUTPUT}" 2025-09-07T09:40:47.8913830Z fi 2025-09-07T09:40:47.8932466Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T09:40:47.8932762Z env: 2025-09-07T09:40:47.8932925Z GIT_DEFAULT_BRANCH: main 2025-09-07T09:40:47.8933131Z REPO_NAME: pytorch 2025-09-07T09:40:47.8934095Z DOCKER_IMAGE_NAME: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T09:40:47.8935159Z DOCKER_BUILD_DIR: .ci/docker 2025-09-07T09:40:47.8935386Z DOCKER_BUILD_SCRIPT: 
./build.sh 2025-09-07T09:40:47.8935668Z DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T09:40:47.8935975Z USE_CUSTOM_DOCKER_REGISTRY: true 2025-09-07T09:40:47.8936196Z CUSTOM_TAG_PREFIX: 2025-09-07T09:40:47.8937012Z ##[endgroup] 2025-09-07T09:40:47.9408821Z + [[ -d .ci/docker ]] 2025-09-07T09:40:47.9409072Z + [[ -f .ci/docker/./build.sh ]] 2025-09-07T09:40:47.9409307Z + [[ true == \t\r\u\e ]] 2025-09-07T09:40:47.9409514Z + echo skip=false 2025-09-07T09:40:47.9410529Z + [[ 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77 == *\3\0\8\5\3\5\3\8\5\1\1\4\.\d\k\r\.\e\c\r\.\u\s\-\e\a\s\t\-\1\.\a\m\a\z\o\n\a\w\s\.\c\o\m\/\p\y\t\o\r\c\h* ]] 2025-09-07T09:40:47.9419271Z ++ echo 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T09:40:47.9420079Z ++ awk -F '[:,]' '{print $2}' 2025-09-07T09:40:47.9438914Z + DOCKER_TAG=pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T09:40:47.9439909Z + echo docker-tag=pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T09:40:47.9441246Z + echo docker-image=308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T09:40:47.9898756Z ##[group]Run set +e 2025-09-07T09:40:47.9898991Z set +e 2025-09-07T09:40:47.9899166Z set -x 2025-09-07T09:40:47.9899336Z  2025-09-07T09:40:47.9899490Z login() { 2025-09-07T09:40:47.9899873Z  aws ecr get-login-password --region us-east-1 | docker login -u AWS --password-stdin "$1" 2025-09-07T09:40:47.9900267Z } 2025-09-07T09:40:47.9900422Z  2025-09-07T09:40:47.9900569Z retry () { 
2025-09-07T09:40:47.9900768Z  $* || (sleep 1 && $*) || (sleep 2 && $*) 2025-09-07T09:40:47.9901014Z } 2025-09-07T09:40:47.9901167Z  2025-09-07T09:40:47.9901412Z retry login "${DOCKER_REGISTRY}" 2025-09-07T09:40:47.9901644Z  2025-09-07T09:40:47.9901805Z START_TIME=$(date +%s) 2025-09-07T09:40:47.9902024Z # Wait up to 120 minutes 2025-09-07T09:40:47.9902294Z while [[ $(( $(date +%s) - 7200 )) -lt $START_TIME ]]; do 2025-09-07T09:40:47.9902663Z  # Check if image already exists, if it does then skip building it 2025-09-07T09:40:47.9903037Z  if docker manifest inspect "${DOCKER_IMAGE}"; then 2025-09-07T09:40:47.9903311Z  exit 0 2025-09-07T09:40:47.9903485Z  fi 2025-09-07T09:40:47.9903847Z  2025-09-07T09:40:47.9904135Z  # NB: This flag is used by Docker build workflow to push the image to ECR, so we can 2025-09-07T09:40:47.9904628Z  # use this to differentiate between the Docker build and regular build jobs. For the 2025-09-07T09:40:47.9905265Z  # latter, it will wait for the Docker images to become available before continuing 2025-09-07T09:40:47.9905666Z  if [ "${DOCKER_PUSH:-false}" == "true" ]; then 2025-09-07T09:40:47.9905969Z  # It's a Docker build job, let's build the image 2025-09-07T09:40:47.9906221Z  break 2025-09-07T09:40:47.9906393Z  else 2025-09-07T09:40:47.9906643Z  # It's a regular build job, wait for the image to become available 2025-09-07T09:40:47.9906932Z  sleep 300 2025-09-07T09:40:47.9907115Z  fi 2025-09-07T09:40:47.9907273Z done 2025-09-07T09:40:47.9907428Z  2025-09-07T09:40:47.9907860Z # NB: This part requires a full checkout. Otherwise, the merge base will 2025-09-07T09:40:47.9908306Z # be empty. 
The default action would be to continue rebuild the image 2025-09-07T09:40:47.9908685Z if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then 2025-09-07T09:40:47.9909012Z  # if we're on the base branch then use the parent commit 2025-09-07T09:40:47.9909292Z  MERGE_BASE=$(git rev-parse HEAD~) 2025-09-07T09:40:47.9909521Z else 2025-09-07T09:40:47.9909759Z  # otherwise we're on a PR, so use the most recent base commit 2025-09-07T09:40:47.9910106Z  MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION") 2025-09-07T09:40:47.9910366Z fi 2025-09-07T09:40:47.9910513Z  2025-09-07T09:40:47.9910688Z if [[ -z "${MERGE_BASE}" ]]; then 2025-09-07T09:40:47.9910950Z  echo "rebuild=true" >> "${GITHUB_OUTPUT}" 2025-09-07T09:40:47.9911187Z  2025-09-07T09:40:47.9911529Z  echo "Finding merge base only works with full checkout, please set fetch-depth to 0, continuing ..." 2025-09-07T09:40:47.9911924Z  exit 0 2025-09-07T09:40:47.9912088Z fi 2025-09-07T09:40:47.9912238Z  2025-09-07T09:40:47.9912453Z if ! git rev-parse "${MERGE_BASE}:${DOCKER_BUILD_DIR}"; then 2025-09-07T09:40:47.9912935Z  echo "Directory '${DOCKER_BUILD_DIR}' not found in commit $MERGE_BASE, you should rebase onto a more recent commit" 2025-09-07T09:40:47.9913353Z  exit 1 2025-09-07T09:40:47.9913512Z fi 2025-09-07T09:40:47.9913659Z  2025-09-07T09:40:47.9913909Z PREVIOUS_DOCKER_TAG=$(git rev-parse "${MERGE_BASE}:${DOCKER_BUILD_DIR}") 2025-09-07T09:40:47.9914370Z # If no image exists but the hash is the same as the previous hash then we should error out here 2025-09-07T09:40:47.9914793Z if [[ "${PREVIOUS_DOCKER_TAG}" == "${DOCKER_TAG}" ]]; then 2025-09-07T09:40:47.9915430Z  echo "WARNING: Something has gone wrong and the previous image isn't available for the merge-base of your branch" 2025-09-07T09:40:47.9915968Z  echo " Will re-build docker image to store in local cache, TTS may be longer" 2025-09-07T09:40:47.9916282Z fi 2025-09-07T09:40:47.9916430Z  2025-09-07T09:40:47.9916615Z echo "rebuild=true" >> "${GITHUB_OUTPUT}" 
2025-09-07T09:40:47.9930061Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T09:40:47.9930348Z env: 2025-09-07T09:40:47.9930510Z GIT_DEFAULT_BRANCH: main 2025-09-07T09:40:47.9930718Z DOCKER_BUILD_DIR: .ci/docker 2025-09-07T09:40:47.9930976Z BASE_REVISION: 93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T09:40:47.9931739Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T09:40:47.9932734Z DOCKER_TAG: pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T09:40:47.9933485Z DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T09:40:47.9933774Z DOCKER_PUSH: 2025-09-07T09:40:47.9933946Z ##[endgroup] 2025-09-07T09:40:48.0330950Z + retry login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T09:40:48.0331321Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T09:40:48.0334910Z + aws ecr get-login-password --region us-east-1 2025-09-07T09:40:48.0336760Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T09:40:48.9190241Z 2025-09-07T09:40:48.9190823Z WARNING! Your credentials are stored unencrypted in '/home/eve/.docker/config.json'. 2025-09-07T09:40:48.9191387Z Configure a credential helper to remove this warning. 
See 2025-09-07T09:40:48.9191778Z https://docs.docker.com/go/credential-store/ 2025-09-07T09:40:48.9191999Z 2025-09-07T09:40:48.9192083Z Login Succeeded 2025-09-07T09:40:48.9226884Z ++ date +%s 2025-09-07T09:40:48.9237839Z + START_TIME=1757238048 2025-09-07T09:40:48.9242478Z ++ date +%s 2025-09-07T09:40:48.9253595Z + [[ 1757230848 -lt 1757238048 ]] 2025-09-07T09:40:48.9254462Z + docker manifest inspect 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T09:40:49.3247278Z { 2025-09-07T09:40:49.3247518Z "schemaVersion": 2, 2025-09-07T09:40:49.3247852Z "mediaType": "application/vnd.docker.distribution.manifest.v2+json", 2025-09-07T09:40:49.3248203Z "config": { 2025-09-07T09:40:49.3248462Z "mediaType": "application/vnd.docker.container.image.v1+json", 2025-09-07T09:40:49.3248773Z "size": 31375, 2025-09-07T09:40:49.3249092Z "digest": "sha256:29d1d8a31b215537637bab7c99e18c255840b899cf7023e4e3cb5efa3270aef8" 2025-09-07T09:40:49.3249450Z }, 2025-09-07T09:40:49.3249599Z "layers": [ 2025-09-07T09:40:49.3249760Z { 2025-09-07T09:40:49.3250033Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.3250365Z "size": 30448359, 2025-09-07T09:40:49.3250683Z "digest": "sha256:e6fdc8487bfe6d764301ef3634bc6c043841dc3ab05ca14f81e69c0f92562d46" 2025-09-07T09:40:49.3251041Z }, 2025-09-07T09:40:49.3266820Z { 2025-09-07T09:40:49.3267267Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.3267824Z "size": 1554, 2025-09-07T09:40:49.3268378Z "digest": "sha256:171dcef20c49de4bc9268f60e02f111b72c638b0f24c3c5636c5013029db6d30" 2025-09-07T09:40:49.3268784Z }, 2025-09-07T09:40:49.3268941Z { 2025-09-07T09:40:49.3269216Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.3269554Z "size": 313297922, 2025-09-07T09:40:49.3269890Z "digest": 
"sha256:4c92b3f72f1df31fe9f487fc1c27fcf1ba475ffb43abd69056306d1247786e40" 2025-09-07T09:40:49.3270262Z }, 2025-09-07T09:40:49.3270403Z { 2025-09-07T09:40:49.3270805Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.3271141Z "size": 792, 2025-09-07T09:40:49.3271461Z "digest": "sha256:744f9ba90a6582eb601b3c20409bb10d6dad635dd118c3975f79721f4c82747c" 2025-09-07T09:40:49.3271805Z }, 2025-09-07T09:40:49.3271956Z { 2025-09-07T09:40:49.3272209Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.3272521Z "size": 106, 2025-09-07T09:40:49.3272811Z "digest": "sha256:d3c08322a3326e45849dd80264a047c4f42ba4a2419d35c919542e2890e23934" 2025-09-07T09:40:49.3273167Z }, 2025-09-07T09:40:49.3273316Z { 2025-09-07T09:40:49.3273558Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.3273857Z "size": 704, 2025-09-07T09:40:49.3274162Z "digest": "sha256:ffd43b71f3ccf3ba563606231cb1d191eb9dd0052f422d54835e6af350525170" 2025-09-07T09:40:49.3274511Z }, 2025-09-07T09:40:49.3274654Z { 2025-09-07T09:40:49.3274889Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.3275437Z "size": 1215, 2025-09-07T09:40:49.4080953Z "digest": "sha256:830692b57f6e2758398ec80c3b67a20441d12696b54ed14f2ecebf926198f7d6" 2025-09-07T09:40:49.4081357Z }, 2025-09-07T09:40:49.4081524Z { 2025-09-07T09:40:49.4081798Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4082138Z "size": 482, 2025-09-07T09:40:49.4082461Z "digest": "sha256:5bad36d184686719399be50830a98939d7dbda2313fb407df5915217483fc6a3" 2025-09-07T09:40:49.4082828Z }, 2025-09-07T09:40:49.4082982Z { 2025-09-07T09:40:49.4083243Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4083577Z "size": 110343614, 2025-09-07T09:40:49.4083920Z "digest": "sha256:0e34fdd9ac5c39eb0a9d2c2d258b26f42bb79d7dc0a22014bf201daa2e033eb4" 2025-09-07T09:40:49.4084302Z }, 
2025-09-07T09:40:49.4084461Z { 2025-09-07T09:40:49.4084728Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4085258Z "size": 4786, 2025-09-07T09:40:49.4085840Z "digest": "sha256:3c868a62868ef54f82ac11be8dabe1b4365d000bacfe4c104e08022fc96dd767" 2025-09-07T09:40:49.4086254Z }, 2025-09-07T09:40:49.4086414Z { 2025-09-07T09:40:49.4086677Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4087018Z "size": 1710, 2025-09-07T09:40:49.4087350Z "digest": "sha256:62170a22dd571d55ffccac64c0be17f4006d2498cfbf7c6289325f0899cba005" 2025-09-07T09:40:49.4087728Z }, 2025-09-07T09:40:49.4087887Z { 2025-09-07T09:40:49.4088150Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4088482Z "size": 724, 2025-09-07T09:40:49.4088809Z "digest": "sha256:553c1d23b6c4dbd8ab136d0c3659460391ffa14cb9b43be9d7b2f47f90895697" 2025-09-07T09:40:49.4089184Z }, 2025-09-07T09:40:49.4089335Z { 2025-09-07T09:40:49.4089622Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4089956Z "size": 543, 2025-09-07T09:40:49.4090272Z "digest": "sha256:9408d557a804a7dce00897e03ce9f4f447281eb38ce4bc331098a1f1a5ff0d30" 2025-09-07T09:40:49.4090645Z }, 2025-09-07T09:40:49.4090801Z { 2025-09-07T09:40:49.4091057Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4091379Z "size": 3241148049, 2025-09-07T09:40:49.4091721Z "digest": "sha256:df607cfc7c07db6d442e0274e2be8cdc507df8716717363aa92f2fea069bdd9a" 2025-09-07T09:40:49.4092101Z }, 2025-09-07T09:40:49.4092299Z { 2025-09-07T09:40:49.4092667Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4093080Z "size": 32, 2025-09-07T09:40:49.4093572Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-09-07T09:40:49.4094174Z }, 2025-09-07T09:40:49.4094403Z { 2025-09-07T09:40:49.4094808Z "mediaType": 
"application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4095522Z "size": 380, 2025-09-07T09:40:49.4095896Z "digest": "sha256:40a8e39faeda9f5273ff5014b2ef7d1ffeeef321de234186a705b1e0574326d2" 2025-09-07T09:40:49.4096279Z }, 2025-09-07T09:40:49.4096433Z { 2025-09-07T09:40:49.4096759Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4097198Z "size": 53548049, 2025-09-07T09:40:49.4097493Z "digest": "sha256:d895771c9faca390d7270f8c9c832b1428128c31ba6760b837d64b7e5920373f" 2025-09-07T09:40:49.4097817Z }, 2025-09-07T09:40:49.4098011Z { 2025-09-07T09:40:49.4098414Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4098824Z "size": 232, 2025-09-07T09:40:49.4099181Z "digest": "sha256:c4ee04f39d49efb46e52443e60c7f41832ea708d9bc5bf76c6d740895c66f57a" 2025-09-07T09:40:49.4099633Z }, 2025-09-07T09:40:49.4099773Z { 2025-09-07T09:40:49.4099998Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4100296Z "size": 3403403, 2025-09-07T09:40:49.4100593Z "digest": "sha256:3690c9826e48ed74e21e494d9d78990902abbc68795d002260ce71bff9a2cb3b" 2025-09-07T09:40:49.4100920Z }, 2025-09-07T09:40:49.4101373Z { 2025-09-07T09:40:49.4101614Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4101918Z "size": 1478, 2025-09-07T09:40:49.4102220Z "digest": "sha256:57cbc5013733eedfdf176b6db4b44458e826e1f64c0ef38849e9d77addc88936" 2025-09-07T09:40:49.4102546Z }, 2025-09-07T09:40:49.4102684Z { 2025-09-07T09:40:49.4102908Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4103195Z "size": 482, 2025-09-07T09:40:49.4103476Z "digest": "sha256:f5f4b06b58bbe4201d8b2eb5b0c6c1299f2725dd59e71cc45ef76ad89bba4deb" 2025-09-07T09:40:49.4103818Z }, 2025-09-07T09:40:49.4103955Z { 2025-09-07T09:40:49.4104177Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4104455Z "size": 197, 
2025-09-07T09:40:49.4104737Z "digest": "sha256:f59713ce4bf491fe1f663d90e3b32d2290a7d8a4a0e8e13301e3bdb10b949f8e" 2025-09-07T09:40:49.4105264Z }, 2025-09-07T09:40:49.4105494Z { 2025-09-07T09:40:49.4106105Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4106569Z "size": 608, 2025-09-07T09:40:49.4106882Z "digest": "sha256:fe0486521517e626cae4fcbd9c83eb3956aad3ab0f833becee187b830891417b" 2025-09-07T09:40:49.4107251Z }, 2025-09-07T09:40:49.4107477Z { 2025-09-07T09:40:49.4107762Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4108157Z "size": 7874747615, 2025-09-07T09:40:49.4108576Z "digest": "sha256:8c21cc3715a2d715295f0299d8d2443262a3ae8defc1921f3226a0a24fc9c8fe" 2025-09-07T09:40:49.4109143Z }, 2025-09-07T09:40:49.4109374Z { 2025-09-07T09:40:49.4109770Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4110264Z "size": 829, 2025-09-07T09:40:49.4110742Z "digest": "sha256:d37c58456a6a4aa45d78abdb95553b3de0c79d941e18dc757c2c39fd59819739" 2025-09-07T09:40:49.4111307Z }, 2025-09-07T09:40:49.4111552Z { 2025-09-07T09:40:49.4111862Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4112243Z "size": 36688200, 2025-09-07T09:40:49.4112772Z "digest": "sha256:d042f63abc13891184a9d8e0dcdfae9a0daa140dea919fd319f12dcab5c684eb" 2025-09-07T09:40:49.4113120Z }, 2025-09-07T09:40:49.4113260Z { 2025-09-07T09:40:49.4113480Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4113770Z "size": 104, 2025-09-07T09:40:49.4114048Z "digest": "sha256:621284a9c05a47131a59226f6847b5b76ad211908278c1bdb990029d42259941" 2025-09-07T09:40:49.4114368Z }, 2025-09-07T09:40:49.4114513Z { 2025-09-07T09:40:49.4114780Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4115212Z "size": 1496, 2025-09-07T09:40:49.4115503Z "digest": 
"sha256:85f605d2dd3a8378567d3d974f0ec4694ef5fd988b25aca5d9aebd7c9b9ff018" 2025-09-07T09:40:49.4115830Z }, 2025-09-07T09:40:49.4115968Z { 2025-09-07T09:40:49.4116197Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4116632Z "size": 454406172, 2025-09-07T09:40:49.4116937Z "digest": "sha256:381b5539e5981dc994e71ab212f50135c32128fe1cc35d78bc386da6dffe1d51" 2025-09-07T09:40:49.4117471Z }, 2025-09-07T09:40:49.4117681Z { 2025-09-07T09:40:49.4117905Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4118196Z "size": 162, 2025-09-07T09:40:49.4118643Z "digest": "sha256:a487c0c800295407a4c7ab88c5b9e891b8b6aab9e35e62994d124369fcd7ba87" 2025-09-07T09:40:49.4119005Z }, 2025-09-07T09:40:49.4119220Z { 2025-09-07T09:40:49.4119489Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4119861Z "size": 346, 2025-09-07T09:40:49.4120208Z "digest": "sha256:48bcb81e256634f4132369d8bac738d9d622b010e5802e5292f565edba9035df" 2025-09-07T09:40:49.4120525Z }, 2025-09-07T09:40:49.4120662Z { 2025-09-07T09:40:49.4121014Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4121382Z "size": 32, 2025-09-07T09:40:49.4121997Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-09-07T09:40:49.4122329Z }, 2025-09-07T09:40:49.4122464Z { 2025-09-07T09:40:49.4122754Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4123039Z "size": 106, 2025-09-07T09:40:49.4123319Z "digest": "sha256:e261928c0043c734790a38fa9ebf1bf8674801fa2f5051c3d2eac04e0f02b743" 2025-09-07T09:40:49.4123643Z }, 2025-09-07T09:40:49.4123778Z { 2025-09-07T09:40:49.4123995Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4124281Z "size": 425, 2025-09-07T09:40:49.4124559Z "digest": "sha256:0fea55428091bc98d5c48986120dd1da50b9b6cbd507408b2cdebdbe455e272e" 2025-09-07T09:40:49.4124881Z }, 
2025-09-07T09:40:49.4125239Z { 2025-09-07T09:40:49.4125588Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4126111Z "size": 20224775, 2025-09-07T09:40:49.4126725Z "digest": "sha256:b4291bccbb8428a38187cd286fef7c24bd4863c7872c4d1cf96404ec1a69b321" 2025-09-07T09:40:49.4127188Z }, 2025-09-07T09:40:49.4127422Z { 2025-09-07T09:40:49.4127820Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4128304Z "size": 108, 2025-09-07T09:40:49.4128603Z "digest": "sha256:ddc91b09189afc218499daee92ebc22c6deefb22ee115c52c07627ecbaf7b9d5" 2025-09-07T09:40:49.4129103Z }, 2025-09-07T09:40:49.4129335Z { 2025-09-07T09:40:49.4129648Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4129933Z "size": 640, 2025-09-07T09:40:49.4130338Z "digest": "sha256:7540c74286279d1d6a29cdb51d3421e64860c6af74ca4a95736725c0509791ed" 2025-09-07T09:40:49.4130663Z }, 2025-09-07T09:40:49.4130846Z { 2025-09-07T09:40:49.4131103Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4131415Z "size": 724, 2025-09-07T09:40:49.4131878Z "digest": "sha256:553c1d23b6c4dbd8ab136d0c3659460391ffa14cb9b43be9d7b2f47f90895697" 2025-09-07T09:40:49.4132279Z }, 2025-09-07T09:40:49.4132485Z { 2025-09-07T09:40:49.4132710Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4133058Z "size": 149, 2025-09-07T09:40:49.4133330Z "digest": "sha256:003c4e2598fb39f97ec7734271e034a48a3956a58429c9d06601770c2c40de11" 2025-09-07T09:40:49.4133754Z }, 2025-09-07T09:40:49.4133924Z { 2025-09-07T09:40:49.4134285Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4134574Z "size": 135, 2025-09-07T09:40:49.4134855Z "digest": "sha256:5687149362ae68fa2aa7d4ecd39fbf7ea86c0f6ced36a71f3c59f68f6c465cfc" 2025-09-07T09:40:49.4135534Z }, 2025-09-07T09:40:49.4135671Z { 2025-09-07T09:40:49.4135896Z "mediaType": 
"application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4136185Z "size": 141, 2025-09-07T09:40:49.4136474Z "digest": "sha256:cdd2cf54eb2a3d8d034aa1556c9724d240b06397ba08f8b13b0bed6d65755aeb" 2025-09-07T09:40:49.4136816Z }, 2025-09-07T09:40:49.4136962Z { 2025-09-07T09:40:49.4137182Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4137474Z "size": 18615922074, 2025-09-07T09:40:49.4137783Z "digest": "sha256:d3ad4df1ba3a86ef1f84c427aae440ff027d483949d48eec4be6135260668cad" 2025-09-07T09:40:49.4138117Z }, 2025-09-07T09:40:49.4138246Z { 2025-09-07T09:40:49.4138471Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4138761Z "size": 223, 2025-09-07T09:40:49.4139038Z "digest": "sha256:3c9055753b4c79d74c707a91d8626ce10bc439129ba10dad3ebc643d9d4955dd" 2025-09-07T09:40:49.4139355Z }, 2025-09-07T09:40:49.4139495Z { 2025-09-07T09:40:49.4139720Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4140006Z "size": 353035275, 2025-09-07T09:40:49.4140292Z "digest": "sha256:31cf8d0bd21c76ae21f73d8b19b30949d161a498354f54191b4e5a294e929701" 2025-09-07T09:40:49.4140703Z }, 2025-09-07T09:40:49.4141071Z { 2025-09-07T09:40:49.4141442Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4141802Z "size": 6523020957, 2025-09-07T09:40:49.4142170Z "digest": "sha256:6623ea81497183b62e034e4ea8df8bf00fa75aaa192eea2821b2dd8655383b8f" 2025-09-07T09:40:49.4142630Z }, 2025-09-07T09:40:49.4142771Z { 2025-09-07T09:40:49.4142993Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4143359Z "size": 129, 2025-09-07T09:40:49.4143647Z "digest": "sha256:11696c3aa3808236d49256bc170b49d55cf657e499592b39b4856f6137220f55" 2025-09-07T09:40:49.4144018Z }, 2025-09-07T09:40:49.4144148Z { 2025-09-07T09:40:49.4144470Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4144769Z "size": 778, 
2025-09-07T09:40:49.4145353Z "digest": "sha256:ef4d544e35cacc73a229bcbc7a5510f8b156c7b3041f19f3a274562cd97cfd94" 2025-09-07T09:40:49.4145695Z }, 2025-09-07T09:40:49.4145829Z { 2025-09-07T09:40:49.4146228Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4146608Z "size": 724, 2025-09-07T09:40:49.4147021Z "digest": "sha256:553c1d23b6c4dbd8ab136d0c3659460391ffa14cb9b43be9d7b2f47f90895697" 2025-09-07T09:40:49.4147420Z }, 2025-09-07T09:40:49.4147627Z { 2025-09-07T09:40:49.4147943Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4148275Z "size": 141, 2025-09-07T09:40:49.4148549Z "digest": "sha256:5c5108865e5e293209ae9bae8a29645035242e7e4b4433208a777496fddc988c" 2025-09-07T09:40:49.4148862Z }, 2025-09-07T09:40:49.4148994Z { 2025-09-07T09:40:49.4149209Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4149651Z "size": 32, 2025-09-07T09:40:49.4149991Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-09-07T09:40:49.4150481Z }, 2025-09-07T09:40:49.4150614Z { 2025-09-07T09:40:49.4150849Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4151140Z "size": 159, 2025-09-07T09:40:49.4151417Z "digest": "sha256:9e97578e9edf1a11187740a5aa102633331fb6a714d0ed48683782de5a36fbd8" 2025-09-07T09:40:49.4151784Z }, 2025-09-07T09:40:49.4151928Z { 2025-09-07T09:40:49.4152157Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4152506Z "size": 1012, 2025-09-07T09:40:49.4152786Z "digest": "sha256:da5a91b54cb51f851560992645bc203f2287d9b1d7a4f04f7f4ea7efe45036ce" 2025-09-07T09:40:49.4153202Z }, 2025-09-07T09:40:49.4153341Z { 2025-09-07T09:40:49.4153573Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4153934Z "size": 724, 2025-09-07T09:40:49.4154234Z "digest": "sha256:553c1d23b6c4dbd8ab136d0c3659460391ffa14cb9b43be9d7b2f47f90895697" 
2025-09-07T09:40:49.4154638Z }, 2025-09-07T09:40:49.4154779Z { 2025-09-07T09:40:49.4155165Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4155524Z "size": 135, 2025-09-07T09:40:49.4155871Z "digest": "sha256:1e93be219e89e7733b91ba7e3af1a44d985e84959f732ecd5f5ca61bd13b5d41" 2025-09-07T09:40:49.4156336Z }, 2025-09-07T09:40:49.4156471Z { 2025-09-07T09:40:49.4156705Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4157031Z "size": 32, 2025-09-07T09:40:49.4157402Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-09-07T09:40:49.4157735Z }, 2025-09-07T09:40:49.4157870Z { 2025-09-07T09:40:49.4158177Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4158471Z "size": 158, 2025-09-07T09:40:49.4158907Z "digest": "sha256:136825afebb533ee295f0d2523595281086c6410c60d5f712b84cefd24cb31d5" 2025-09-07T09:40:49.4159316Z }, 2025-09-07T09:40:49.4159448Z { 2025-09-07T09:40:49.4159677Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4159964Z "size": 1368, 2025-09-07T09:40:49.4160440Z "digest": "sha256:22b39805302d877e4c1ba433ebc36520438ea29a9ba8bc059efbcd9106f3a82d" 2025-09-07T09:40:49.4160767Z }, 2025-09-07T09:40:49.4160902Z { 2025-09-07T09:40:49.4161124Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4161431Z "size": 32, 2025-09-07T09:40:49.4161790Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-09-07T09:40:49.4162196Z }, 2025-09-07T09:40:49.4162323Z { 2025-09-07T09:40:49.4162643Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4162949Z "size": 136, 2025-09-07T09:40:49.4163237Z "digest": "sha256:d12add675e3505e74eb9880eeef540ea0801282ca1ae01c3c221157cec91f5ae" 2025-09-07T09:40:49.4163563Z }, 2025-09-07T09:40:49.4163692Z { 2025-09-07T09:40:49.4163965Z "mediaType": 
"application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4164276Z "size": 380, 2025-09-07T09:40:49.4164842Z "digest": "sha256:bc127046d33a7a98563698411b54ece8a167d520922879d7b69e8ca73a12d034" 2025-09-07T09:40:49.4165510Z }, 2025-09-07T09:40:49.4165649Z { 2025-09-07T09:40:49.4165978Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4166401Z "size": 32, 2025-09-07T09:40:49.4166740Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-09-07T09:40:49.4167174Z }, 2025-09-07T09:40:49.4167360Z { 2025-09-07T09:40:49.4167585Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4167944Z "size": 104, 2025-09-07T09:40:49.4168404Z "digest": "sha256:951e8ce838415c4257680a9d60d216f3750cbb18d243d9a21e2008cce7e589cf" 2025-09-07T09:40:49.4168962Z }, 2025-09-07T09:40:49.4169201Z { 2025-09-07T09:40:49.4169582Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4170088Z "size": 408, 2025-09-07T09:40:49.4170582Z "digest": "sha256:32340b97ae50ba7b2918ab40d6f4a8db875afee69318f484e4deb0a1e2ec4beb" 2025-09-07T09:40:49.4171152Z }, 2025-09-07T09:40:49.4171366Z { 2025-09-07T09:40:49.4171758Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4172236Z "size": 32, 2025-09-07T09:40:49.4172732Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-09-07T09:40:49.4173265Z }, 2025-09-07T09:40:49.4173405Z { 2025-09-07T09:40:49.4173686Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4173975Z "size": 109, 2025-09-07T09:40:49.4174267Z "digest": "sha256:5bbb04cd6b57ae13d7cf05ab9e9b4ed9752833ee2dba4eeaac47bde6022c4725" 2025-09-07T09:40:49.4174604Z }, 2025-09-07T09:40:49.4174744Z { 2025-09-07T09:40:49.4175156Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4175448Z "size": 1897, 
2025-09-07T09:40:49.4175743Z "digest": "sha256:d8c4b845cfc7ca7cc0604f472bf6da8b1f1d4e98dff3c76e1985a7013a5b9e3f" 2025-09-07T09:40:49.4176084Z }, 2025-09-07T09:40:49.4176224Z { 2025-09-07T09:40:49.4176445Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4176736Z "size": 243440375, 2025-09-07T09:40:49.4177033Z "digest": "sha256:b35c180f4d8ddc2396eac4a6b893f438481a8163ceb0b88f203488bc5f2a8ba4" 2025-09-07T09:40:49.4177366Z }, 2025-09-07T09:40:49.4177498Z { 2025-09-07T09:40:49.4177722Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4178016Z "size": 106, 2025-09-07T09:40:49.4178385Z "digest": "sha256:5f967b3c303a99e609441551f7c8988cca4fd464c0c3127506bff8509583091b" 2025-09-07T09:40:49.4178728Z }, 2025-09-07T09:40:49.4178863Z { 2025-09-07T09:40:49.4179181Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4179500Z "size": 166, 2025-09-07T09:40:49.4179806Z "digest": "sha256:04770904f012e5584f1c19a0bc92d9863baaebf08bf75b4a9981f2b7795c8953" 2025-09-07T09:40:49.4180191Z }, 2025-09-07T09:40:49.4180375Z { 2025-09-07T09:40:49.4180951Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4181412Z "size": 7943, 2025-09-07T09:40:49.4181850Z "digest": "sha256:73373941fb321b4cb4a171b1423a68a4c7fedada3a1498868d7efe93cb03170e" 2025-09-07T09:40:49.4182205Z }, 2025-09-07T09:40:49.4182374Z { 2025-09-07T09:40:49.4182598Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4182977Z "size": 8072, 2025-09-07T09:40:49.4183392Z "digest": "sha256:9572e6cd907bfa4888456dbccc6e22146a0044374585f3fa0a8ced19b831ed62" 2025-09-07T09:40:49.4183715Z }, 2025-09-07T09:40:49.4183844Z { 2025-09-07T09:40:49.4184069Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4184399Z "size": 304, 2025-09-07T09:40:49.4184683Z "digest": "sha256:64a544aba233551e38898f138dd6ba3161ccdb9554e0ffb5b9d8f0f7fe4a7fa8" 
2025-09-07T09:40:49.4185277Z }, 2025-09-07T09:40:49.4185423Z { 2025-09-07T09:40:49.4185948Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4186280Z "size": 13362696, 2025-09-07T09:40:49.4186565Z "digest": "sha256:7e35418a24997de5428763c93826679486760a1a9563209ae64de66ba45f99c1" 2025-09-07T09:40:49.4186880Z }, 2025-09-07T09:40:49.4187040Z { 2025-09-07T09:40:49.4187310Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4187656Z "size": 108, 2025-09-07T09:40:49.4187947Z "digest": "sha256:2ed8e82748d4a1131f41d9e41322f47a6ffef67a5a2b7bf5392237db5c035c61" 2025-09-07T09:40:49.4188355Z }, 2025-09-07T09:40:49.4188511Z { 2025-09-07T09:40:49.4188749Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4189214Z "size": 54145663, 2025-09-07T09:40:49.4189552Z "digest": "sha256:c988fbcccd708fb158a81c429d32e1060a7e40924fc3c987c629fa69d9484717" 2025-09-07T09:40:49.4190037Z }, 2025-09-07T09:40:49.4190168Z { 2025-09-07T09:40:49.4190458Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T09:40:49.4190750Z "size": 32, 2025-09-07T09:40:49.4191217Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-09-07T09:40:49.4191559Z } 2025-09-07T09:40:49.4191692Z ] 2025-09-07T09:40:49.4191829Z } 2025-09-07T09:40:49.4191994Z + exit 0 2025-09-07T09:40:49.6458656Z ##[group]Run set -eux 2025-09-07T09:40:49.6458891Z set -eux 2025-09-07T09:40:49.6459210Z # It's ok if this steps fails, it would then be an anonymous user like what we used to have 2025-09-07T09:40:49.6460095Z aws secretsmanager get-secret-value --secret-id docker_hub_readonly_token | jq --raw-output '.SecretString' | jq -r .docker_hub_readonly_token | docker login --username pytorchbot --password-stdin || true 2025-09-07T09:40:49.6475291Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T09:40:49.6475589Z env: 2025-09-07T09:40:49.6475765Z 
GIT_DEFAULT_BRANCH: main 2025-09-07T09:40:49.6475964Z ##[endgroup] 2025-09-07T09:40:49.8772076Z + aws secretsmanager get-secret-value --secret-id docker_hub_readonly_token 2025-09-07T09:40:49.8772663Z + jq --raw-output .SecretString 2025-09-07T09:40:49.8774233Z + jq -r .docker_hub_readonly_token 2025-09-07T09:40:49.8775773Z + docker login --username pytorchbot --password-stdin 2025-09-07T09:40:50.4894689Z 2025-09-07T09:40:50.4896563Z An error occurred (AccessDeniedException) when calling the GetSecretValue operation: User: arn:aws:sts::308535385114:assumed-role/gh-ci-github-action-runners-runner-role/i-0d73070610f53945f is not authorized to perform: secretsmanager:GetSecretValue on resource: docker_hub_readonly_token because no identity-based policy allows the secretsmanager:GetSecretValue action 2025-09-07T09:40:50.5682521Z Error: Cannot perform an interactive login from a non TTY device 2025-09-07T09:40:50.5707592Z + true 2025-09-07T09:40:50.9098499Z ##[group]Run tag=${ECR_DOCKER_IMAGE##*:} 2025-09-07T09:40:50.9098836Z tag=${ECR_DOCKER_IMAGE##*:} 2025-09-07T09:40:50.9099150Z echo "docker pull ghcr.io/pytorch/ci-image:${tag/:/-}" 2025-09-07T09:40:50.9113860Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T09:40:50.9114164Z env: 2025-09-07T09:40:50.9114334Z GIT_DEFAULT_BRANCH: main 2025-09-07T09:40:50.9115210Z ECR_DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T09:40:50.9115942Z ##[endgroup] 2025-09-07T09:40:50.9151964Z docker pull ghcr.io/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T09:40:51.0931236Z ##[group]Run pytorch/test-infra/.github/actions/pull-docker-image@main 2025-09-07T09:40:51.0931623Z with: 2025-09-07T09:40:51.0932361Z docker-image: 
308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T09:40:51.0933250Z docker-registry: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T09:40:51.0933561Z env: 2025-09-07T09:40:51.0933736Z GIT_DEFAULT_BRANCH: main 2025-09-07T09:40:51.0933951Z ##[endgroup] 2025-09-07T09:40:51.4433906Z ##[group]Run set -x 2025-09-07T09:40:51.4434164Z set -x 2025-09-07T09:40:51.4434363Z set +e 2025-09-07T09:40:51.4434545Z  2025-09-07T09:40:51.4434721Z login() { 2025-09-07T09:40:51.4435371Z  aws ecr get-login-password --region us-east-1 | docker login -u AWS --password-stdin "$1" 2025-09-07T09:40:51.4435846Z } 2025-09-07T09:40:51.4436038Z  2025-09-07T09:40:51.4436258Z retry () { 2025-09-07T09:40:51.4436503Z  $* || (sleep 1 && $*) || (sleep 2 && $*) 2025-09-07T09:40:51.4436769Z } 2025-09-07T09:40:51.4436944Z  2025-09-07T09:40:51.4437131Z retry login "${DOCKER_REGISTRY}" 2025-09-07T09:40:51.4437352Z  2025-09-07T09:40:51.4437705Z IMAGE_SIZE=$(docker manifest inspect "${DOCKER_IMAGE}" | jq '[.layers[].size, .config.size] | add / 1024 / 1024') 2025-09-07T09:40:51.4438188Z echo "Compressed size of image in MB: ${IMAGE_SIZE}" 2025-09-07T09:40:51.4438455Z  2025-09-07T09:40:51.4438695Z set -e 2025-09-07T09:40:51.4439116Z # ignore output since only exit code is used for conditional 2025-09-07T09:40:51.4439729Z # only pull docker image if it's not available locally 2025-09-07T09:40:51.4440391Z if ! 
docker inspect --type=image "${DOCKER_IMAGE}" >/dev/null 2>/dev/null; then 2025-09-07T09:40:51.4441004Z  retry docker pull "${DOCKER_IMAGE}" 2025-09-07T09:40:51.4441392Z fi 2025-09-07T09:40:51.4468699Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T09:40:51.4469016Z env: 2025-09-07T09:40:51.4469192Z GIT_DEFAULT_BRANCH: main 2025-09-07T09:40:51.4469881Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T09:40:51.4470679Z DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T09:40:51.4470970Z ##[endgroup] 2025-09-07T09:40:51.4512762Z + set +e 2025-09-07T09:40:51.4513022Z + retry login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T09:40:51.4513350Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T09:40:51.4517059Z + aws ecr get-login-password --region us-east-1 2025-09-07T09:40:51.4518890Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T09:40:52.6405991Z 2025-09-07T09:40:52.6406757Z WARNING! Your credentials are stored unencrypted in '/home/eve/.docker/config.json'. 2025-09-07T09:40:52.6407618Z Configure a credential helper to remove this warning. 
See 2025-09-07T09:40:52.6408337Z https://docs.docker.com/go/credential-store/ 2025-09-07T09:40:52.6408632Z 2025-09-07T09:40:52.6408790Z Login Succeeded 2025-09-07T09:40:52.6444246Z ++ docker manifest inspect 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T09:40:52.6445503Z ++ jq '[.layers[].size, .config.size] | add / 1024 / 1024' 2025-09-07T09:40:53.2773427Z + IMAGE_SIZE=36183.606596946716 2025-09-07T09:40:53.2773790Z + echo 'Compressed size of image in MB: 36183.606596946716' 2025-09-07T09:40:53.2774129Z + set -e 2025-09-07T09:40:53.2775706Z + docker inspect --type=image 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T09:40:53.2776657Z Compressed size of image in MB: 36183.606596946716 2025-09-07T09:40:53.2916269Z + retry docker pull 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T09:40:53.2917558Z + docker pull 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T09:40:56.2985875Z pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77: Pulling from pytorch/ci-image 2025-09-07T09:40:56.2987252Z e6fdc8487bfe: Pulling fs layer 2025-09-07T09:40:56.2987633Z 171dcef20c49: Pulling fs layer 2025-09-07T09:40:56.2987976Z 4c92b3f72f1d: Pulling fs layer 2025-09-07T09:40:56.2988318Z 744f9ba90a65: Pulling fs layer 2025-09-07T09:40:56.2988700Z d3c08322a332: Pulling fs layer 2025-09-07T09:40:56.2989040Z ffd43b71f3cc: Pulling fs layer 2025-09-07T09:40:56.2989381Z 830692b57f6e: Pulling fs layer 
2025-09-07T09:40:56.2989714Z 5bad36d18468: Pulling fs layer 2025-09-07T09:40:56.2990031Z 744f9ba90a65: Waiting 2025-09-07T09:40:56.2990339Z 0e34fdd9ac5c: Pulling fs layer 2025-09-07T09:40:56.2990672Z d3c08322a332: Waiting 2025-09-07T09:40:56.2990970Z 830692b57f6e: Waiting 2025-09-07T09:40:56.2991249Z ffd43b71f3cc: Waiting 2025-09-07T09:40:56.2991549Z 3c868a62868e: Pulling fs layer 2025-09-07T09:40:56.2991890Z 62170a22dd57: Pulling fs layer 2025-09-07T09:40:56.2992219Z 553c1d23b6c4: Pulling fs layer 2025-09-07T09:40:56.2992540Z 9408d557a804: Pulling fs layer 2025-09-07T09:40:56.2992875Z df607cfc7c07: Pulling fs layer 2025-09-07T09:40:56.2993211Z 4f4fb700ef54: Pulling fs layer 2025-09-07T09:40:56.2993535Z 5bad36d18468: Waiting 2025-09-07T09:40:56.2993812Z 0e34fdd9ac5c: Waiting 2025-09-07T09:40:56.2994134Z 3c868a62868e: Waiting 2025-09-07T09:40:56.2994410Z 62170a22dd57: Waiting 2025-09-07T09:40:56.2994712Z 40a8e39faeda: Pulling fs layer 2025-09-07T09:40:56.2995208Z 553c1d23b6c4: Waiting 2025-09-07T09:40:56.2995498Z d895771c9fac: Pulling fs layer 2025-09-07T09:40:56.2995793Z 9408d557a804: Waiting 2025-09-07T09:40:56.2995990Z df607cfc7c07: Waiting 2025-09-07T09:40:56.2996186Z 40a8e39faeda: Waiting 2025-09-07T09:40:56.2996385Z c4ee04f39d49: Pulling fs layer 2025-09-07T09:40:56.2996629Z d895771c9fac: Waiting 2025-09-07T09:40:56.2996827Z 4f4fb700ef54: Waiting 2025-09-07T09:40:56.2997032Z 3690c9826e48: Pulling fs layer 2025-09-07T09:40:56.2997257Z c4ee04f39d49: Waiting 2025-09-07T09:40:56.2997461Z 57cbc5013733: Pulling fs layer 2025-09-07T09:40:56.2997694Z f5f4b06b58bb: Pulling fs layer 2025-09-07T09:40:56.2997902Z f59713ce4bf4: Pulling fs layer 2025-09-07T09:40:56.2998099Z fe0486521517: Pulling fs layer 2025-09-07T09:40:56.2998300Z 8c21cc3715a2: Pulling fs layer 2025-09-07T09:40:56.2998507Z d37c58456a6a: Pulling fs layer 2025-09-07T09:40:56.2998696Z 57cbc5013733: Waiting 2025-09-07T09:40:56.2998862Z 8c21cc3715a2: Waiting 2025-09-07T09:40:56.2999030Z f5f4b06b58bb: Waiting 
2025-09-07T09:40:56.2999204Z d042f63abc13: Pulling fs layer 2025-09-07T09:40:56.2999391Z f59713ce4bf4: Waiting 2025-09-07T09:40:56.2999556Z 621284a9c05a: Pulling fs layer 2025-09-07T09:40:56.2999743Z d042f63abc13: Waiting 2025-09-07T09:40:56.2999915Z 85f605d2dd3a: Pulling fs layer 2025-09-07T09:40:56.3000494Z 621284a9c05a: Waiting 2025-09-07T09:40:56.3000665Z 381b5539e598: Pulling fs layer 2025-09-07T09:40:56.3000864Z a487c0c80029: Pulling fs layer 2025-09-07T09:40:56.3001059Z 48bcb81e2566: Pulling fs layer 2025-09-07T09:40:56.3001259Z e261928c0043: Pulling fs layer 2025-09-07T09:40:56.3001447Z 0fea55428091: Pulling fs layer 2025-09-07T09:40:56.3001643Z b4291bccbb84: Pulling fs layer 2025-09-07T09:40:56.3001832Z 85f605d2dd3a: Waiting 2025-09-07T09:40:56.3002020Z ddc91b09189a: Pulling fs layer 2025-09-07T09:40:56.3002203Z 0fea55428091: Waiting 2025-09-07T09:40:56.3002368Z 48bcb81e2566: Waiting 2025-09-07T09:40:56.3002759Z 381b5539e598: Waiting 2025-09-07T09:40:56.3002937Z 7540c7428627: Pulling fs layer 2025-09-07T09:40:56.3003112Z a487c0c80029: Waiting 2025-09-07T09:40:56.3003272Z e261928c0043: Waiting 2025-09-07T09:40:56.3003447Z ddc91b09189a: Waiting 2025-09-07T09:40:56.3003616Z b4291bccbb84: Waiting 2025-09-07T09:40:56.3003782Z 003c4e2598fb: Pulling fs layer 2025-09-07T09:40:56.3003964Z 7540c7428627: Waiting 2025-09-07T09:40:56.3004137Z 5687149362ae: Pulling fs layer 2025-09-07T09:40:56.3004323Z cdd2cf54eb2a: Pulling fs layer 2025-09-07T09:40:56.3004510Z 003c4e2598fb: Waiting 2025-09-07T09:40:56.3004671Z 5687149362ae: Waiting 2025-09-07T09:40:56.3004855Z d3ad4df1ba3a: Pulling fs layer 2025-09-07T09:40:56.3005242Z 3c9055753b4c: Pulling fs layer 2025-09-07T09:40:56.3005439Z 31cf8d0bd21c: Pulling fs layer 2025-09-07T09:40:56.3005628Z 6623ea814971: Pulling fs layer 2025-09-07T09:40:56.3005818Z 11696c3aa380: Pulling fs layer 2025-09-07T09:40:56.3005999Z 31cf8d0bd21c: Waiting 2025-09-07T09:40:56.3006171Z ef4d544e35ca: Pulling fs layer 2025-09-07T09:40:56.3006360Z 
3c9055753b4c: Waiting 2025-09-07T09:40:56.3006526Z cdd2cf54eb2a: Waiting 2025-09-07T09:40:56.3006685Z 6623ea814971: Waiting 2025-09-07T09:40:56.3006848Z d3ad4df1ba3a: Waiting 2025-09-07T09:40:56.3007030Z 5c5108865e5e: Pulling fs layer 2025-09-07T09:40:56.3007226Z 11696c3aa380: Waiting 2025-09-07T09:40:56.3007384Z ef4d544e35ca: Waiting 2025-09-07T09:40:56.3007561Z 9e97578e9edf: Pulling fs layer 2025-09-07T09:40:56.3007746Z 5c5108865e5e: Waiting 2025-09-07T09:40:56.3007917Z da5a91b54cb5: Pulling fs layer 2025-09-07T09:40:56.3008116Z 1e93be219e89: Pulling fs layer 2025-09-07T09:40:56.3008312Z 136825afebb5: Pulling fs layer 2025-09-07T09:40:56.3008505Z 22b39805302d: Pulling fs layer 2025-09-07T09:40:56.3008695Z d12add675e35: Pulling fs layer 2025-09-07T09:40:56.3008882Z bc127046d33a: Pulling fs layer 2025-09-07T09:40:56.3009067Z 9e97578e9edf: Waiting 2025-09-07T09:40:56.3009234Z da5a91b54cb5: Waiting 2025-09-07T09:40:56.3009401Z 136825afebb5: Waiting 2025-09-07T09:40:56.3009563Z d12add675e35: Waiting 2025-09-07T09:40:56.3009729Z 1e93be219e89: Waiting 2025-09-07T09:40:56.3009893Z 22b39805302d: Waiting 2025-09-07T09:40:56.3010063Z 951e8ce83841: Pulling fs layer 2025-09-07T09:40:56.3010248Z bc127046d33a: Waiting 2025-09-07T09:40:56.3010434Z 32340b97ae50: Pulling fs layer 2025-09-07T09:40:56.3010631Z 5bbb04cd6b57: Pulling fs layer 2025-09-07T09:40:56.3010831Z d8c4b845cfc7: Pulling fs layer 2025-09-07T09:40:56.3011018Z b35c180f4d8d: Pulling fs layer 2025-09-07T09:40:56.3011212Z 5f967b3c303a: Pulling fs layer 2025-09-07T09:40:56.3011409Z 04770904f012: Pulling fs layer 2025-09-07T09:40:56.3011609Z 73373941fb32: Pulling fs layer 2025-09-07T09:40:56.3011802Z 9572e6cd907b: Pulling fs layer 2025-09-07T09:40:56.3011993Z 64a544aba233: Pulling fs layer 2025-09-07T09:40:56.3012178Z 7e35418a2499: Pulling fs layer 2025-09-07T09:40:56.3012366Z 5bbb04cd6b57: Waiting 2025-09-07T09:40:56.3012534Z d8c4b845cfc7: Waiting 2025-09-07T09:40:56.3012699Z 04770904f012: Waiting 
2025-09-07T09:40:56.3012859Z 73373941fb32: Waiting 2025-09-07T09:40:56.3013033Z 2ed8e82748d4: Pulling fs layer 2025-09-07T09:40:56.3013232Z b35c180f4d8d: Waiting 2025-09-07T09:40:56.3013410Z c988fbcccd70: Pulling fs layer 2025-09-07T09:40:56.3013593Z 64a544aba233: Waiting 2025-09-07T09:40:56.3013758Z 5f967b3c303a: Waiting 2025-09-07T09:40:56.3013919Z 9572e6cd907b: Waiting 2025-09-07T09:40:56.3014079Z 2ed8e82748d4: Waiting 2025-09-07T09:40:56.3014412Z 7e35418a2499: Waiting 2025-09-07T09:40:56.3014581Z c988fbcccd70: Waiting 2025-09-07T09:40:56.4704511Z 171dcef20c49: Download complete 2025-09-07T09:40:56.6357849Z 744f9ba90a65: Verifying Checksum 2025-09-07T09:40:56.6358511Z 744f9ba90a65: Download complete 2025-09-07T09:40:56.7373691Z e6fdc8487bfe: Verifying Checksum 2025-09-07T09:40:56.7374068Z e6fdc8487bfe: Download complete 2025-09-07T09:40:56.8050131Z d3c08322a332: Verifying Checksum 2025-09-07T09:40:56.8050478Z d3c08322a332: Download complete 2025-09-07T09:40:56.9015808Z ffd43b71f3cc: Verifying Checksum 2025-09-07T09:40:57.1948918Z ffd43b71f3cc: Download complete 2025-09-07T09:40:57.1949305Z 830692b57f6e: Download complete 2025-09-07T09:40:57.3588513Z 5bad36d18468: Verifying Checksum 2025-09-07T09:40:57.3588885Z 5bad36d18468: Download complete 2025-09-07T09:40:57.7893130Z 3c868a62868e: Verifying Checksum 2025-09-07T09:40:57.7893506Z 3c868a62868e: Download complete 2025-09-07T09:40:57.9726180Z 62170a22dd57: Verifying Checksum 2025-09-07T09:40:57.9726567Z 62170a22dd57: Download complete 2025-09-07T09:40:58.3386727Z 553c1d23b6c4: Verifying Checksum 2025-09-07T09:40:58.3387101Z 553c1d23b6c4: Download complete 2025-09-07T09:40:58.5440661Z 9408d557a804: Verifying Checksum 2025-09-07T09:40:58.6637687Z 9408d557a804: Download complete 2025-09-07T09:40:58.6638077Z 0e34fdd9ac5c: Verifying Checksum 2025-09-07T09:40:58.6638434Z 0e34fdd9ac5c: Download complete 2025-09-07T09:40:58.7133103Z 4f4fb700ef54: Verifying Checksum 2025-09-07T09:40:58.7133388Z 4f4fb700ef54: Download 
complete 2025-09-07T09:40:58.8758362Z 40a8e39faeda: Verifying Checksum 2025-09-07T09:40:58.8758770Z 40a8e39faeda: Download complete 2025-09-07T09:40:59.5600530Z d895771c9fac: Verifying Checksum 2025-09-07T09:40:59.5600862Z d895771c9fac: Download complete 2025-09-07T09:40:59.5781990Z 4c92b3f72f1d: Verifying Checksum 2025-09-07T09:40:59.5782287Z 4c92b3f72f1d: Download complete 2025-09-07T09:41:00.0047024Z c4ee04f39d49: Verifying Checksum 2025-09-07T09:41:00.0047357Z c4ee04f39d49: Download complete 2025-09-07T09:41:00.3198939Z 57cbc5013733: Verifying Checksum 2025-09-07T09:41:00.3200193Z 57cbc5013733: Download complete 2025-09-07T09:41:00.3364496Z 3690c9826e48: Verifying Checksum 2025-09-07T09:41:00.3364911Z 3690c9826e48: Download complete 2025-09-07T09:41:00.4930018Z f5f4b06b58bb: Verifying Checksum 2025-09-07T09:41:00.4930384Z f5f4b06b58bb: Download complete 2025-09-07T09:41:00.5621919Z f59713ce4bf4: Verifying Checksum 2025-09-07T09:41:00.5622232Z f59713ce4bf4: Download complete 2025-09-07T09:41:00.9668972Z fe0486521517: Verifying Checksum 2025-09-07T09:41:00.9669392Z fe0486521517: Download complete 2025-09-07T09:41:00.9669685Z e6fdc8487bfe: Pull complete 2025-09-07T09:41:01.1305657Z d37c58456a6a: Verifying Checksum 2025-09-07T09:41:01.1305989Z d37c58456a6a: Download complete 2025-09-07T09:41:02.0266596Z d042f63abc13: Verifying Checksum 2025-09-07T09:41:02.0266968Z d042f63abc13: Download complete 2025-09-07T09:41:02.3431806Z 621284a9c05a: Verifying Checksum 2025-09-07T09:41:02.3432154Z 621284a9c05a: Download complete 2025-09-07T09:41:02.5397193Z 85f605d2dd3a: Verifying Checksum 2025-09-07T09:41:02.5397488Z 85f605d2dd3a: Download complete 2025-09-07T09:41:04.7797084Z 171dcef20c49: Pull complete 2025-09-07T09:41:07.4982855Z 381b5539e598: Verifying Checksum 2025-09-07T09:41:07.4983237Z 381b5539e598: Download complete 2025-09-07T09:41:07.7605200Z a487c0c80029: Verifying Checksum 2025-09-07T09:41:07.7605585Z a487c0c80029: Download complete 2025-09-07T09:41:07.9313253Z 
48bcb81e2566: Verifying Checksum 2025-09-07T09:41:07.9313731Z 48bcb81e2566: Download complete 2025-09-07T09:41:08.2282404Z e261928c0043: Download complete 2025-09-07T09:41:08.5270238Z 0fea55428091: Verifying Checksum 2025-09-07T09:41:08.5270634Z 0fea55428091: Download complete 2025-09-07T09:41:09.0448897Z b4291bccbb84: Verifying Checksum 2025-09-07T09:41:09.0449228Z b4291bccbb84: Download complete 2025-09-07T09:41:09.3527598Z ddc91b09189a: Verifying Checksum 2025-09-07T09:41:09.3528123Z ddc91b09189a: Download complete 2025-09-07T09:41:09.6593707Z 7540c7428627: Verifying Checksum 2025-09-07T09:41:09.6594065Z 7540c7428627: Download complete 2025-09-07T09:41:09.9877767Z 003c4e2598fb: Verifying Checksum 2025-09-07T09:41:09.9878095Z 003c4e2598fb: Download complete 2025-09-07T09:41:10.1688422Z 5687149362ae: Verifying Checksum 2025-09-07T09:41:10.1688703Z 5687149362ae: Download complete 2025-09-07T09:41:10.4723200Z cdd2cf54eb2a: Verifying Checksum 2025-09-07T09:41:10.4723514Z cdd2cf54eb2a: Download complete 2025-09-07T09:41:23.1642566Z 4c92b3f72f1d: Pull complete 2025-09-07T09:41:28.0586545Z 744f9ba90a65: Pull complete 2025-09-07T09:41:31.4437401Z d3c08322a332: Pull complete 2025-09-07T09:41:32.6585494Z df607cfc7c07: Verifying Checksum 2025-09-07T09:41:32.6585867Z df607cfc7c07: Download complete 2025-09-07T09:41:33.1227545Z 3c9055753b4c: Verifying Checksum 2025-09-07T09:41:33.1227871Z 3c9055753b4c: Download complete 2025-09-07T09:41:36.1067022Z ffd43b71f3cc: Pull complete 2025-09-07T09:41:38.0929980Z 31cf8d0bd21c: Verifying Checksum 2025-09-07T09:41:38.0930395Z 31cf8d0bd21c: Download complete 2025-09-07T09:41:41.5651205Z 830692b57f6e: Pull complete 2025-09-07T09:41:45.4474808Z 5bad36d18468: Pull complete 2025-09-07T09:41:53.5940238Z 0e34fdd9ac5c: Pull complete 2025-09-07T09:41:58.0012761Z 3c868a62868e: Pull complete 2025-09-07T09:42:02.4759365Z 62170a22dd57: Pull complete 2025-09-07T09:42:06.7791595Z 553c1d23b6c4: Pull complete 2025-09-07T09:42:10.4460378Z 9408d557a804: 
Pull complete 2025-09-07T09:42:20.3630837Z 8c21cc3715a2: Verifying Checksum 2025-09-07T09:42:20.3631205Z 8c21cc3715a2: Download complete 2025-09-07T09:42:20.7597347Z 11696c3aa380: Verifying Checksum 2025-09-07T09:42:20.7597675Z 11696c3aa380: Download complete 2025-09-07T09:42:21.1271507Z ef4d544e35ca: Verifying Checksum 2025-09-07T09:42:21.1271875Z ef4d544e35ca: Download complete 2025-09-07T09:42:21.5186571Z 5c5108865e5e: Verifying Checksum 2025-09-07T09:42:21.5186917Z 5c5108865e5e: Download complete 2025-09-07T09:42:21.8146154Z 9e97578e9edf: Verifying Checksum 2025-09-07T09:42:21.8146551Z 9e97578e9edf: Download complete 2025-09-07T09:42:22.2129050Z da5a91b54cb5: Download complete 2025-09-07T09:42:22.3612258Z 1e93be219e89: Verifying Checksum 2025-09-07T09:42:22.3612787Z 1e93be219e89: Download complete 2025-09-07T09:42:22.6815266Z 136825afebb5: Verifying Checksum 2025-09-07T09:42:22.6815649Z 136825afebb5: Download complete 2025-09-07T09:42:22.8474358Z 22b39805302d: Verifying Checksum 2025-09-07T09:42:22.8474658Z 22b39805302d: Download complete 2025-09-07T09:42:23.1492758Z d12add675e35: Verifying Checksum 2025-09-07T09:42:23.1493096Z d12add675e35: Download complete 2025-09-07T09:42:23.5369618Z bc127046d33a: Verifying Checksum 2025-09-07T09:42:23.5370191Z bc127046d33a: Download complete 2025-09-07T09:42:23.7277832Z 951e8ce83841: Verifying Checksum 2025-09-07T09:42:23.7278145Z 951e8ce83841: Download complete 2025-09-07T09:42:23.8991570Z 32340b97ae50: Download complete 2025-09-07T09:42:24.1148828Z 5bbb04cd6b57: Verifying Checksum 2025-09-07T09:42:24.1149184Z 5bbb04cd6b57: Download complete 2025-09-07T09:42:24.4286696Z d8c4b845cfc7: Verifying Checksum 2025-09-07T09:42:24.4287044Z d8c4b845cfc7: Download complete 2025-09-07T09:42:27.2456769Z b35c180f4d8d: Verifying Checksum 2025-09-07T09:42:27.2457134Z b35c180f4d8d: Download complete 2025-09-07T09:42:27.4109019Z 5f967b3c303a: Download complete 2025-09-07T09:42:27.5806062Z 04770904f012: Verifying Checksum 
2025-09-07T09:42:27.5806418Z 04770904f012: Download complete 2025-09-07T09:42:27.7682156Z 73373941fb32: Verifying Checksum 2025-09-07T09:42:27.7682463Z 73373941fb32: Download complete 2025-09-07T09:42:28.0920224Z 9572e6cd907b: Verifying Checksum 2025-09-07T09:42:28.0920566Z 9572e6cd907b: Download complete 2025-09-07T09:42:28.2447376Z 64a544aba233: Download complete 2025-09-07T09:42:28.5954707Z 7e35418a2499: Verifying Checksum 2025-09-07T09:42:28.5955255Z 7e35418a2499: Download complete 2025-09-07T09:42:28.7705987Z 2ed8e82748d4: Verifying Checksum 2025-09-07T09:42:28.7706337Z 2ed8e82748d4: Download complete 2025-09-07T09:42:29.4508934Z c988fbcccd70: Verifying Checksum 2025-09-07T09:42:29.4509262Z c988fbcccd70: Download complete 2025-09-07T09:42:43.4479421Z 6623ea814971: Verifying Checksum 2025-09-07T09:42:43.4479746Z 6623ea814971: Download complete 2025-09-07T09:43:06.5279117Z df607cfc7c07: Pull complete 2025-09-07T09:43:11.7646623Z 4f4fb700ef54: Pull complete 2025-09-07T09:43:16.1968586Z 40a8e39faeda: Pull complete 2025-09-07T09:43:21.7816157Z d895771c9fac: Pull complete 2025-09-07T09:43:28.9160311Z c4ee04f39d49: Pull complete 2025-09-07T09:43:34.6889582Z 3690c9826e48: Pull complete 2025-09-07T09:43:42.2312101Z 57cbc5013733: Pull complete 2025-09-07T09:43:48.6004695Z f5f4b06b58bb: Pull complete 2025-09-07T09:43:56.0957589Z f59713ce4bf4: Pull complete 2025-09-07T09:44:04.0357107Z fe0486521517: Pull complete 2025-09-07T09:44:24.4167182Z d3ad4df1ba3a: Verifying Checksum 2025-09-07T09:44:24.4167516Z d3ad4df1ba3a: Download complete 2025-09-07T09:47:32.2674146Z 8c21cc3715a2: Pull complete 2025-09-07T09:47:36.6686518Z d37c58456a6a: Pull complete 2025-09-07T09:47:42.6166959Z d042f63abc13: Pull complete 2025-09-07T09:47:48.1587772Z 621284a9c05a: Pull complete 2025-09-07T09:47:52.5948868Z 85f605d2dd3a: Pull complete 2025-09-07T09:48:03.1756861Z 381b5539e598: Pull complete 2025-09-07T09:48:08.7646657Z a487c0c80029: Pull complete 2025-09-07T09:48:14.9676526Z 48bcb81e2566: Pull 
complete 2025-09-07T09:54:15.8156542Z e261928c0043: Pull complete 2025-09-07T09:54:20.8161318Z 0fea55428091: Pull complete 2025-09-07T09:54:24.4824401Z b4291bccbb84: Pull complete 2025-09-07T09:54:26.1699751Z ddc91b09189a: Pull complete 2025-09-07T09:54:28.8418482Z 7540c7428627: Pull complete 2025-09-07T10:03:16.6316155Z 003c4e2598fb: Pull complete 2025-09-07T10:03:16.6933330Z 5687149362ae: Pull complete 2025-09-07T10:03:16.7510160Z cdd2cf54eb2a: Pull complete 2025-09-07T10:05:29.2928882Z d3ad4df1ba3a: Pull complete 2025-09-07T10:05:30.2700608Z 3c9055753b4c: Pull complete 2025-09-07T10:05:32.3537154Z 31cf8d0bd21c: Pull complete 2025-09-07T10:06:52.8755897Z 6623ea814971: Pull complete 2025-09-07T10:06:56.8635218Z 11696c3aa380: Pull complete 2025-09-07T10:06:59.9684644Z ef4d544e35ca: Pull complete 2025-09-07T10:07:40.5980028Z 5c5108865e5e: Pull complete 2025-09-07T10:07:46.8980719Z 9e97578e9edf: Pull complete 2025-09-07T10:07:49.1739455Z da5a91b54cb5: Pull complete 2025-09-07T10:07:54.6759968Z 1e93be219e89: Pull complete 2025-09-07T10:07:58.4975894Z 136825afebb5: Pull complete 2025-09-07T10:08:00.6523976Z 22b39805302d: Pull complete 2025-09-07T10:08:11.9854163Z d12add675e35: Pull complete 2025-09-07T10:08:15.7118976Z bc127046d33a: Pull complete 2025-09-07T10:08:22.8872696Z 951e8ce83841: Pull complete 2025-09-07T10:08:26.4161606Z 32340b97ae50: Pull complete 2025-09-07T10:08:34.1312615Z 5bbb04cd6b57: Pull complete 2025-09-07T10:08:37.5084843Z d8c4b845cfc7: Pull complete 2025-09-07T10:08:52.7801964Z b35c180f4d8d: Pull complete 2025-09-07T10:08:54.6294645Z 5f967b3c303a: Pull complete 2025-09-07T10:08:56.8033122Z 04770904f012: Pull complete 2025-09-07T10:08:59.3362816Z 73373941fb32: Pull complete 2025-09-07T10:09:01.7847228Z 9572e6cd907b: Pull complete 2025-09-07T10:09:05.2355098Z 64a544aba233: Pull complete 2025-09-07T10:09:09.6229973Z 7e35418a2499: Pull complete 2025-09-07T10:09:12.8203826Z 2ed8e82748d4: Pull complete 2025-09-07T10:09:15.0024928Z c988fbcccd70: Pull 
complete 2025-09-07T10:09:16.2282811Z Digest: sha256:f30843ff9ea9e117a2c8e6d207e85c9e77dfe682f1dfcdfea5b94178d1bf00b3 2025-09-07T10:09:16.3348016Z Status: Downloaded newer image for 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T10:09:16.3450355Z 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T10:09:16.3533430Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-09-07T10:09:16.3534398Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-09-07T10:09:16.3549327Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T10:09:16.3549616Z env: 2025-09-07T10:09:16.3549786Z GIT_DEFAULT_BRANCH: main 2025-09-07T10:09:16.3549984Z ##[endgroup] 2025-09-07T10:09:16.3632805Z ##[group]Run echo "GPU_FLAG=--gpus all -e NVIDIA_DRIVER_CAPABILITIES=all" >> "${GITHUB_ENV}" 2025-09-07T10:09:16.3633318Z echo "GPU_FLAG=--gpus all -e NVIDIA_DRIVER_CAPABILITIES=all" >> "${GITHUB_ENV}" 2025-09-07T10:09:16.3647314Z shell: /usr/bin/bash -e {0} 2025-09-07T10:09:16.3647653Z env: 2025-09-07T10:09:16.3647913Z GIT_DEFAULT_BRANCH: main 2025-09-07T10:09:16.3648222Z ##[endgroup] 2025-09-07T10:09:16.3718335Z ##[group]Run echo "SCCACHE_SERVER_PORT_DOCKER_FLAG=-e SCCACHE_SERVER_PORT=$((RUNNER_UID + 4226))" >> "${GITHUB_ENV}" 2025-09-07T10:09:16.3719013Z echo "SCCACHE_SERVER_PORT_DOCKER_FLAG=-e SCCACHE_SERVER_PORT=$((RUNNER_UID + 4226))" >> "${GITHUB_ENV}" 2025-09-07T10:09:16.3731948Z shell: /usr/bin/bash -e {0} 2025-09-07T10:09:16.3732164Z env: 2025-09-07T10:09:16.3732333Z GIT_DEFAULT_BRANCH: main 2025-09-07T10:09:16.3732592Z GPU_FLAG: --gpus all -e 
NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T10:09:16.3732852Z ##[endgroup] 2025-09-07T10:09:16.3802701Z Prepare all required actions 2025-09-07T10:09:16.3869291Z ##[group]Run ./.github/actions/get-workflow-job-id 2025-09-07T10:09:16.3869547Z with: 2025-09-07T10:09:16.3870079Z github-token: *** 2025-09-07T10:09:16.3870265Z env: 2025-09-07T10:09:16.3870433Z GIT_DEFAULT_BRANCH: main 2025-09-07T10:09:16.3870692Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T10:09:16.3871033Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5231 2025-09-07T10:09:16.3871310Z ##[endgroup] 2025-09-07T10:09:16.5769946Z ##[group]Run set -eux 2025-09-07T10:09:16.5770210Z set -eux 2025-09-07T10:09:16.5770571Z python3 .github/scripts/get_workflow_job_id.py "${GITHUB_RUN_ID}" "${RUNNER_NAME}" 2025-09-07T10:09:16.5784007Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T10:09:16.5784296Z env: 2025-09-07T10:09:16.5784460Z GIT_DEFAULT_BRANCH: main 2025-09-07T10:09:16.5784712Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T10:09:16.5785224Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5231 2025-09-07T10:09:16.5785989Z GITHUB_TOKEN: *** 2025-09-07T10:09:16.5786187Z ##[endgroup] 2025-09-07T10:09:16.7303685Z + python3 .github/scripts/get_workflow_job_id.py 17525296438 i-0d73070610f53945f-1005 2025-09-07T10:09:17.5661955Z Setting output job-id=49775781837 2025-09-07T10:09:17.5662479Z Setting output job-name=test-weekly / test (inductor_timm_perf_cuda_h100, 4, 7, linux.aws.h100) 2025-09-07T10:09:17.7438664Z ##[group]Run python3 -m pip install psutil==5.9.8 dataclasses_json==0.6.7 nvidia-ml-py==11.525.84 2025-09-07T10:09:17.7439345Z python3 -m pip install psutil==5.9.8 dataclasses_json==0.6.7 nvidia-ml-py==11.525.84 2025-09-07T10:09:17.7440160Z python3 -m tools.stats.monitor --log-interval "$MONITOR_LOG_INTERVAL" --data-collect-interval "$MONITOR_DATA_COLLECT_INTERVAL" > usage_log.txt 2>&1 & 2025-09-07T10:09:17.7440854Z echo 
"monitor-script-pid=${!}" >> "${GITHUB_OUTPUT}" 2025-09-07T10:09:17.7456050Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T10:09:17.7456345Z env: 2025-09-07T10:09:17.7456509Z GIT_DEFAULT_BRANCH: main 2025-09-07T10:09:17.7469646Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T10:09:17.7470010Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5231 2025-09-07T10:09:17.7470306Z JOB_ID: 49775781837 2025-09-07T10:09:17.7470632Z JOB_NAME: test-weekly / test (inductor_timm_perf_cuda_h100, 4, 7, linux.aws.h100) 2025-09-07T10:09:17.7471013Z WORKFLOW_NAME: inductor-perf-nightly-h100 2025-09-07T10:09:17.7471259Z WORKFLOW_RUN_ID: 17525296438 2025-09-07T10:09:17.7471468Z MONITOR_LOG_INTERVAL: 15 2025-09-07T10:09:17.7471940Z MONITOR_DATA_COLLECT_INTERVAL: 4 2025-09-07T10:09:17.7472154Z ##[endgroup] 2025-09-07T10:09:18.2402225Z Defaulting to user installation because normal site-packages is not writeable 2025-09-07T10:09:19.0642576Z Collecting psutil==5.9.8 2025-09-07T10:09:19.1197286Z Downloading psutil-5.9.8-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (288 kB) 2025-09-07T10:09:19.2352681Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 288.2/288.2 KB 2.5 MB/s eta 0:00:00 2025-09-07T10:09:19.4304884Z Collecting dataclasses_json==0.6.7 2025-09-07T10:09:19.4406291Z Downloading dataclasses_json-0.6.7-py3-none-any.whl (28 kB) 2025-09-07T10:09:19.5674350Z Collecting nvidia-ml-py==11.525.84 2025-09-07T10:09:19.5789975Z Downloading nvidia_ml_py-11.525.84-py3-none-any.whl (34 kB) 2025-09-07T10:09:20.0324367Z Collecting marshmallow<4.0.0,>=3.18.0 2025-09-07T10:09:20.0424264Z Downloading marshmallow-3.26.1-py3-none-any.whl (50 kB) 2025-09-07T10:09:20.1140953Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50.9/50.9 KB 582.6 kB/s eta 0:00:00 2025-09-07T10:09:20.2098404Z Collecting typing-inspect<1,>=0.4.0 2025-09-07T10:09:20.2198167Z Downloading typing_inspect-0.9.0-py3-none-any.whl (8.8 kB) 
2025-09-07T10:09:20.4809012Z Collecting packaging>=17.0 2025-09-07T10:09:20.4910737Z Downloading packaging-25.0-py3-none-any.whl (66 kB) 2025-09-07T10:09:20.5965615Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 66.5/66.5 KB 542.7 kB/s eta 0:00:00 2025-09-07T10:09:20.6824546Z Collecting mypy-extensions>=0.3.0 2025-09-07T10:09:20.6926243Z Downloading mypy_extensions-1.1.0-py3-none-any.whl (5.0 kB) 2025-09-07T10:09:21.9698893Z Collecting typing-extensions>=3.7.4 2025-09-07T10:09:21.9804591Z Downloading typing_extensions-4.15.0-py3-none-any.whl (44 kB) 2025-09-07T10:09:22.2271842Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 44.6/44.6 KB 140.5 kB/s eta 0:00:00 2025-09-07T10:09:22.2906183Z Installing collected packages: nvidia-ml-py, typing-extensions, psutil, packaging, mypy-extensions, typing-inspect, marshmallow, dataclasses_json 2025-09-07T10:09:24.9604433Z Successfully installed dataclasses_json-0.6.7 marshmallow-3.26.1 mypy-extensions-1.1.0 nvidia-ml-py-11.525.84 packaging-25.0 psutil-5.9.8 typing-extensions-4.15.0 typing-inspect-0.9.0 2025-09-07T10:09:25.0175739Z Prepare all required actions 2025-09-07T10:09:25.0176092Z Getting action download info 2025-09-07T10:09:25.1949874Z Download action repository 'seemethere/download-artifact-s3@v4' (SHA:1da556a7aa0a088e3153970611f6c432d58e80e6) 2025-09-07T10:09:25.7823539Z Download action repository 'actions/download-artifact@v4' (SHA:d3f86a106a0bac45b974a628896c90dbdf5c8093) 2025-09-07T10:09:26.4241027Z ##[group]Run ./.github/actions/download-build-artifacts 2025-09-07T10:09:26.4241302Z with: 2025-09-07T10:09:26.4241507Z name: linux-jammy-cuda12.8-py3.10-gcc9-sm90 2025-09-07T10:09:26.4241768Z s3-bucket: gha-artifacts 2025-09-07T10:09:26.4241962Z env: 2025-09-07T10:09:26.4242122Z GIT_DEFAULT_BRANCH: main 2025-09-07T10:09:26.4242385Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T10:09:26.4242712Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5231 2025-09-07T10:09:26.4242988Z ##[endgroup] 
2025-09-07T10:09:26.4582816Z ##[group]Run seemethere/download-artifact-s3@v4 2025-09-07T10:09:26.4583087Z with: 2025-09-07T10:09:26.4583281Z name: linux-jammy-cuda12.8-py3.10-gcc9-sm90 2025-09-07T10:09:26.4583537Z s3-bucket: gha-artifacts 2025-09-07T10:09:26.4583742Z region: us-east-1 2025-09-07T10:09:26.4583910Z env: 2025-09-07T10:09:26.4584070Z GIT_DEFAULT_BRANCH: main 2025-09-07T10:09:26.4584327Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T10:09:26.4584671Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5231 2025-09-07T10:09:26.4585155Z ##[endgroup] 2025-09-07T10:09:26.8778924Z (node:7959) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023. 2025-09-07T10:09:26.8779592Z 2025-09-07T10:09:26.8780266Z Please migrate your code to use AWS SDK for JavaScript (v3). 2025-09-07T10:09:26.8780990Z For more information, check the migration guide at https://a.co/7PzMCcy 2025-09-07T10:09:26.8781932Z (Use `node --trace-warnings ...` to show where the warning was created) 2025-09-07T10:09:27.0039291Z Found 1 objects with prefix pytorch/pytorch/17525296438/linux-jammy-cuda12.8-py3.10-gcc9-sm90/ 2025-09-07T10:09:27.0039907Z Starting download (1/1): /home/eve/_work/pytorch/pytorch/artifacts.zip 2025-09-07T10:09:36.7003331Z Finished download (1/1): /home/eve/_work/pytorch/pytorch/artifacts.zip 2025-09-07T10:09:36.7011634Z Artifact download has finished successfully 2025-09-07T10:09:36.7818960Z ##[group]Run unzip -o artifacts.zip 2025-09-07T10:09:36.7819267Z unzip -o artifacts.zip 2025-09-07T10:09:36.7833798Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T10:09:36.7834123Z env: 2025-09-07T10:09:36.7834316Z GIT_DEFAULT_BRANCH: main 2025-09-07T10:09:36.7834626Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T10:09:36.7835219Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5231 2025-09-07T10:09:36.7835539Z ##[endgroup] 2025-09-07T10:09:36.8330181Z Archive: 
artifacts.zip 2025-09-07T10:09:36.8331521Z creating: dist/ 2025-09-07T10:09:39.0073689Z inflating: dist/torch-2.9.0a0+git93fb23d-cp310-cp310-linux_x86_64.whl 2025-09-07T10:09:39.0074158Z creating: dist/vision/ 2025-09-07T10:09:39.0183518Z inflating: dist/vision/torchvision-0.22.0a0+966da7e-cp310-cp310-linux_x86_64.whl 2025-09-07T10:09:39.0183918Z creating: dist/audio/ 2025-09-07T10:09:39.0242416Z inflating: dist/audio/torchaudio-2.8.0a0+2e30055-cp310-cp310-linux_x86_64.whl 2025-09-07T10:09:39.0242872Z creating: dist/torchrec/ 2025-09-07T10:09:39.0266651Z inflating: dist/torchrec/torchrec-0.3.2-py3-none-any.whl 2025-09-07T10:09:39.0267009Z creating: dist/fbgemm_gpu/ 2025-09-07T10:09:39.8658774Z inflating: dist/fbgemm_gpu/fbgemm_gpu-0.4.1.post421-cp310-cp310-linux_x86_64.whl 2025-09-07T10:09:39.8659346Z creating: dist/ao/ 2025-09-07T10:09:39.8697523Z inflating: dist/ao/torchao-0.7.0+git51c87b6e-py3-none-any.whl 2025-09-07T10:09:39.8817305Z inflating: dist/.ninja_log 2025-09-07T10:09:39.8817837Z creating: build/custom_test_artifacts/ 2025-09-07T10:09:39.8818806Z creating: build/custom_test_artifacts/custom-op-build/ 2025-09-07T10:09:39.8819317Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/ 2025-09-07T10:09:39.8819859Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/pkgRedirects/ 2025-09-07T10:09:39.8826313Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeConfigureLog.yaml 2025-09-07T10:09:39.8826916Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/ 2025-09-07T10:09:39.8827491Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CMakeSystem.cmake 2025-09-07T10:09:39.8828115Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdC/ 2025-09-07T10:09:39.8828718Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdC/tmp/ 2025-09-07T10:09:39.8830529Z inflating: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdC/CMakeCCompilerId.c 2025-09-07T10:09:39.8831757Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdC/a.out 2025-09-07T10:09:39.8832413Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CMakeCCompiler.cmake 2025-09-07T10:09:39.8833035Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCXX/ 2025-09-07T10:09:39.8833641Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCXX/tmp/ 2025-09-07T10:09:39.8835634Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCXX/CMakeCXXCompilerId.cpp 2025-09-07T10:09:39.8836756Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCXX/a.out 2025-09-07T10:09:39.8837652Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CMakeCXXCompiler.cmake 2025-09-07T10:09:39.8839066Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CMakeDetermineCompilerABI_C.bin 2025-09-07T10:09:39.8840244Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CMakeDetermineCompilerABI_CXX.bin 2025-09-07T10:09:39.8840892Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCUDA/ 2025-09-07T10:09:39.8841472Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/ 2025-09-07T10:09:39.8881089Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii 2025-09-07T10:09:39.8921073Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp 2025-09-07T10:09:39.8922006Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id 2025-09-07T10:09:39.8967224Z inflating: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii 2025-09-07T10:09:39.8968164Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c 2025-09-07T10:09:39.8969088Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu 2025-09-07T10:09:39.8970017Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c 2025-09-07T10:09:39.8970926Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx 2025-09-07T10:09:39.8971809Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin 2025-09-07T10:09:39.8972702Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin 2025-09-07T10:09:39.8973826Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c 2025-09-07T10:09:39.8974720Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o 2025-09-07T10:09:39.8975715Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin 2025-09-07T10:09:39.8976357Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/a_dlink.reg.c 2025-09-07T10:09:39.8976995Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/a_dlink.fatbin 2025-09-07T10:09:39.8977636Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/a_dlink.fatbin.c 2025-09-07T10:09:39.8978258Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/a_dlink.o 
2025-09-07T10:09:39.8978875Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCUDA/CMakeCUDACompilerId.cu 2025-09-07T10:09:39.9044096Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCUDA/a.out 2025-09-07T10:09:39.9044801Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CMakeCUDACompiler.cmake 2025-09-07T10:09:39.9114039Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CMakeDetermineCompilerABI_CUDA.bin 2025-09-07T10:09:39.9114723Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeScratch/ 2025-09-07T10:09:39.9115414Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeTmp/ 2025-09-07T10:09:39.9116110Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/cmake.check_cache 2025-09-07T10:09:39.9116995Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/ 2025-09-07T10:09:39.9117650Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.ts 2025-09-07T10:09:39.9118394Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.make 2025-09-07T10:09:39.9119117Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/depend.make 2025-09-07T10:09:39.9119772Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/link.txt 2025-09-07T10:09:39.9120466Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/cmake_clean.cmake 2025-09-07T10:09:39.9121153Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/build.make 2025-09-07T10:09:39.9121845Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/DependInfo.cmake 2025-09-07T10:09:39.9122526Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/flags.make 2025-09-07T10:09:39.9123208Z inflating: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/progress.make 2025-09-07T10:09:39.9139625Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o.d 2025-09-07T10:09:39.9324237Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o 2025-09-07T10:09:39.9324887Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/ 2025-09-07T10:09:39.9325767Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.ts 2025-09-07T10:09:39.9326543Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.make 2025-09-07T10:09:39.9327290Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/depend.make 2025-09-07T10:09:39.9327977Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/link.txt 2025-09-07T10:09:39.9328702Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/cmake_clean.cmake 2025-09-07T10:09:39.9329638Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/build.make 2025-09-07T10:09:39.9330396Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/DependInfo.cmake 2025-09-07T10:09:39.9331114Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/flags.make 2025-09-07T10:09:39.9331831Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/progress.make 2025-09-07T10:09:39.9348916Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o.d 2025-09-07T10:09:39.9422682Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o 2025-09-07T10:09:39.9423460Z inflating: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeDirectoryInformation.cmake 2025-09-07T10:09:39.9424152Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/TargetDirectories.txt 2025-09-07T10:09:39.9424773Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/progress.marks 2025-09-07T10:09:39.9425589Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile2 2025-09-07T10:09:39.9426382Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile.cmake 2025-09-07T10:09:39.9427041Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/InstallScripts.json 2025-09-07T10:09:39.9427656Z inflating: build/custom_test_artifacts/custom-op-build/detect_cuda_version.cc 2025-09-07T10:09:39.9429363Z inflating: build/custom_test_artifacts/custom-op-build/CMakeCache.txt 2025-09-07T10:09:39.9430059Z inflating: build/custom_test_artifacts/custom-op-build/Makefile 2025-09-07T10:09:39.9430945Z inflating: build/custom_test_artifacts/custom-op-build/cmake_install.cmake 2025-09-07T10:09:39.9590026Z inflating: build/custom_test_artifacts/custom-op-build/libcustom_ops.so 2025-09-07T10:09:39.9640680Z inflating: build/custom_test_artifacts/custom-op-build/test_custom_ops 2025-09-07T10:09:39.9641152Z creating: build/custom_test_artifacts/jit-hook-build/ 2025-09-07T10:09:39.9641592Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/ 2025-09-07T10:09:39.9642112Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/pkgRedirects/ 2025-09-07T10:09:39.9648711Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeConfigureLog.yaml 2025-09-07T10:09:39.9649207Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/ 2025-09-07T10:09:39.9649702Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CMakeSystem.cmake 2025-09-07T10:09:39.9650222Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdC/ 
2025-09-07T10:09:39.9650732Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdC/tmp/ 2025-09-07T10:09:39.9652595Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdC/CMakeCCompilerId.c 2025-09-07T10:09:39.9653765Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdC/a.out 2025-09-07T10:09:39.9654444Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CMakeCCompiler.cmake 2025-09-07T10:09:39.9655190Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCXX/ 2025-09-07T10:09:39.9655770Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCXX/tmp/ 2025-09-07T10:09:39.9657699Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCXX/CMakeCXXCompilerId.cpp 2025-09-07T10:09:39.9658851Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCXX/a.out 2025-09-07T10:09:39.9659690Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CMakeCXXCompiler.cmake 2025-09-07T10:09:39.9661374Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CMakeDetermineCompilerABI_C.bin 2025-09-07T10:09:39.9662376Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CMakeDetermineCompilerABI_CXX.bin 2025-09-07T10:09:39.9662941Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCUDA/ 2025-09-07T10:09:39.9663439Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/ 2025-09-07T10:09:39.9703366Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii 2025-09-07T10:09:39.9742873Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp 2025-09-07T10:09:39.9743811Z extracting: 
build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id 2025-09-07T10:09:39.9788515Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii 2025-09-07T10:09:39.9789417Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c 2025-09-07T10:09:39.9790310Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu 2025-09-07T10:09:39.9791229Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c 2025-09-07T10:09:39.9792115Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx 2025-09-07T10:09:39.9792982Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin 2025-09-07T10:09:39.9794049Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin 2025-09-07T10:09:39.9794925Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c 2025-09-07T10:09:39.9795913Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o 2025-09-07T10:09:39.9796692Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin 2025-09-07T10:09:39.9797365Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/a_dlink.reg.c 2025-09-07T10:09:39.9798028Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/a_dlink.fatbin 2025-09-07T10:09:39.9798695Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/a_dlink.fatbin.c 
2025-09-07T10:09:39.9799347Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/a_dlink.o 2025-09-07T10:09:39.9799999Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCUDA/CMakeCUDACompilerId.cu 2025-09-07T10:09:39.9865566Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCUDA/a.out 2025-09-07T10:09:39.9866203Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CMakeCUDACompiler.cmake 2025-09-07T10:09:39.9932655Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CMakeDetermineCompilerABI_CUDA.bin 2025-09-07T10:09:39.9933343Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeScratch/ 2025-09-07T10:09:39.9933882Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeTmp/ 2025-09-07T10:09:39.9934438Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/cmake.check_cache 2025-09-07T10:09:39.9935195Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/ 2025-09-07T10:09:39.9935874Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.ts 2025-09-07T10:09:39.9936856Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.make 2025-09-07T10:09:39.9937594Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/depend.make 2025-09-07T10:09:39.9938274Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/link.txt 2025-09-07T10:09:39.9938968Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/cmake_clean.cmake 2025-09-07T10:09:39.9939676Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/build.make 2025-09-07T10:09:39.9940419Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/DependInfo.cmake 
2025-09-07T10:09:39.9941132Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/flags.make 2025-09-07T10:09:39.9941901Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/progress.make 2025-09-07T10:09:39.9958429Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o.d 2025-09-07T10:09:40.0018198Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o 2025-09-07T10:09:40.0018970Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeDirectoryInformation.cmake 2025-09-07T10:09:40.0019658Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/TargetDirectories.txt 2025-09-07T10:09:40.0020272Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/progress.marks 2025-09-07T10:09:40.0020844Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile2 2025-09-07T10:09:40.0021683Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile.cmake 2025-09-07T10:09:40.0022279Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/InstallScripts.json 2025-09-07T10:09:40.0022866Z inflating: build/custom_test_artifacts/jit-hook-build/detect_cuda_version.cc 2025-09-07T10:09:40.0024780Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeCache.txt 2025-09-07T10:09:40.0025589Z inflating: build/custom_test_artifacts/jit-hook-build/Makefile 2025-09-07T10:09:40.0026777Z inflating: build/custom_test_artifacts/jit-hook-build/cmake_install.cmake 2025-09-07T10:09:40.0060982Z inflating: build/custom_test_artifacts/jit-hook-build/test_jit_hooks 2025-09-07T10:09:40.0062015Z creating: build/custom_test_artifacts/custom-backend-build/ 2025-09-07T10:09:40.0062784Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/ 2025-09-07T10:09:40.0063692Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/pkgRedirects/ 
2025-09-07T10:09:40.0068915Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeConfigureLog.yaml 2025-09-07T10:09:40.0069581Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/ 2025-09-07T10:09:40.0070213Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CMakeSystem.cmake 2025-09-07T10:09:40.0070876Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdC/ 2025-09-07T10:09:40.0071522Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdC/tmp/ 2025-09-07T10:09:40.0072538Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdC/CMakeCCompilerId.c 2025-09-07T10:09:40.0073816Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdC/a.out 2025-09-07T10:09:40.0074513Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CMakeCCompiler.cmake 2025-09-07T10:09:40.0075358Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCXX/ 2025-09-07T10:09:40.0076014Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCXX/tmp/ 2025-09-07T10:09:40.0078197Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCXX/CMakeCXXCompilerId.cpp 2025-09-07T10:09:40.0078939Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCXX/a.out 2025-09-07T10:09:40.0079766Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CMakeCXXCompiler.cmake 2025-09-07T10:09:40.0081406Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CMakeDetermineCompilerABI_C.bin 2025-09-07T10:09:40.0082364Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CMakeDetermineCompilerABI_CXX.bin 2025-09-07T10:09:40.0083048Z creating: 
build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCUDA/ 2025-09-07T10:09:40.0083671Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/ 2025-09-07T10:09:40.0123978Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii 2025-09-07T10:09:40.0163526Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp 2025-09-07T10:09:40.0165345Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id 2025-09-07T10:09:40.0209320Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii 2025-09-07T10:09:40.0210906Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c 2025-09-07T10:09:40.0212478Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu 2025-09-07T10:09:40.0214520Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c 2025-09-07T10:09:40.0216201Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx 2025-09-07T10:09:40.0216968Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin 2025-09-07T10:09:40.0217736Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin 2025-09-07T10:09:40.0218496Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c 2025-09-07T10:09:40.0219245Z inflating: 
build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o 2025-09-07T10:09:40.0219952Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin 2025-09-07T10:09:40.0220635Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/a_dlink.reg.c 2025-09-07T10:09:40.0221379Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/a_dlink.fatbin 2025-09-07T10:09:40.0222061Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/a_dlink.fatbin.c 2025-09-07T10:09:40.0222708Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/a_dlink.o 2025-09-07T10:09:40.0223370Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCUDA/CMakeCUDACompilerId.cu 2025-09-07T10:09:40.0286067Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCUDA/a.out 2025-09-07T10:09:40.0287272Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CMakeCUDACompiler.cmake 2025-09-07T10:09:40.0352886Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CMakeDetermineCompilerABI_CUDA.bin 2025-09-07T10:09:40.0353686Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeScratch/ 2025-09-07T10:09:40.0354187Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeTmp/ 2025-09-07T10:09:40.0354728Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/cmake.check_cache 2025-09-07T10:09:40.0355443Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/ 2025-09-07T10:09:40.0356659Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.ts 2025-09-07T10:09:40.0357961Z inflating: 
build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.make 2025-09-07T10:09:40.0359222Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/depend.make 2025-09-07T10:09:40.0360449Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/link.txt 2025-09-07T10:09:40.0361683Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/cmake_clean.cmake 2025-09-07T10:09:40.0362920Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/build.make 2025-09-07T10:09:40.0364139Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/DependInfo.cmake 2025-09-07T10:09:40.0365660Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/flags.make 2025-09-07T10:09:40.0366478Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/progress.make 2025-09-07T10:09:40.0367145Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o.d 2025-09-07T10:09:40.0473829Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o 2025-09-07T10:09:40.0474590Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/ 2025-09-07T10:09:40.0475582Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.ts 2025-09-07T10:09:40.0477146Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.make 2025-09-07T10:09:40.0478452Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/depend.make 2025-09-07T10:09:40.0479688Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/link.txt 
2025-09-07T10:09:40.0480946Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/cmake_clean.cmake 2025-09-07T10:09:40.0482236Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/build.make 2025-09-07T10:09:40.0503714Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/DependInfo.cmake 2025-09-07T10:09:40.0504662Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/flags.make 2025-09-07T10:09:40.0505569Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/progress.make 2025-09-07T10:09:40.0506342Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o.d 2025-09-07T10:09:40.0547929Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o 2025-09-07T10:09:40.0548659Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeDirectoryInformation.cmake 2025-09-07T10:09:40.0549279Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/TargetDirectories.txt 2025-09-07T10:09:40.0549826Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/progress.marks 2025-09-07T10:09:40.0550574Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile2 2025-09-07T10:09:40.0551124Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile.cmake 2025-09-07T10:09:40.0551680Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/InstallScripts.json 2025-09-07T10:09:40.0552222Z inflating: build/custom_test_artifacts/custom-backend-build/detect_cuda_version.cc 2025-09-07T10:09:40.0554269Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeCache.txt 2025-09-07T10:09:40.0555236Z inflating: 
build/custom_test_artifacts/custom-backend-build/Makefile 2025-09-07T10:09:40.0555812Z inflating: build/custom_test_artifacts/custom-backend-build/cmake_install.cmake 2025-09-07T10:09:40.0645866Z inflating: build/custom_test_artifacts/custom-backend-build/libcustom_backend.so 2025-09-07T10:09:40.0680851Z inflating: build/custom_test_artifacts/custom-backend-build/test_custom_backend 2025-09-07T10:09:40.0681307Z creating: build/lib/ 2025-09-07T10:09:40.0759467Z inflating: build/lib/libprotobuf-lite.a 2025-09-07T10:09:40.1165721Z inflating: build/lib/libprotobuf.a 2025-09-07T10:09:40.1174079Z inflating: build/lib/libpthreadpool.a 2025-09-07T10:09:40.1181055Z inflating: build/lib/libcpuinfo.a 2025-09-07T10:09:40.1630141Z inflating: build/lib/libprotoc.a 2025-09-07T10:09:40.1637103Z inflating: build/lib/libcpuinfo_internals.a 2025-09-07T10:09:40.1637825Z inflating: build/lib/libclog.a 2025-09-07T10:09:40.1639786Z inflating: build/lib/libnnpack_reference_layers.a 2025-09-07T10:09:40.1656973Z inflating: build/lib/libpytorch_qnnpack.a 2025-09-07T10:09:40.1816807Z inflating: build/lib/libmicrokernels-prod.a 2025-09-07T10:09:40.1832704Z inflating: build/lib/libnnpack.a 2025-09-07T10:09:40.2537645Z inflating: build/lib/libmicrokernels-all.a 2025-09-07T10:09:40.2598679Z inflating: build/lib/libgtest.a 2025-09-07T10:09:40.2613787Z inflating: build/lib/libgmock.a 2025-09-07T10:09:40.2614333Z inflating: build/lib/libgmock_main.a 2025-09-07T10:09:40.2615135Z inflating: build/lib/libgtest_main.a 2025-09-07T10:09:40.2683619Z inflating: build/lib/libbenchmark.a 2025-09-07T10:09:40.2684163Z inflating: build/lib/libbenchmark_main.a 2025-09-07T10:09:40.2765760Z inflating: build/lib/libXNNPACK.a 2025-09-07T10:09:40.2766524Z inflating: build/lib/libjitprofiling.a 2025-09-07T10:09:40.2773624Z inflating: build/lib/libittnotify.a 2025-09-07T10:09:40.2831667Z inflating: build/lib/libasmjit.a 2025-09-07T10:09:40.4097132Z inflating: build/lib/libfbgemm.a 2025-09-07T10:09:40.4124691Z inflating: 
build/lib/libtensorpipe_uv.a 2025-09-07T10:09:40.4632011Z inflating: build/lib/libtensorpipe.a 2025-09-07T10:09:40.4861846Z inflating: build/lib/libtensorpipe_cuda.a 2025-09-07T10:09:40.4980458Z inflating: build/lib/libgloo.a 2025-09-07T10:09:40.5024300Z inflating: build/lib/libonnx_proto.a 2025-09-07T10:09:40.5678352Z inflating: build/lib/libonnx.a 2025-09-07T10:09:40.6087075Z inflating: build/lib/libgloo_cuda.a 2025-09-07T10:09:40.6103780Z inflating: build/lib/libfmt.a 2025-09-07T10:09:41.5560025Z inflating: build/lib/libdnnl.a 2025-09-07T10:09:41.5980054Z inflating: build/lib/libkineto.a 2025-09-07T10:09:41.6082903Z inflating: build/lib/libc10.so 2025-09-07T10:09:41.6084086Z inflating: build/lib/libtorch_global_deps.so 2025-09-07T10:09:41.6085840Z inflating: build/lib/libcaffe2_nvrtc.so 2025-09-07T10:09:41.6141327Z inflating: build/lib/libc10_cuda.so 2025-09-07T10:09:44.7484336Z inflating: build/lib/libtorch_cpu.so 2025-09-07T10:09:44.8161752Z inflating: build/lib/libtorch_nvshmem.so 2025-09-07T10:09:46.6785547Z inflating: build/lib/libtorch_cuda.so 2025-09-07T10:09:46.6786580Z inflating: build/lib/libtorch.so 2025-09-07T10:09:46.6832054Z inflating: build/lib/libtorch_cuda_linalg.so 2025-09-07T10:09:46.6895559Z inflating: build/lib/libtorchbind_test.so 2025-09-07T10:09:46.6913611Z inflating: build/lib/libjitbackend_test.so 2025-09-07T10:09:46.6935399Z inflating: build/lib/libbackend_with_compiler.so 2025-09-07T10:09:46.6960103Z inflating: build/lib/libaoti_custom_ops.so 2025-09-07T10:09:46.6962432Z inflating: build/lib/libc10d_cuda_test.so 2025-09-07T10:09:46.6966206Z inflating: build/lib/libshm.so 2025-09-07T10:09:46.8985908Z inflating: build/lib/libtorch_python.so 2025-09-07T10:09:46.9018246Z inflating: build/lib/libnnapi_backend.so 2025-09-07T10:09:46.9019037Z creating: build/bin/ 2025-09-07T10:09:46.9420873Z inflating: build/bin/protoc-3.13.0.0 2025-09-07T10:09:46.9822270Z inflating: build/bin/protoc 2025-09-07T10:09:46.9872790Z inflating: 
build/bin/c10_AllocatorConfig_test 2025-09-07T10:09:46.9921074Z inflating: build/bin/c10_CompileTimeFunctionPointer_test 2025-09-07T10:09:46.9970328Z inflating: build/bin/c10_Device_test 2025-09-07T10:09:47.0017013Z inflating: build/bin/c10_StreamGuard_test 2025-09-07T10:09:47.0071314Z inflating: build/bin/c10_SymInt_test 2025-09-07T10:09:47.0120922Z inflating: build/bin/c10_DeviceGuard_test 2025-09-07T10:09:47.0177765Z inflating: build/bin/c10_DispatchKeySet_test 2025-09-07T10:09:47.0231354Z inflating: build/bin/c10_SizesAndStrides_test 2025-09-07T10:09:47.0282733Z inflating: build/bin/c10_InlineDeviceGuard_test 2025-09-07T10:09:47.0351697Z inflating: build/bin/c10_cow_test 2025-09-07T10:09:47.0402995Z inflating: build/bin/c10_Scalar_test 2025-09-07T10:09:47.0456396Z inflating: build/bin/c10_InlineStreamGuard_test 2025-09-07T10:09:47.0506567Z inflating: build/bin/c10_Bitset_test 2025-09-07T10:09:47.0553965Z inflating: build/bin/c10_ArrayRef_test 2025-09-07T10:09:47.0608935Z inflating: build/bin/c10_Enumerate_test 2025-09-07T10:09:47.0655569Z inflating: build/bin/c10_ConstexprCrc_test 2025-09-07T10:09:47.0703044Z inflating: build/bin/c10_DeadlockDetection_test 2025-09-07T10:09:47.0751044Z inflating: build/bin/c10_Half_test 2025-09-07T10:09:47.0804729Z inflating: build/bin/c10_LeftRight_test 2025-09-07T10:09:47.0855399Z inflating: build/bin/c10_IntrusiveList_test 2025-09-07T10:09:47.0907537Z inflating: build/bin/c10_Metaprogramming_test 2025-09-07T10:09:47.0960531Z inflating: build/bin/c10_NetworkFlow_test 2025-09-07T10:09:47.1008020Z inflating: build/bin/c10_Semaphore_test 2025-09-07T10:09:47.1055938Z inflating: build/bin/c10_Synchronized_test 2025-09-07T10:09:47.1108525Z inflating: build/bin/c10_ThreadLocal_test 2025-09-07T10:09:47.1158773Z inflating: build/bin/c10_TypeIndex_test 2025-09-07T10:09:47.1207481Z inflating: build/bin/c10_TypeList_test 2025-09-07T10:09:47.1256716Z inflating: build/bin/c10_accumulate_test 2025-09-07T10:09:47.1309666Z inflating: 
build/bin/c10_bfloat16_test 2025-09-07T10:09:47.1356383Z inflating: build/bin/c10_TypeTraits_test 2025-09-07T10:09:47.1404915Z inflating: build/bin/c10_bit_cast_test 2025-09-07T10:09:47.1459321Z inflating: build/bin/c10_complex_math_test 2025-09-07T10:09:47.1509617Z inflating: build/bin/c10_exception_test 2025-09-07T10:09:47.1560758Z inflating: build/bin/c10_generic_math_test 2025-09-07T10:09:47.1612637Z inflating: build/bin/c10_complex_test 2025-09-07T10:09:47.1661779Z inflating: build/bin/c10_irange_test 2025-09-07T10:09:47.1715894Z inflating: build/bin/c10_logging_test 2025-09-07T10:09:47.1766847Z inflating: build/bin/c10_lazy_test 2025-09-07T10:09:47.1815892Z inflating: build/bin/c10_flags_test 2025-09-07T10:09:47.1969288Z inflating: build/bin/c10_intrusive_ptr_test 2025-09-07T10:09:47.2016685Z inflating: build/bin/c10_error_test 2025-09-07T10:09:47.2060035Z inflating: build/bin/c10_intrusive_ptr_benchmark 2025-09-07T10:09:47.2110885Z inflating: build/bin/c10_registry_test 2025-09-07T10:09:47.2161893Z inflating: build/bin/c10_tempfile_test 2025-09-07T10:09:47.2215862Z inflating: build/bin/c10_string_util_test 2025-09-07T10:09:47.2359202Z inflating: build/bin/c10_small_vector_test 2025-09-07T10:09:47.2405728Z inflating: build/bin/c10_string_view_test 2025-09-07T10:09:47.2455455Z inflating: build/bin/c10_ssize_test 2025-09-07T10:09:47.2514184Z inflating: build/bin/c10_ordered_preserving_dict_test 2025-09-07T10:09:47.2585447Z inflating: build/bin/c10_optional_test 2025-09-07T10:09:47.2638899Z inflating: build/bin/c10_typeid_test 2025-09-07T10:09:47.2686020Z inflating: build/bin/c10_cuda_CUDATest 2025-09-07T10:09:47.3232042Z inflating: build/bin/vec_test_all_types_DEFAULT 2025-09-07T10:09:47.3796689Z inflating: build/bin/vec_test_all_types_AVX2 2025-09-07T10:09:47.4350227Z inflating: build/bin/vec_test_all_types_AVX512 2025-09-07T10:09:47.4400733Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_catches_stream 2025-09-07T10:09:47.4451144Z inflating: 
build/bin/c10_cuda_CUDAAssertionsTest_1_var_test 2025-09-07T10:09:47.4501786Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_blocks_and_threads 2025-09-07T10:09:47.4551331Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_same_block 2025-09-07T10:09:47.4601338Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_from_2_processes 2025-09-07T10:09:47.4651737Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_catches_thread_and_block_and_device 2025-09-07T10:09:47.4704367Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_multiple_blocks 2025-09-07T10:09:47.4754276Z inflating: build/bin/BackoffTest 2025-09-07T10:09:47.4808073Z inflating: build/bin/TCPStoreTest 2025-09-07T10:09:47.4859077Z inflating: build/bin/HashStoreTest 2025-09-07T10:09:47.4909485Z inflating: build/bin/FileStoreTest 2025-09-07T10:09:47.4922371Z inflating: build/bin/ProcessGroupMPITest 2025-09-07T10:09:47.4925150Z inflating: build/bin/example_allreduce 2025-09-07T10:09:47.4995322Z inflating: build/bin/Dict_test 2025-09-07T10:09:47.5045523Z inflating: build/bin/Dimname_test 2025-09-07T10:09:47.5100289Z inflating: build/bin/NamedTensor_test 2025-09-07T10:09:47.5161994Z inflating: build/bin/MaybeOwned_test 2025-09-07T10:09:47.5218141Z inflating: build/bin/atest 2025-09-07T10:09:47.5278741Z inflating: build/bin/basic 2025-09-07T10:09:47.5337826Z inflating: build/bin/apply_utils_test 2025-09-07T10:09:47.5390053Z inflating: build/bin/broadcast_test 2025-09-07T10:09:47.5439250Z inflating: build/bin/cpu_allocator_test 2025-09-07T10:09:47.5494736Z inflating: build/bin/cpu_generator_test 2025-09-07T10:09:47.5545875Z inflating: build/bin/cpu_profiling_allocator_test 2025-09-07T10:09:47.5632613Z inflating: build/bin/cpu_rng_test 2025-09-07T10:09:47.5682076Z inflating: build/bin/dlconvertor_test 2025-09-07T10:09:47.5737409Z inflating: build/bin/extension_backend_test 2025-09-07T10:09:47.5789895Z inflating: build/bin/half_test 
2025-09-07T10:09:47.5838208Z inflating: build/bin/lazy_tensor_test 2025-09-07T10:09:47.5890066Z inflating: build/bin/memory_format_test 2025-09-07T10:09:47.5944240Z inflating: build/bin/math_kernel_test 2025-09-07T10:09:47.6034610Z inflating: build/bin/ivalue_test 2025-09-07T10:09:47.6085964Z inflating: build/bin/memory_overlapping_test 2025-09-07T10:09:47.6136606Z inflating: build/bin/mobile_memory_cleanup 2025-09-07T10:09:47.6189832Z inflating: build/bin/native_test 2025-09-07T10:09:47.6238776Z inflating: build/bin/operator_name_test 2025-09-07T10:09:47.6287728Z inflating: build/bin/operators_test 2025-09-07T10:09:47.6337732Z inflating: build/bin/packedtensoraccessor_test 2025-09-07T10:09:47.6401093Z inflating: build/bin/pow_test 2025-09-07T10:09:47.6455823Z inflating: build/bin/quantized_test 2025-09-07T10:09:47.6503376Z inflating: build/bin/reduce_ops_test 2025-09-07T10:09:47.6555343Z inflating: build/bin/reportMemoryUsage_test 2025-09-07T10:09:47.6609749Z inflating: build/bin/scalar_tensor_test 2025-09-07T10:09:47.6666105Z inflating: build/bin/scalar_test 2025-09-07T10:09:47.6715078Z inflating: build/bin/StorageUtils_test 2025-09-07T10:09:47.6765522Z inflating: build/bin/stride_properties_test 2025-09-07T10:09:47.6840370Z inflating: build/bin/tensor_iterator_test 2025-09-07T10:09:47.6892440Z inflating: build/bin/test_parallel 2025-09-07T10:09:47.6941005Z inflating: build/bin/thread_init_test 2025-09-07T10:09:47.6993610Z inflating: build/bin/type_ptr_test 2025-09-07T10:09:47.7050740Z inflating: build/bin/type_test 2025-09-07T10:09:47.7100644Z inflating: build/bin/undefined_tensor_test 2025-09-07T10:09:47.7150878Z inflating: build/bin/verify_api_visibility 2025-09-07T10:09:47.7217273Z inflating: build/bin/legacy_vmap_test 2025-09-07T10:09:47.7266583Z inflating: build/bin/weakref_test 2025-09-07T10:09:47.7315878Z inflating: build/bin/xla_tensor_test 2025-09-07T10:09:47.7365288Z inflating: build/bin/wrapdim_test 2025-09-07T10:09:47.7421908Z inflating: 
build/bin/IListRef_test 2025-09-07T10:09:47.7535208Z inflating: build/bin/kernel_function_legacy_test 2025-09-07T10:09:47.7635572Z inflating: build/bin/List_test 2025-09-07T10:09:47.7701364Z inflating: build/bin/KernelFunction_test 2025-09-07T10:09:47.7791328Z inflating: build/bin/kernel_function_test 2025-09-07T10:09:47.7910344Z inflating: build/bin/kernel_lambda_legacy_test 2025-09-07T10:09:47.8007188Z inflating: build/bin/kernel_lambda_test 2025-09-07T10:09:47.8065801Z inflating: build/bin/kernel_stackbased_test 2025-09-07T10:09:47.8155333Z inflating: build/bin/make_boxed_from_unboxed_functor_test 2025-09-07T10:09:47.8204571Z inflating: build/bin/CppSignature_test 2025-09-07T10:09:47.8258124Z inflating: build/bin/backend_fallback_test 2025-09-07T10:09:47.8307887Z inflating: build/bin/op_allowlist_test 2025-09-07T10:09:47.8584362Z inflating: build/bin/op_registration_test 2025-09-07T10:09:47.8647216Z inflating: build/bin/inline_container_test 2025-09-07T10:09:47.8696714Z inflating: build/bin/cuda_allocator_test 2025-09-07T10:09:47.8747131Z inflating: build/bin/cuda_apply_test 2025-09-07T10:09:47.8806309Z inflating: build/bin/cuda_atomic_ops_test 2025-09-07T10:09:47.8860347Z inflating: build/bin/cuda_caching_host_allocator_test 2025-09-07T10:09:47.8927383Z inflating: build/bin/cuda_complex_math_test 2025-09-07T10:09:47.8984021Z inflating: build/bin/cuda_complex_test 2025-09-07T10:09:47.9043189Z inflating: build/bin/cuda_cub_test 2025-09-07T10:09:47.9090976Z inflating: build/bin/cuda_device_test 2025-09-07T10:09:47.9152575Z inflating: build/bin/cuda_distributions_test 2025-09-07T10:09:47.9202933Z inflating: build/bin/cuda_dlconvertor_test 2025-09-07T10:09:47.9251242Z inflating: build/bin/cuda_exchange_device_test 2025-09-07T10:09:47.9299218Z inflating: build/bin/cuda_half_test 2025-09-07T10:09:47.9353116Z inflating: build/bin/cuda_generator_test 2025-09-07T10:09:47.9405265Z inflating: build/bin/cuda_integer_divider_test 2025-09-07T10:09:47.9453131Z inflating: 
build/bin/cuda_optional_test 2025-09-07T10:09:47.9502554Z inflating: build/bin/cuda_packedtensoraccessor_test 2025-09-07T10:09:47.9552583Z inflating: build/bin/cuda_reportMemoryUsage_test 2025-09-07T10:09:47.9600502Z inflating: build/bin/cuda_allocatorTraceTracker_test 2025-09-07T10:09:47.9657857Z inflating: build/bin/cuda_stream_test 2025-09-07T10:09:47.9705526Z inflating: build/bin/cuda_cudnn_test 2025-09-07T10:09:47.9755262Z inflating: build/bin/cuda_vectorized_test 2025-09-07T10:09:48.0104224Z inflating: build/bin/test_nativert 2025-09-07T10:09:48.0156657Z inflating: build/bin/test_dist_autograd 2025-09-07T10:09:48.0222146Z inflating: build/bin/test_cpp_rpc 2025-09-07T10:09:48.4126554Z inflating: build/bin/test_api 2025-09-07T10:09:48.4128698Z inflating: build/bin/parallel_benchmark 2025-09-07T10:09:48.4190977Z inflating: build/bin/ProcessGroupGlooTest 2025-09-07T10:09:48.4251691Z inflating: build/bin/ProcessGroupNCCLTest 2025-09-07T10:09:48.4306417Z inflating: build/bin/ProcessGroupGlooAsyncTest 2025-09-07T10:09:48.4365328Z inflating: build/bin/ProcessGroupNCCLErrorsTest 2025-09-07T10:09:48.5372351Z inflating: build/bin/test_jit 2025-09-07T10:09:48.5694790Z inflating: build/bin/test_lazy 2025-09-07T10:09:48.5698453Z inflating: build/bin/torch_shm_manager 2025-09-07T10:09:48.5698976Z creating: .additional_ci_files/ 2025-09-07T10:09:48.5779957Z inflating: .additional_ci_files/test-times.json 2025-09-07T10:09:48.6085723Z inflating: .additional_ci_files/test-class-times.json 2025-09-07T10:09:48.6332123Z ##[group]Run rm artifacts.zip 2025-09-07T10:09:48.6332377Z rm artifacts.zip 2025-09-07T10:09:48.6347176Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T10:09:48.6347476Z env: 2025-09-07T10:09:48.6347646Z GIT_DEFAULT_BRANCH: main 2025-09-07T10:09:48.6347898Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T10:09:48.6348268Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5231 2025-09-07T10:09:48.6348545Z ##[endgroup] 
2025-09-07T10:09:52.4131313Z ##[group]Run df -H 2025-09-07T10:09:52.4131522Z df -H 2025-09-07T10:09:52.4150328Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T10:09:52.4150630Z env: 2025-09-07T10:09:52.4150794Z GIT_DEFAULT_BRANCH: main 2025-09-07T10:09:52.4151058Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T10:09:52.4151394Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5231 2025-09-07T10:09:52.4151681Z ##[endgroup] 2025-09-07T10:09:52.4532914Z Filesystem Size Used Avail Use% Mounted on 2025-09-07T10:09:52.4533299Z overlay 7.3T 560G 6.8T 8% / 2025-09-07T10:09:52.4534011Z tmpfs 68M 0 68M 0% /dev 2025-09-07T10:09:52.4534316Z shm 68M 0 68M 0% /dev/shm 2025-09-07T10:09:52.4534642Z /dev/root 7.3T 560G 6.8T 8% /home/eve/_work 2025-09-07T10:09:52.4535253Z tmpfs 215G 111k 215G 1% /run/docker.sock 2025-09-07T10:09:52.4535633Z tmpfs 1.1T 13k 1.1T 1% /proc/driver/nvidia 2025-09-07T10:09:52.4536073Z tmpfs 430G 2.9M 430G 1% /run/.ro3583804320/nvidia-persistenced/socket 2025-09-07T10:09:52.4536501Z tmpfs 1.1T 0 1.1T 0% /proc/acpi 2025-09-07T10:09:52.4536820Z tmpfs 1.1T 0 1.1T 0% /proc/scsi 2025-09-07T10:09:52.4537124Z tmpfs 1.1T 0 1.1T 0% /sys/firmware 2025-09-07T10:09:52.4564424Z Prepare all required actions 2025-09-07T10:09:52.4565433Z Getting action download info 2025-09-07T10:09:52.6781446Z ##[group]Run ./.github/actions/download-td-artifacts 2025-09-07T10:09:52.6781718Z with: 2025-09-07T10:09:52.6781869Z env: 2025-09-07T10:09:52.6782041Z GIT_DEFAULT_BRANCH: main 2025-09-07T10:09:52.6782296Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T10:09:52.6782640Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5231 2025-09-07T10:09:52.6782928Z ##[endgroup] 2025-09-07T10:09:52.7543663Z ##[group]Run seemethere/download-artifact-s3@v4 2025-09-07T10:09:52.7543942Z with: 2025-09-07T10:09:52.7544110Z name: td_results 2025-09-07T10:09:52.7544297Z s3-bucket: gha-artifacts 2025-09-07T10:09:52.7544502Z region: 
us-east-1 2025-09-07T10:09:52.7544665Z env: 2025-09-07T10:09:52.7544831Z GIT_DEFAULT_BRANCH: main 2025-09-07T10:09:52.7545251Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T10:09:52.7545613Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5231 2025-09-07T10:09:52.7545909Z ##[endgroup] 2025-09-07T10:09:53.4464039Z (node:7982) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023. 2025-09-07T10:09:53.4464764Z 2025-09-07T10:09:53.4465615Z Please migrate your code to use AWS SDK for JavaScript (v3). 2025-09-07T10:09:53.4466440Z For more information, check the migration guide at https://a.co/7PzMCcy 2025-09-07T10:09:53.4467250Z (Use `node --trace-warnings ...` to show where the warning was created) 2025-09-07T10:09:53.5631438Z Found 0 objects with prefix pytorch/pytorch/17525296438/td_results/ 2025-09-07T10:09:53.5637379Z Artifact download has finished successfully 2025-09-07T10:09:53.6120706Z ##[group]Run mkdir -p .additional_ci_files 2025-09-07T10:09:53.6121013Z mkdir -p .additional_ci_files 2025-09-07T10:09:53.6121365Z mv td_results.json .additional_ci_files/td_results.json || true 2025-09-07T10:09:53.6135903Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T10:09:53.6136198Z env: 2025-09-07T10:09:53.6136368Z GIT_DEFAULT_BRANCH: main 2025-09-07T10:09:53.6136622Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T10:09:53.6136965Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5231 2025-09-07T10:09:53.6137251Z ##[endgroup] 2025-09-07T10:09:53.6565960Z mv: cannot stat 'td_results.json': No such file or directory 2025-09-07T10:09:53.7242754Z ##[group]Run .github/scripts/parse_ref.py 2025-09-07T10:09:53.7243076Z .github/scripts/parse_ref.py 2025-09-07T10:09:53.7256905Z shell: /usr/bin/bash -e {0} 2025-09-07T10:09:53.7257116Z env: 2025-09-07T10:09:53.7257275Z GIT_DEFAULT_BRANCH: main 2025-09-07T10:09:53.7257526Z GPU_FLAG: --gpus all -e 
NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T10:09:53.7257874Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5231 2025-09-07T10:09:53.7258156Z ##[endgroup] 2025-09-07T10:09:53.7766392Z Setting output branch=main 2025-09-07T10:09:53.7864221Z Prepare all required actions 2025-09-07T10:09:53.7864553Z Getting action download info 2025-09-07T10:09:53.9934449Z ##[group]Run ./.github/actions/filter-test-configs 2025-09-07T10:09:53.9934733Z with: 2025-09-07T10:09:53.9935453Z github-token: *** 2025-09-07T10:09:53.9940726Z test-matrix: {"include": [{"config": "inductor_huggingface_perf_cuda_h100", "shard": 1, "num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_huggingface_perf_cuda_h100", "shard": 2, "num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_huggingface_perf_cuda_h100", "shard": 3, "num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_huggingface_perf_cuda_h100", "shard": 4, "num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_huggingface_perf_cuda_h100", "shard": 5, "num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 1, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 2, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 3, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 4, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 5, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 6, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 7, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 1, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 2, "num_shards": 9, "runner": 
"linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 3, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 4, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 5, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 6, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 7, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 8, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 9, "num_shards": 9, "runner": "linux.aws.h100"}]} 2025-09-07T10:09:53.9945917Z job-name: test-weekly / test (inductor_timm_perf_cuda_h100, 4, 7, linux.aws.h100) 2025-09-07T10:09:53.9946250Z env: 2025-09-07T10:09:53.9946416Z GIT_DEFAULT_BRANCH: main 2025-09-07T10:09:53.9946667Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T10:09:53.9946994Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5231 2025-09-07T10:09:53.9947274Z ##[endgroup] 2025-09-07T10:09:54.0348770Z ##[group]Run nick-fields/retry@v3.0.0 2025-09-07T10:09:54.0349020Z with: 2025-09-07T10:09:54.0349168Z shell: bash 2025-09-07T10:09:54.0349336Z timeout_minutes: 10 2025-09-07T10:09:54.0349517Z max_attempts: 5 2025-09-07T10:09:54.0349705Z retry_wait_seconds: 30 2025-09-07T10:09:54.0350274Z command: set -eux # PyYAML 6.0 doesn't work with MacOS x86 anymore # This must run on Python-3.7 (AmazonLinux2) so can't use request=3.32.2 python3 -m pip install requests==2.27.1 pyyaml==6.0.2 2025-09-07T10:09:54.0350879Z polling_interval_seconds: 1 2025-09-07T10:09:54.0351103Z warning_on_retry: true 2025-09-07T10:09:54.0351310Z continue_on_error: false 2025-09-07T10:09:54.0351498Z env: 2025-09-07T10:09:54.0351657Z GIT_DEFAULT_BRANCH: main 2025-09-07T10:09:54.0351907Z GPU_FLAG: --gpus all -e 
NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T10:09:54.0352250Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5231 2025-09-07T10:09:54.0352679Z GITHUB_TOKEN: *** 2025-09-07T10:09:54.0352857Z ##[endgroup] 2025-09-07T10:09:54.1081154Z + python3 -m pip install requests==2.27.1 pyyaml==6.0.2 2025-09-07T10:09:54.3765896Z Defaulting to user installation because normal site-packages is not writeable 2025-09-07T10:09:54.9942754Z Collecting requests==2.27.1 2025-09-07T10:09:55.0499618Z Downloading requests-2.27.1-py2.py3-none-any.whl (63 kB) 2025-09-07T10:09:55.6712147Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 63.1/63.1 KB 85.5 kB/s eta 0:00:00 2025-09-07T10:09:56.2093383Z Collecting pyyaml==6.0.2 2025-09-07T10:09:56.2194669Z Downloading PyYAML-6.0.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (751 kB) 2025-09-07T10:09:56.7123002Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 751.2/751.2 KB 1.5 MB/s eta 0:00:00 2025-09-07T10:09:56.7234157Z Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/lib/python3/dist-packages (from requests==2.27.1) (1.26.5) 2025-09-07T10:09:56.7244524Z Requirement already satisfied: idna<4,>=2.5 in /usr/lib/python3/dist-packages (from requests==2.27.1) (3.3) 2025-09-07T10:09:57.1986620Z Collecting charset-normalizer~=2.0.0 2025-09-07T10:09:57.2087092Z Downloading charset_normalizer-2.0.12-py3-none-any.whl (39 kB) 2025-09-07T10:09:57.5870351Z Requirement already satisfied: certifi>=2017.4.17 in /usr/lib/python3/dist-packages (from requests==2.27.1) (2020.6.20) 2025-09-07T10:09:57.6555882Z Installing collected packages: pyyaml, charset-normalizer, requests 2025-09-07T10:09:58.2623841Z WARNING: The script normalizer is installed in '/home/eve/.local/bin' which is not on PATH. 2025-09-07T10:09:58.2624605Z Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location. 
2025-09-07T10:09:59.8021642Z Successfully installed charset-normalizer-2.0.12 pyyaml-6.0.2 requests-2.27.1 2025-09-07T10:10:00.1092835Z Command completed after 1 attempt(s). 2025-09-07T10:10:00.1176275Z ##[group]Run set -x 2025-09-07T10:10:00.1176488Z set -x 2025-09-07T10:10:00.1176656Z  2025-09-07T10:10:00.1176952Z # Use relative path here as this could be checked out anywhere, not necessarily 2025-09-07T10:10:00.1177307Z # in runner workspace 2025-09-07T10:10:00.1177591Z python3 "${GITHUB_ACTION_PATH}/../../scripts/parse_ref.py" 2025-09-07T10:10:00.1192119Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T10:10:00.1192417Z env: 2025-09-07T10:10:00.1192596Z GIT_DEFAULT_BRANCH: main 2025-09-07T10:10:00.1192848Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T10:10:00.1193189Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5231 2025-09-07T10:10:00.1193482Z ##[endgroup] 2025-09-07T10:10:00.1724629Z + python3 /home/eve/_work/pytorch/pytorch/./.github/actions/filter-test-configs/../../scripts/parse_ref.py 2025-09-07T10:10:00.1871706Z Setting output branch=main 2025-09-07T10:10:00.2572564Z ##[group]Run echo "Workflow: ${GITHUB_WORKFLOW}" 2025-09-07T10:10:00.2572906Z echo "Workflow: ${GITHUB_WORKFLOW}" 2025-09-07T10:10:00.2573160Z echo "Job name: ${JOB_NAME}" 2025-09-07T10:10:00.2573395Z  2025-09-07T10:10:00.2573707Z # Use relative path here as this could be checked out anywhere, not necessarily 2025-09-07T10:10:00.2574066Z # in runner workspace 2025-09-07T10:10:00.2574386Z python3 "${GITHUB_ACTION_PATH}/../../scripts/filter_test_configs.py" \ 2025-09-07T10:10:00.2574771Z  --workflow "${GITHUB_WORKFLOW}" \ 2025-09-07T10:10:00.2575200Z  --job-name "${JOB_NAME}" \ 2025-09-07T10:10:00.2580566Z  --test-matrix "{"include": [{"config": "inductor_huggingface_perf_cuda_h100", "shard": 1, "num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_huggingface_perf_cuda_h100", "shard": 2, "num_shards": 5, "runner": 
"linux.aws.h100"}, {"config": "inductor_huggingface_perf_cuda_h100", "shard": 3, "num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_huggingface_perf_cuda_h100", "shard": 4, "num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_huggingface_perf_cuda_h100", "shard": 5, "num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 1, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 2, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 3, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 4, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 5, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 6, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 7, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 1, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 2, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 3, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 4, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 5, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 6, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 7, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 8, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 9, "num_shards": 9, "runner": 
"linux.aws.h100"}]}" \ 2025-09-07T10:10:00.2585960Z  --selected-test-configs "" \ 2025-09-07T10:10:00.2586202Z  --pr-number "${PR_NUMBER}" \ 2025-09-07T10:10:00.2586427Z  --tag "${TAG}" \ 2025-09-07T10:10:00.2586644Z  --event-name "${EVENT_NAME}" \ 2025-09-07T10:10:00.2586880Z  --schedule "${SCHEDULE}" \ 2025-09-07T10:10:00.2587107Z  --branch "${HEAD_BRANCH}" 2025-09-07T10:10:00.2601148Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T10:10:00.2601440Z env: 2025-09-07T10:10:00.2601604Z GIT_DEFAULT_BRANCH: main 2025-09-07T10:10:00.2601861Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T10:10:00.2602192Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5231 2025-09-07T10:10:00.2602676Z GITHUB_TOKEN: *** 2025-09-07T10:10:00.2602981Z JOB_NAME: test-weekly / test (inductor_timm_perf_cuda_h100, 4, 7, linux.aws.h100) 2025-09-07T10:10:00.2603313Z PR_NUMBER: 2025-09-07T10:10:00.2603475Z TAG: 2025-09-07T10:10:00.2603626Z EVENT_NAME: schedule 2025-09-07T10:10:00.2603809Z SCHEDULE: 0 7 * * 0 2025-09-07T10:10:00.2603996Z HEAD_BRANCH: main 2025-09-07T10:10:00.2604181Z ##[endgroup] 2025-09-07T10:10:00.3068880Z Workflow: inductor-perf-nightly-h100 2025-09-07T10:10:00.3069269Z Job name: test-weekly / test (inductor_timm_perf_cuda_h100, 4, 7, linux.aws.h100) 2025-09-07T10:10:00.5258282Z Setting output keep-going=True 2025-09-07T10:10:00.5258578Z Setting output ci-verbose-test-logs=False 2025-09-07T10:10:00.5258840Z Setting output ci-test-showlocals=False 2025-09-07T10:10:00.5259088Z Setting output ci-no-test-timeout=False 2025-09-07T10:10:00.5259327Z Setting output ci-no-td=False 2025-09-07T10:10:00.5259574Z Setting output ci-td-distributed=False 2025-09-07T10:10:00.5259818Z Setting output is-unstable=False 2025-09-07T10:10:00.5260042Z Setting output reenabled-issues= 2025-09-07T10:10:00.5265375Z Setting output test-matrix={"include": [{"config": "inductor_huggingface_perf_cuda_h100", "shard": 1, "num_shards": 5, "runner": 
"linux.aws.h100"}, {"config": "inductor_huggingface_perf_cuda_h100", "shard": 2, "num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_huggingface_perf_cuda_h100", "shard": 3, "num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_huggingface_perf_cuda_h100", "shard": 4, "num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_huggingface_perf_cuda_h100", "shard": 5, "num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 1, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 2, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 3, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 4, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 5, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 6, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 7, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 1, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 2, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 3, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 4, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 5, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 6, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 7, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 8, "num_shards": 9, "runner": 
"linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 9, "num_shards": 9, "runner": "linux.aws.h100"}]} 2025-09-07T10:10:00.5270431Z Setting output is-test-matrix-empty=False 2025-09-07T10:10:00.5444408Z ##[group]Run echo "Filtered matrix:" 2025-09-07T10:10:00.5444698Z echo "Filtered matrix:" 2025-09-07T10:10:00.5450496Z echo "{"include": [{"config": "inductor_huggingface_perf_cuda_h100", "shard": 1, "num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_huggingface_perf_cuda_h100", "shard": 2, "num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_huggingface_perf_cuda_h100", "shard": 3, "num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_huggingface_perf_cuda_h100", "shard": 4, "num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_huggingface_perf_cuda_h100", "shard": 5, "num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 1, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 2, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 3, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 4, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 5, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 6, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 7, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 1, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 2, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 3, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 4, 
"num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 5, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 6, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 7, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 8, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 9, "num_shards": 9, "runner": "linux.aws.h100"}]}" 2025-09-07T10:10:00.5455764Z  2025-09-07T10:10:00.5455921Z echo 2025-09-07T10:10:00.5456126Z echo "Is the current job unstable? False" 2025-09-07T10:10:00.5456543Z  2025-09-07T10:10:00.5456690Z echo 2025-09-07T10:10:00.5456878Z echo "Is keep-going label set? True" 2025-09-07T10:10:00.5457109Z  2025-09-07T10:10:00.5457266Z echo 2025-09-07T10:10:00.5457441Z echo "Reenabled issues? " 2025-09-07T10:10:00.5471669Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T10:10:00.5471971Z env: 2025-09-07T10:10:00.5472133Z GIT_DEFAULT_BRANCH: main 2025-09-07T10:10:00.5472384Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T10:10:00.5472729Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5231 2025-09-07T10:10:00.5473016Z ##[endgroup] 2025-09-07T10:10:00.5941207Z Filtered matrix: 2025-09-07T10:10:00.5947794Z {include: [{config: inductor_huggingface_perf_cuda_h100, shard: 1, num_shards: 5, runner: linux.aws.h100}, {config: inductor_huggingface_perf_cuda_h100, shard: 2, num_shards: 5, runner: linux.aws.h100}, {config: inductor_huggingface_perf_cuda_h100, shard: 3, num_shards: 5, runner: linux.aws.h100}, {config: inductor_huggingface_perf_cuda_h100, shard: 4, num_shards: 5, runner: linux.aws.h100}, {config: inductor_huggingface_perf_cuda_h100, shard: 5, num_shards: 5, runner: linux.aws.h100}, {config: inductor_timm_perf_cuda_h100, shard: 1, num_shards: 7, runner: 
linux.aws.h100}, {config: inductor_timm_perf_cuda_h100, shard: 2, num_shards: 7, runner: linux.aws.h100}, {config: inductor_timm_perf_cuda_h100, shard: 3, num_shards: 7, runner: linux.aws.h100}, {config: inductor_timm_perf_cuda_h100, shard: 4, num_shards: 7, runner: linux.aws.h100}, {config: inductor_timm_perf_cuda_h100, shard: 5, num_shards: 7, runner: linux.aws.h100}, {config: inductor_timm_perf_cuda_h100, shard: 6, num_shards: 7, runner: linux.aws.h100}, {config: inductor_timm_perf_cuda_h100, shard: 7, num_shards: 7, runner: linux.aws.h100}, {config: inductor_torchbench_perf_cuda_h100, shard: 1, num_shards: 9, runner: linux.aws.h100}, {config: inductor_torchbench_perf_cuda_h100, shard: 2, num_shards: 9, runner: linux.aws.h100}, {config: inductor_torchbench_perf_cuda_h100, shard: 3, num_shards: 9, runner: linux.aws.h100}, {config: inductor_torchbench_perf_cuda_h100, shard: 4, num_shards: 9, runner: linux.aws.h100}, {config: inductor_torchbench_perf_cuda_h100, shard: 5, num_shards: 9, runner: linux.aws.h100}, {config: inductor_torchbench_perf_cuda_h100, shard: 6, num_shards: 9, runner: linux.aws.h100}, {config: inductor_torchbench_perf_cuda_h100, shard: 7, num_shards: 9, runner: linux.aws.h100}, {config: inductor_torchbench_perf_cuda_h100, shard: 8, num_shards: 9, runner: linux.aws.h100}, {config: inductor_torchbench_perf_cuda_h100, shard: 9, num_shards: 9, runner: linux.aws.h100}]} 2025-09-07T10:10:00.5953133Z 2025-09-07T10:10:00.5953236Z Is the current job unstable? False 2025-09-07T10:10:00.5953388Z 2025-09-07T10:10:00.5953472Z Is keep-going label set? True 2025-09-07T10:10:00.5953614Z 2025-09-07T10:10:00.5953682Z Reenabled issues? 
2025-09-07T10:10:00.6480983Z ##[group]Run echo "timeout=$((JOB_TIMEOUT-30))" >> "${GITHUB_OUTPUT}" 2025-09-07T10:10:00.6481471Z echo "timeout=$((JOB_TIMEOUT-30))" >> "${GITHUB_OUTPUT}" 2025-09-07T10:10:00.6494472Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T10:10:00.6494766Z env: 2025-09-07T10:10:00.6494933Z GIT_DEFAULT_BRANCH: main 2025-09-07T10:10:00.6495350Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T10:10:00.6495686Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5231 2025-09-07T10:10:00.6495960Z JOB_TIMEOUT: 1440 2025-09-07T10:10:00.6496137Z ##[endgroup] 2025-09-07T10:10:00.7492589Z ##[group]Run env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-09-07T10:10:00.7493024Z env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-09-07T10:10:00.7493359Z env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-09-07T10:10:00.7507265Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T10:10:00.7507554Z env: 2025-09-07T10:10:00.7507936Z GIT_DEFAULT_BRANCH: main 2025-09-07T10:10:00.7508180Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T10:10:00.7508523Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5231 2025-09-07T10:10:00.7508796Z ##[endgroup] 2025-09-07T10:10:00.8277260Z ##[group]Run set -x 2025-09-07T10:10:00.8277563Z set -x 2025-09-07T10:10:00.8277774Z  2025-09-07T10:10:00.8277995Z if [[ $TEST_CONFIG == 'multigpu' ]]; then 2025-09-07T10:10:00.8278338Z  TEST_COMMAND=.ci/pytorch/multigpu-test.sh 2025-09-07T10:10:00.8278691Z elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then 2025-09-07T10:10:00.8279001Z  TEST_COMMAND=.ci/onnx/test.sh 2025-09-07T10:10:00.8279261Z else 2025-09-07T10:10:00.8279481Z  TEST_COMMAND=.ci/pytorch/test.sh 2025-09-07T10:10:00.8279746Z fi 2025-09-07T10:10:00.8279918Z  2025-09-07T10:10:00.8280142Z # Leaving 1GB for the runner and other things 2025-09-07T10:10:00.8280667Z TOTAL_AVAILABLE_MEMORY_IN_GB=$(awk '/MemTotal/ { 
printf "%.3f \n", $2/1024/1024 - 1 }' /proc/meminfo) 2025-09-07T10:10:00.8281441Z # https://docs.docker.com/engine/containers/resource_constraints/#--memory-swap-details, the 3GB swap 2025-09-07T10:10:00.8282053Z # comes from https://github.com/pytorch/test-infra/pull/6058 2025-09-07T10:10:00.8282509Z TOTAL_MEMORY_WITH_SWAP=$(("${TOTAL_AVAILABLE_MEMORY_IN_GB%.*}" + 3)) 2025-09-07T10:10:00.8282813Z  2025-09-07T10:10:00.8283010Z if [[ ${BUILD_ENVIRONMENT} == *"s390x"* ]]; then 2025-09-07T10:10:00.8283260Z  SHM_OPTS= 2025-09-07T10:10:00.8283445Z  JENKINS_USER= 2025-09-07T10:10:00.8283716Z  # ensure that docker container cleanly exits in 12 hours 2025-09-07T10:10:00.8284066Z  # if for some reason cleanup action doesn't stop container 2025-09-07T10:10:00.8284362Z  # when job is cancelled 2025-09-07T10:10:00.8284602Z  DOCKER_SHELL_CMD="sleep 12h" 2025-09-07T10:10:00.8284826Z else 2025-09-07T10:10:00.8285194Z  SHM_OPTS="--shm-size=${SHM_SIZE}" 2025-09-07T10:10:00.8285450Z  JENKINS_USER="--user jenkins" 2025-09-07T10:10:00.8285693Z  DOCKER_SHELL_CMD= 2025-09-07T10:10:00.8285897Z fi 2025-09-07T10:10:00.8286057Z  2025-09-07T10:10:00.8286308Z # detached container should get cleaned up by teardown_ec2_linux 2025-09-07T10:10:00.8286701Z # TODO: Stop building test binaries as part of the build phase 2025-09-07T10:10:00.8287156Z # Used for GPU_FLAG, SHM_OPTS, JENKINS_USER and DOCKER_SHELL_CMD since that doesn't play nice 2025-09-07T10:10:00.8287545Z # shellcheck disable=SC2086,SC2090 2025-09-07T10:10:00.8287793Z container_name=$(docker run \ 2025-09-07T10:10:00.8288020Z  ${GPU_FLAG:-} \ 2025-09-07T10:10:00.8288252Z  ${SCCACHE_SERVER_PORT_DOCKER_FLAG:-} \ 2025-09-07T10:10:00.8288517Z  -e BUILD_ENVIRONMENT \ 2025-09-07T10:10:00.8288739Z  -e PR_NUMBER \ 2025-09-07T10:10:00.8288947Z  -e GITHUB_ACTIONS \ 2025-09-07T10:10:00.8289162Z  -e GITHUB_REPOSITORY \ 2025-09-07T10:10:00.8289386Z  -e GITHUB_WORKFLOW \ 2025-09-07T10:10:00.8289598Z  -e GITHUB_JOB \ 2025-09-07T10:10:00.8289792Z  -e 
GITHUB_RUN_ID \ 2025-09-07T10:10:00.8289992Z  -e GITHUB_RUN_NUMBER \ 2025-09-07T10:10:00.8290215Z  -e GITHUB_RUN_ATTEMPT \ 2025-09-07T10:10:00.8290433Z  -e JOB_ID \ 2025-09-07T10:10:00.8290621Z  -e JOB_NAME \ 2025-09-07T10:10:00.8290807Z  -e BASE_SHA \ 2025-09-07T10:10:00.8291004Z  -e BRANCH \ 2025-09-07T10:10:00.8291185Z  -e SHA1 \ 2025-09-07T10:10:00.8291380Z  -e AWS_DEFAULT_REGION \ 2025-09-07T10:10:00.8291588Z  -e IN_WHEEL_TEST \ 2025-09-07T10:10:00.8291789Z  -e SHARD_NUMBER \ 2025-09-07T10:10:00.8292227Z  -e TEST_CONFIG \ 2025-09-07T10:10:00.8292432Z  -e NUM_TEST_SHARDS \ 2025-09-07T10:10:00.8292638Z  -e REENABLED_ISSUES \ 2025-09-07T10:10:00.8292863Z  -e CONTINUE_THROUGH_ERROR \ 2025-09-07T10:10:00.8293258Z  -e VERBOSE_TEST_LOGS \ 2025-09-07T10:10:00.8293484Z  -e TEST_SHOWLOCALS \ 2025-09-07T10:10:00.8293701Z  -e NO_TEST_TIMEOUT \ 2025-09-07T10:10:00.8293906Z  -e NO_TD \ 2025-09-07T10:10:00.8294095Z  -e TD_DISTRIBUTED \ 2025-09-07T10:10:00.8294302Z  -e PR_LABELS \ 2025-09-07T10:10:00.8294520Z  -e MAX_JOBS="$(nproc --ignore=2)" \ 2025-09-07T10:10:00.8294821Z  -e SCCACHE_BUCKET \ 2025-09-07T10:10:00.8295285Z  -e SCCACHE_REGION \ 2025-09-07T10:10:00.8295531Z  -e XLA_CUDA \ 2025-09-07T10:10:00.8295743Z  -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ 2025-09-07T10:10:00.8296009Z  -e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK \ 2025-09-07T10:10:00.8296285Z  -e PYTORCH_TEST_RERUN_DISABLED_TESTS \ 2025-09-07T10:10:00.8296555Z  -e SKIP_SCCACHE_INITIALIZATION=1 \ 2025-09-07T10:10:00.8296807Z  -e HUGGING_FACE_HUB_TOKEN \ 2025-09-07T10:10:00.8297046Z  -e VLLM_TEST_HUGGING_FACE_TOKEN \ 2025-09-07T10:10:00.8297294Z  -e SCRIBE_GRAPHQL_ACCESS_TOKEN \ 2025-09-07T10:10:00.8297525Z  -e DASHBOARD_TAG \ 2025-09-07T10:10:00.8297742Z  -e ARTIFACTS_FILE_SUFFIX \ 2025-09-07T10:10:00.8298005Z  --memory="${TOTAL_AVAILABLE_MEMORY_IN_GB%.*}g" \ 2025-09-07T10:10:00.8298309Z  --memory-swap="${TOTAL_MEMORY_WITH_SWAP}g" \ 2025-09-07T10:10:00.8298609Z  --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ 
2025-09-07T10:10:00.8298892Z  --security-opt seccomp=unconfined \ 2025-09-07T10:10:00.8299140Z  --cap-add=SYS_PTRACE \ 2025-09-07T10:10:00.8299357Z  --ipc=host \ 2025-09-07T10:10:00.8299549Z  ${SHM_OPTS} \ 2025-09-07T10:10:00.8299743Z  --tty \ 2025-09-07T10:10:00.8299911Z  --detach \ 2025-09-07T10:10:00.8300108Z  --name="${container_name}" \ 2025-09-07T10:10:00.8300337Z  ${JENKINS_USER} \ 2025-09-07T10:10:00.8300596Z  -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ 2025-09-07T10:10:00.8300880Z  -w /var/lib/jenkins/workspace \ 2025-09-07T10:10:00.8301126Z  "${DOCKER_IMAGE}" \ 2025-09-07T10:10:00.8301407Z  ${DOCKER_SHELL_CMD} 2025-09-07T10:10:00.8301614Z ) 2025-09-07T10:10:00.8301836Z # Propagate download.pytorch.org IP to container 2025-09-07T10:10:00.8302324Z grep download.pytorch.org /etc/hosts | docker exec -i "${container_name}" sudo bash -c "/bin/cat >> /etc/hosts" 2025-09-07T10:10:00.8302834Z echo "DOCKER_CONTAINER_ID=${container_name}" >> "${GITHUB_ENV}" 2025-09-07T10:10:00.8303139Z  2025-09-07T10:10:00.8303343Z if [[ ${BUILD_ENVIRONMENT} == *"s390x"* ]]; then 2025-09-07T10:10:00.8303769Z  docker exec -t "${container_name}" sh -c "python3 -m pip install -r .ci/docker/requirements-ci.txt" 2025-09-07T10:10:00.8304139Z fi 2025-09-07T10:10:00.8304302Z  2025-09-07T10:10:00.8304660Z docker exec -t "${container_name}" sh -c "python3 -m pip install $(echo dist/*.whl)[opt-einsum] && ${TEST_COMMAND}" 2025-09-07T10:10:00.8318305Z shell: /usr/bin/bash -e {0} 2025-09-07T10:10:00.8318532Z env: 2025-09-07T10:10:00.8318700Z GIT_DEFAULT_BRANCH: main 2025-09-07T10:10:00.8318968Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T10:10:00.8319311Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5231 2025-09-07T10:10:00.8319671Z BUILD_ENVIRONMENT: linux-jammy-cuda12.8-py3.10-gcc9-sm90 2025-09-07T10:10:00.8319955Z PR_NUMBER: 2025-09-07T10:10:00.8320148Z GITHUB_REPOSITORY: pytorch/pytorch 2025-09-07T10:10:00.8320410Z GITHUB_WORKFLOW: 
inductor-perf-nightly-h100 2025-09-07T10:10:00.8320816Z GITHUB_JOB: test 2025-09-07T10:10:00.8320995Z GITHUB_RUN_ID: 17525296438 2025-09-07T10:10:00.8321207Z GITHUB_RUN_NUMBER: 662 2025-09-07T10:10:00.8321405Z GITHUB_RUN_ATTEMPT: 1 2025-09-07T10:10:00.8321585Z JOB_ID: 49775781837 2025-09-07T10:10:00.8322016Z JOB_NAME: test-weekly / test (inductor_timm_perf_cuda_h100, 4, 7, linux.aws.h100) 2025-09-07T10:10:00.8322362Z BRANCH: main 2025-09-07T10:10:00.8322559Z SHA1: 93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T10:10:00.8322827Z BASE_SHA: 93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T10:10:00.8323094Z TEST_CONFIG: inductor_timm_perf_cuda_h100 2025-09-07T10:10:00.8323327Z SHARD_NUMBER: 4 2025-09-07T10:10:00.8323497Z NUM_TEST_SHARDS: 7 2025-09-07T10:10:00.8323680Z REENABLED_ISSUES: 2025-09-07T10:10:00.8323867Z CONTINUE_THROUGH_ERROR: True 2025-09-07T10:10:00.8324084Z VERBOSE_TEST_LOGS: False 2025-09-07T10:10:00.8324283Z TEST_SHOWLOCALS: False 2025-09-07T10:10:00.8324472Z NO_TEST_TIMEOUT: False 2025-09-07T10:10:00.8324659Z NO_TD: False 2025-09-07T10:10:00.8324829Z TD_DISTRIBUTED: False 2025-09-07T10:10:00.8325234Z SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 2025-09-07T10:10:00.8325499Z SCCACHE_REGION: us-east-1 2025-09-07T10:10:00.8325698Z SHM_SIZE: 2g 2025-09-07T10:10:00.8326364Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T10:10:00.8327062Z XLA_CUDA: 2025-09-07T10:10:00.8327328Z XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla 2025-09-07T10:10:00.8327658Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK: 0 2025-09-07T10:10:00.8327896Z PYTORCH_TEST_RERUN_DISABLED_TESTS: 0 2025-09-07T10:10:00.8328815Z DASHBOARD_TAG: 
training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true 2025-09-07T10:10:00.8329893Z VLLM_TEST_HUGGING_FACE_TOKEN: *** 2025-09-07T10:10:00.8330222Z HUGGING_FACE_HUB_TOKEN: *** 2025-09-07T10:10:00.8330536Z SCRIBE_GRAPHQL_ACCESS_TOKEN: *** 2025-09-07T10:10:00.8330906Z ARTIFACTS_FILE_SUFFIX: test-inductor_timm_perf_cuda_h100-4-7-linux.aws.h100_49775781837 2025-09-07T10:10:00.8331254Z ##[endgroup] 2025-09-07T10:10:00.8774142Z + [[ inductor_timm_perf_cuda_h100 == \m\u\l\t\i\g\p\u ]] 2025-09-07T10:10:00.8774526Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 == *onnx* ]] 2025-09-07T10:10:00.8774840Z + TEST_COMMAND=.ci/pytorch/test.sh 2025-09-07T10:10:00.8778293Z ++ awk '/MemTotal/ { printf "%.3f \n", $2/1024/1024 - 1 }' /proc/meminfo 2025-09-07T10:10:00.8790168Z + TOTAL_AVAILABLE_MEMORY_IN_GB='1998.949 ' 2025-09-07T10:10:00.8790574Z + TOTAL_MEMORY_WITH_SWAP=2001 2025-09-07T10:10:00.8790979Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 == *\s\3\9\0\x* ]] 2025-09-07T10:10:00.8791431Z + SHM_OPTS=--shm-size=2g 2025-09-07T10:10:00.8791675Z + JENKINS_USER='--user jenkins' 2025-09-07T10:10:00.8791938Z + DOCKER_SHELL_CMD= 2025-09-07T10:10:00.8800148Z +++ nproc --ignore=2 2025-09-07T10:10:00.8814566Z ++ docker run --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all -e SCCACHE_SERVER_PORT=5231 -e BUILD_ENVIRONMENT -e PR_NUMBER -e GITHUB_ACTIONS -e GITHUB_REPOSITORY -e GITHUB_WORKFLOW -e GITHUB_JOB -e GITHUB_RUN_ID -e GITHUB_RUN_NUMBER -e GITHUB_RUN_ATTEMPT -e JOB_ID -e JOB_NAME -e BASE_SHA -e BRANCH -e SHA1 -e AWS_DEFAULT_REGION -e IN_WHEEL_TEST -e SHARD_NUMBER -e TEST_CONFIG -e NUM_TEST_SHARDS -e REENABLED_ISSUES -e CONTINUE_THROUGH_ERROR -e VERBOSE_TEST_LOGS -e TEST_SHOWLOCALS -e NO_TEST_TIMEOUT -e NO_TD -e TD_DISTRIBUTED -e PR_LABELS -e MAX_JOBS=22 -e SCCACHE_BUCKET -e SCCACHE_REGION -e XLA_CUDA -e XLA_CLANG_CACHE_S3_BUCKET_NAME -e 
PYTORCH_TEST_CUDA_MEM_LEAK_CHECK -e PYTORCH_TEST_RERUN_DISABLED_TESTS -e SKIP_SCCACHE_INITIALIZATION=1 -e HUGGING_FACE_HUB_TOKEN -e VLLM_TEST_HUGGING_FACE_TOKEN -e SCRIBE_GRAPHQL_ACCESS_TOKEN -e DASHBOARD_TAG -e ARTIFACTS_FILE_SUFFIX --memory=1998g --memory-swap=2001g --env-file=/tmp/github_env_17525296438 --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --ipc=host --shm-size=2g --tty --detach --name= --user jenkins -v /home/eve/_work/pytorch/pytorch:/var/lib/jenkins/workspace -w /var/lib/jenkins/workspace 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T10:11:50.5181628Z + container_name=f6780263fb6a43de96266e63b4f163682f3d42ec146647fa8e1572b4947dba33 2025-09-07T10:11:50.5184634Z + grep download.pytorch.org /etc/hosts 2025-09-07T10:11:50.5186879Z + docker exec -i f6780263fb6a43de96266e63b4f163682f3d42ec146647fa8e1572b4947dba33 sudo bash -c '/bin/cat >> /etc/hosts' 2025-09-07T10:11:50.5719230Z + echo DOCKER_CONTAINER_ID=f6780263fb6a43de96266e63b4f163682f3d42ec146647fa8e1572b4947dba33 2025-09-07T10:11:50.5721183Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 == *\s\3\9\0\x* ]] 2025-09-07T10:11:50.5724192Z ++ echo dist/torch-2.9.0a0+git93fb23d-cp310-cp310-linux_x86_64.whl 2025-09-07T10:11:50.5726867Z + docker exec -t f6780263fb6a43de96266e63b4f163682f3d42ec146647fa8e1572b4947dba33 sh -c 'python3 -m pip install dist/torch-2.9.0a0+git93fb23d-cp310-cp310-linux_x86_64.whl[opt-einsum] && .ci/pytorch/test.sh' 2025-09-07T10:11:50.9861582Z Processing ./dist/torch-2.9.0a0+git93fb23d-cp310-cp310-linux_x86_64.whl (from torch==2.9.0a0+git93fb23d) 2025-09-07T10:11:51.2995255Z Requirement already satisfied: filelock in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.9.0a0+git93fb23d->torch==2.9.0a0+git93fb23d) (3.19.1) 2025-09-07T10:11:51.2998527Z Requirement already satisfied: typing-extensions>=4.10.0 in 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.9.0a0+git93fb23d->torch==2.9.0a0+git93fb23d) (4.15.0) 2025-09-07T10:11:51.3002314Z Requirement already satisfied: sympy>=1.13.3 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.9.0a0+git93fb23d->torch==2.9.0a0+git93fb23d) (1.13.3) 2025-09-07T10:11:51.3006170Z Requirement already satisfied: networkx>=2.5.1 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.9.0a0+git93fb23d->torch==2.9.0a0+git93fb23d) (2.8.8) 2025-09-07T10:11:51.3009354Z Requirement already satisfied: jinja2 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.9.0a0+git93fb23d->torch==2.9.0a0+git93fb23d) (3.1.6) 2025-09-07T10:11:51.3012894Z Requirement already satisfied: fsspec>=0.8.5 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.9.0a0+git93fb23d->torch==2.9.0a0+git93fb23d) (2025.3.0) 2025-09-07T10:11:51.3024902Z Requirement already satisfied: opt-einsum>=3.3 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.9.0a0+git93fb23d->torch==2.9.0a0+git93fb23d) (3.3.0) 2025-09-07T10:11:51.3343044Z Requirement already satisfied: numpy>=1.7 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from opt-einsum>=3.3->torch==2.9.0a0+git93fb23d->torch==2.9.0a0+git93fb23d) (1.22.4) 2025-09-07T10:11:51.3359530Z Requirement already satisfied: mpmath<1.4,>=1.1.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from sympy>=1.13.3->torch==2.9.0a0+git93fb23d->torch==2.9.0a0+git93fb23d) (1.3.0) 2025-09-07T10:11:51.3391029Z Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from jinja2->torch==2.9.0a0+git93fb23d->torch==2.9.0a0+git93fb23d) (3.0.2) 2025-09-07T10:11:52.1309452Z Installing collected packages: torch 2025-09-07T10:12:02.5568881Z ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. 
This behaviour is the source of the following dependency conflicts. 2025-09-07T10:12:02.5569745Z dall-e 0.1 requires torchvision, which is not installed. 2025-09-07T10:12:02.5570158Z effdet 0.4.1 requires torchvision, which is not installed. 2025-09-07T10:12:02.5570614Z python-doctr 1.0.0 requires torchvision>=0.15.0, which is not installed. 2025-09-07T10:12:02.5571170Z pytorch-labs-segment-anything-fast 0.2 requires torchao, which is not installed. 2025-09-07T10:12:02.5572395Z pytorch-labs-segment-anything-fast 0.2 requires torchvision>=0.17.0.dev20231026, which is not installed. 2025-09-07T10:12:02.5573073Z timm 1.0.14 requires torchvision, which is not installed. 2025-09-07T10:12:02.5573776Z Successfully installed torch-2.9.0a0+git93fb23d 2025-09-07T10:12:02.6271645Z + export TERM=vt100 2025-09-07T10:12:02.6271861Z + TERM=vt100 2025-09-07T10:12:02.6275403Z ++ dirname .ci/pytorch/test.sh 2025-09-07T10:12:02.6287963Z + source .ci/pytorch/common.sh 2025-09-07T10:12:02.6293386Z +++ dirname .ci/pytorch/common.sh 2025-09-07T10:12:02.6303374Z ++ source .ci/pytorch/common_utils.sh 2025-09-07T10:12:02.6304781Z +++ declare -f -t trap_add 2025-09-07T10:12:02.6309286Z ++ set -ex -o pipefail 2025-09-07T10:12:02.6309565Z ++ [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 == *rocm* ]] 2025-09-07T10:12:02.6309892Z ++ BUILD_TEST_LIBTORCH=0 2025-09-07T10:12:02.6313408Z ++ dirname .ci/pytorch/test.sh 2025-09-07T10:12:02.6322308Z + source .ci/pytorch/common-build.sh 2025-09-07T10:12:02.6323864Z ++ [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 != *win-* ]] 2025-09-07T10:12:02.6331910Z ++++ dirname .ci/pytorch/common-build.sh 2025-09-07T10:12:02.6340658Z +++ cd .ci/pytorch 2025-09-07T10:12:02.6340951Z +++ pwd -P 2025-09-07T10:12:02.6343306Z ++ script_dir=/var/lib/jenkins/workspace/.ci/pytorch 2025-09-07T10:12:02.6343757Z ++ [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 == *-pch* ]] 2025-09-07T10:12:02.6344111Z ++ which sccache 2025-09-07T10:12:02.6359584Z ++ [[ -z ossci-compiler-cache-circleci-v2 ]] 
2025-09-07T10:12:02.6359903Z ++ sccache --stop-server 2025-09-07T10:12:02.6392990Z ++ true 2025-09-07T10:12:02.6393206Z ++ rm -f /var/lib/jenkins/sccache_error.log 2025-09-07T10:12:02.6404510Z ++ trap_add sccache_epilogue EXIT 2025-09-07T10:12:02.6404770Z ++ trap_add_cmd=sccache_epilogue 2025-09-07T10:12:02.6405112Z ++ shift 2025-09-07T10:12:02.6405306Z ++ for trap_add_name in "$@" 2025-09-07T10:12:02.6411762Z ++++ trap -p EXIT 2025-09-07T10:12:02.6414355Z +++ eval 'extract_trap_cmd ' 2025-09-07T10:12:02.6414628Z ++++ extract_trap_cmd 2025-09-07T10:12:02.6415127Z ++++ printf '%s\n' '' 2025-09-07T10:12:02.6415385Z +++ printf '%s\n' sccache_epilogue 2025-09-07T10:12:02.6417488Z ++ trap -- ' 2025-09-07T10:12:02.6417740Z sccache_epilogue' EXIT 2025-09-07T10:12:02.6418021Z ++ [[ -n 1 ]] 2025-09-07T10:12:02.6418391Z ++ echo 'Skipping sccache server initialization, setting environment variables' 2025-09-07T10:12:02.6418949Z Skipping sccache server initialization, setting environment variables 2025-09-07T10:12:02.6419365Z ++ export SCCACHE_IDLE_TIMEOUT=0 2025-09-07T10:12:02.6419636Z ++ SCCACHE_IDLE_TIMEOUT=0 2025-09-07T10:12:02.6419956Z ++ export SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 2025-09-07T10:12:02.6420359Z ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 2025-09-07T10:12:02.6420742Z ++ export RUST_LOG=sccache::server=error 2025-09-07T10:12:02.6421043Z ++ RUST_LOG=sccache::server=error 2025-09-07T10:12:02.6421303Z ++ sccache --zero-stats 2025-09-07T10:12:02.8324158Z Statistics zeroed. 
2025-09-07T10:12:02.8330218Z ++ which ccache 2025-09-07T10:12:02.8344229Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 != *rocm* ]] 2025-09-07T10:12:02.8344593Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 != *s390x* ]] 2025-09-07T10:12:02.8345088Z + [[ -d /var/lib/jenkins/workspace ]] 2025-09-07T10:12:02.8347296Z ++ stat -c %u /var/lib/jenkins/workspace 2025-09-07T10:12:02.8359083Z + WORKSPACE_ORIGINAL_OWNER_ID=1000 2025-09-07T10:12:02.8359353Z + trap_add cleanup_workspace EXIT 2025-09-07T10:12:02.8359640Z + trap_add_cmd=cleanup_workspace 2025-09-07T10:12:02.8359879Z + shift 2025-09-07T10:12:02.8360068Z + for trap_add_name in "$@" 2025-09-07T10:12:02.8366517Z +++ trap -p EXIT 2025-09-07T10:12:02.8369219Z ++ eval 'extract_trap_cmd trap -- '\'' 2025-09-07T10:12:02.8369488Z sccache_epilogue'\'' EXIT' 2025-09-07T10:12:02.8369718Z +++ extract_trap_cmd trap -- ' 2025-09-07T10:12:02.8369949Z sccache_epilogue' EXIT 2025-09-07T10:12:02.8370147Z +++ printf '%s\n' ' 2025-09-07T10:12:02.8370823Z sccache_epilogue' 2025-09-07T10:12:02.8371033Z ++ printf '%s\n' cleanup_workspace 2025-09-07T10:12:02.8372132Z + trap -- ' 2025-09-07T10:12:02.8372312Z sccache_epilogue 2025-09-07T10:12:02.8372515Z cleanup_workspace' EXIT 2025-09-07T10:12:02.8373004Z + sudo chown -R jenkins /var/lib/jenkins/workspace 2025-09-07T10:12:07.2673522Z + git config --global --add safe.directory /var/lib/jenkins/workspace 2025-09-07T10:12:07.2699344Z + echo 'Environment variables:' 2025-09-07T10:12:07.2699604Z Environment variables: 2025-09-07T10:12:07.2699801Z + env 2025-09-07T10:12:07.2709091Z GITHUB_WORKSPACE=/home/eve/_work/pytorch/pytorch 2025-09-07T10:12:07.2709408Z CONTINUE_THROUGH_ERROR=True 2025-09-07T10:12:07.2709710Z BUILD_ENVIRONMENT=linux-jammy-cuda12.8-py3.10-gcc9-sm90 2025-09-07T10:12:07.2712504Z VLLM_TEST_HUGGING_FACE_TOKEN=*** 2025-09-07T10:12:07.2712803Z HOSTNAME=f6780263fb6a 2025-09-07T10:12:07.2713224Z 
GITHUB_PATH=/home/eve/_work/_temp/_runner_file_commands/add_path_968ff597-39bb-45d2-b1e7-83430a4990dd 2025-09-07T10:12:07.2713710Z GITHUB_ACTION=__run_2 2025-09-07T10:12:07.2713979Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=0 2025-09-07T10:12:07.2714236Z GITHUB_RUN_NUMBER=662 2025-09-07T10:12:07.2714471Z TEST_CONFIG=inductor_timm_perf_cuda_h100 2025-09-07T10:12:07.2714769Z GITHUB_REPOSITORY_OWNER_ID=21003710 2025-09-07T10:12:07.2715389Z TORCH_NVCC_FLAGS=-Xfatbin -compress-all 2025-09-07T10:12:07.2715664Z SCCACHE_IDLE_TIMEOUT=0 2025-09-07T10:12:07.2716010Z SCRIBE_GRAPHQL_ACCESS_TOKEN=*** 2025-09-07T10:12:07.2716275Z GITHUB_TRIGGERING_ACTOR=pytorchmergebot 2025-09-07T10:12:07.2716544Z GITHUB_REF_TYPE=branch 2025-09-07T10:12:07.2716789Z BASE_SHA=93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T10:12:07.2717074Z XLA_CUDA= 2025-09-07T10:12:07.2717285Z NCCL_LIB_DIR=/usr/local/cuda/lib64/ 2025-09-07T10:12:07.2717637Z HUGGING_FACE_HUB_TOKEN=*** 2025-09-07T10:12:07.2717994Z *** 2025-09-07T10:12:07.2718178Z GITHUB_REPOSITORY_ID=65600975 2025-09-07T10:12:07.2718417Z GITHUB_ACTIONS=true 2025-09-07T10:12:07.2718621Z NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T10:12:07.2718916Z SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 2025-09-07T10:12:07.2719233Z SHA1=93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T10:12:07.2719538Z GITHUB_SHA=93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T10:12:07.2720077Z GITHUB_WORKFLOW_REF=pytorch/pytorch/.github/workflows/inductor-perf-test-nightly-h100.yml@refs/heads/main 2025-09-07T10:12:07.2720569Z UCC_HOME=/usr 2025-09-07T10:12:07.2720747Z VERBOSE_TEST_LOGS=False 2025-09-07T10:12:07.2720959Z GITHUB_REF=refs/heads/main 2025-09-07T10:12:07.2721172Z SHARD_NUMBER=4 2025-09-07T10:12:07.2721365Z GITHUB_REF_PROTECTED=true 2025-09-07T10:12:07.2721593Z HOME=/var/lib/jenkins 2025-09-07T10:12:07.2721797Z SCCACHE_SERVER_PORT=5231 2025-09-07T10:12:07.2722044Z GITHUB_API_URL=https://api.github.com 2025-09-07T10:12:07.2722316Z 
PYTORCH_TEST_RERUN_DISABLED_TESTS=0 2025-09-07T10:12:07.2722592Z UCX_COMMIT=7836b165abdbe468a2f607e7254011c07d788152 2025-09-07T10:12:07.2722870Z USE_SYSTEM_NCCL=1 2025-09-07T10:12:07.2723070Z NUM_TEST_SHARDS=7 2025-09-07T10:12:07.2723255Z UCX_HOME=/usr 2025-09-07T10:12:07.2723628Z GITHUB_STATE=/home/eve/_work/_temp/_runner_file_commands/save_state_968ff597-39bb-45d2-b1e7-83430a4990dd 2025-09-07T10:12:07.2724196Z JOB_NAME=test-weekly / test (inductor_timm_perf_cuda_h100, 4, 7, linux.aws.h100) 2025-09-07T10:12:07.2724734Z GITHUB_ENV=/home/eve/_work/_temp/_runner_file_commands/set_env_968ff597-39bb-45d2-b1e7-83430a4990dd 2025-09-07T10:12:07.2725396Z GITHUB_EVENT_PATH=/home/eve/_work/_temp/_github_workflow/event.json 2025-09-07T10:12:07.2725733Z GITHUB_EVENT_NAME=schedule 2025-09-07T10:12:07.2726761Z DASHBOARD_TAG=training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true 2025-09-07T10:12:07.2727792Z GITHUB_RUN_ID=17525296438 2025-09-07T10:12:07.2727986Z INSTALLED_OPENBLAS= 2025-09-07T10:12:07.2728355Z GITHUB_STEP_SUMMARY=/home/eve/_work/_temp/_runner_file_commands/step_summary_968ff597-39bb-45d2-b1e7-83430a4990dd 2025-09-07T10:12:07.2729125Z GITHUB_ACTOR=pytorchmergebot 2025-09-07T10:12:07.2729329Z PR_NUMBER= 2025-09-07T10:12:07.2729486Z DESIRED_CUDA=12.8.1 2025-09-07T10:12:07.2729659Z GITHUB_RUN_ATTEMPT=1 2025-09-07T10:12:07.2730052Z ANACONDA_PYTHON_VERSION=3.10 2025-09-07T10:12:07.2730310Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql 2025-09-07T10:12:07.2730561Z TERM=vt100 2025-09-07T10:12:07.2730731Z INSTALLED_VISION=yes 2025-09-07T10:12:07.2730904Z BRANCH=main 2025-09-07T10:12:07.2731068Z SCCACHE_REGION=us-east-1 2025-09-07T10:12:07.2731269Z OPENSSL_ROOT_DIR=/opt/openssl 2025-09-07T10:12:07.2731469Z CUDA_PATH=/usr/local/cuda 2025-09-07T10:12:07.2731774Z 
GITHUB_ACTION_PATH=/home/eve/_work/pytorch/pytorch/./.github/actions/setup-linux 2025-09-07T10:12:07.2732125Z GITHUB_SERVER_URL=https://github.com 2025-09-07T10:12:07.2732388Z UCC_COMMIT=430e241bf5d38cbc73fc7a6b89155397232e3f96 2025-09-07T10:12:07.2732639Z REENABLED_ISSUES= 2025-09-07T10:12:07.2732816Z DOCS= 2025-09-07T10:12:07.2732962Z SHLVL=1 2025-09-07T10:12:07.2733112Z MAX_JOBS=22 2025-09-07T10:12:07.2733269Z GITHUB_ACTOR_ID=97764156 2025-09-07T10:12:07.2733520Z GITHUB_WORKFLOW_SHA=93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T10:12:07.2733800Z GITHUB_REF_NAME=main 2025-09-07T10:12:07.2734077Z XLA_CLANG_CACHE_S3_BUCKET_NAME=ossci-compiler-clang-cache-circleci-xla 2025-09-07T10:12:07.2734378Z GITHUB_JOB=test 2025-09-07T10:12:07.2734551Z NO_TEST_TIMEOUT=False 2025-09-07T10:12:07.2734733Z TD_DISTRIBUTED=False 2025-09-07T10:12:07.2735065Z GITHUB_REPOSITORY=pytorch/pytorch 2025-09-07T10:12:07.2735300Z GITHUB_RETENTION_DAYS=90 2025-09-07T10:12:07.2735498Z OPENSSL_DIR=/opt/openssl 2025-09-07T10:12:07.2735698Z GITHUB_ACTION_REPOSITORY= 2025-09-07T10:12:07.2736254Z PATH=/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-09-07T10:12:07.2736820Z GITHUB_BASE_REF= 2025-09-07T10:12:07.2736991Z INSTALLED_ACL= 2025-09-07T10:12:07.2737314Z ARTIFACTS_FILE_SUFFIX=test-inductor_timm_perf_cuda_h100-4-7-linux.aws.h100_49775781837 2025-09-07T10:12:07.2737660Z CI=true 2025-09-07T10:12:07.2737826Z GITHUB_REPOSITORY_OWNER=pytorch 2025-09-07T10:12:07.2738073Z RUST_LOG=sccache::server=error 2025-09-07T10:12:07.2738275Z JOB_ID=49775781837 2025-09-07T10:12:07.2738443Z GITHUB_HEAD_REF= 2025-09-07T10:12:07.2738611Z GITHUB_ACTION_REF= 2025-09-07T10:12:07.2738819Z SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2 2025-09-07T10:12:07.2739075Z TEST_SHOWLOCALS=False 2025-09-07T10:12:07.2739295Z GITHUB_WORKFLOW=inductor-perf-nightly-h100 2025-09-07T10:12:07.2739546Z 
DEBIAN_FRONTEND=noninteractive 2025-09-07T10:12:07.2739922Z GITHUB_OUTPUT=/home/eve/_work/_temp/_runner_file_commands/set_output_968ff597-39bb-45d2-b1e7-83430a4990dd 2025-09-07T10:12:07.2740306Z NO_TD=False 2025-09-07T10:12:07.2740475Z SKIP_SCCACHE_INITIALIZATION=1 2025-09-07T10:12:07.2740694Z NCCL_INCLUDE_DIR=/usr/local/cuda/include/ 2025-09-07T10:12:07.2740914Z _=/usr/bin/env 2025-09-07T10:12:07.2741150Z ++ python -c 'import site; print(site.getsitepackages()[0])' 2025-09-07T10:12:07.2993393Z + TORCH_INSTALL_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch 2025-09-07T10:12:07.2993933Z + TORCH_BIN_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/bin 2025-09-07T10:12:07.2994437Z + TORCH_LIB_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib 2025-09-07T10:12:07.2995095Z + TORCH_TEST_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/test 2025-09-07T10:12:07.2995497Z + BUILD_DIR=build 2025-09-07T10:12:07.2995715Z + BUILD_RENAMED_DIR=build_renamed 2025-09-07T10:12:07.2995971Z + BUILD_BIN_DIR=build/bin 2025-09-07T10:12:07.2996193Z + SHARD_NUMBER=4 2025-09-07T10:12:07.2996387Z + NUM_TEST_SHARDS=7 2025-09-07T10:12:07.2996608Z + export TORCH_SERIALIZATION_DEBUG=1 2025-09-07T10:12:07.2996876Z + TORCH_SERIALIZATION_DEBUG=1 2025-09-07T10:12:07.2997116Z + export VALGRIND=ON 2025-09-07T10:12:07.2997320Z + VALGRIND=ON 2025-09-07T10:12:07.2997810Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 == *clang9* ]] 2025-09-07T10:12:07.2998178Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 == *xpu* ]] 2025-09-07T10:12:07.2998480Z + detect_cuda_arch 2025-09-07T10:12:07.2998851Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 == *cuda* ]] 2025-09-07T10:12:07.2999149Z + command -v nvidia-smi 2025-09-07T10:12:07.2999353Z /usr/bin/nvidia-smi 2025-09-07T10:12:07.3004503Z ++ nvidia-smi --query-gpu=compute_cap --format=csv 2025-09-07T10:12:07.3005930Z ++ tail -n 1 2025-09-07T10:12:07.3282995Z + TORCH_CUDA_ARCH_LIST=9.0 2025-09-07T10:12:07.3283260Z + 
export TORCH_CUDA_ARCH_LIST 2025-09-07T10:12:07.3283555Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 == *s390x* ]] 2025-09-07T10:12:07.3283845Z + [[ 0 == \1 ]] 2025-09-07T10:12:07.3284024Z + [[ True == \1 ]] 2025-09-07T10:12:07.3284273Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 != *bazel* ]] 2025-09-07T10:12:07.3288557Z ++ realpath build/custom_test_artifacts 2025-09-07T10:12:07.3298620Z + CUSTOM_TEST_ARTIFACT_BUILD_DIR=/var/lib/jenkins/workspace/build/custom_test_artifacts 2025-09-07T10:12:07.3299047Z + [[ -n '' ]] 2025-09-07T10:12:07.3299247Z + echo 'Environment variables' 2025-09-07T10:12:07.3299479Z Environment variables 2025-09-07T10:12:07.3299671Z + env 2025-09-07T10:12:07.3306499Z GITHUB_WORKSPACE=/home/eve/_work/pytorch/pytorch 2025-09-07T10:12:07.3306796Z CONTINUE_THROUGH_ERROR=True 2025-09-07T10:12:07.3307112Z BUILD_ENVIRONMENT=linux-jammy-cuda12.8-py3.10-gcc9-sm90 2025-09-07T10:12:07.3307615Z VLLM_TEST_HUGGING_FACE_TOKEN=*** 2025-09-07T10:12:07.3307887Z HOSTNAME=f6780263fb6a 2025-09-07T10:12:07.3308305Z GITHUB_PATH=/home/eve/_work/_temp/_runner_file_commands/add_path_968ff597-39bb-45d2-b1e7-83430a4990dd 2025-09-07T10:12:07.3308762Z GITHUB_ACTION=__run_2 2025-09-07T10:12:07.3308993Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=0 2025-09-07T10:12:07.3309250Z GITHUB_RUN_NUMBER=662 2025-09-07T10:12:07.3309488Z TEST_CONFIG=inductor_timm_perf_cuda_h100 2025-09-07T10:12:07.3309771Z GITHUB_REPOSITORY_OWNER_ID=21003710 2025-09-07T10:12:07.3310055Z TORCH_NVCC_FLAGS=-Xfatbin -compress-all 2025-09-07T10:12:07.3310332Z SCCACHE_IDLE_TIMEOUT=0 2025-09-07T10:12:07.3310667Z SCRIBE_GRAPHQL_ACCESS_TOKEN=*** 2025-09-07T10:12:07.3310934Z GITHUB_TRIGGERING_ACTOR=pytorchmergebot 2025-09-07T10:12:07.3311214Z GITHUB_REF_TYPE=branch 2025-09-07T10:12:07.3311438Z TORCH_CUDA_ARCH_LIST=9.0 2025-09-07T10:12:07.3311698Z BASE_SHA=93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T10:12:07.3311975Z XLA_CUDA= 2025-09-07T10:12:07.3312174Z NCCL_LIB_DIR=/usr/local/cuda/lib64/ 
2025-09-07T10:12:07.3312522Z HUGGING_FACE_HUB_TOKEN=*** 2025-09-07T10:12:07.3312853Z *** 2025-09-07T10:12:07.3313044Z GITHUB_REPOSITORY_ID=65600975 2025-09-07T10:12:07.3313283Z GITHUB_ACTIONS=true 2025-09-07T10:12:07.3313501Z NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T10:12:07.3313791Z SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 2025-09-07T10:12:07.3314127Z SHA1=93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T10:12:07.3314437Z GITHUB_SHA=93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T10:12:07.3315196Z GITHUB_WORKFLOW_REF=pytorch/pytorch/.github/workflows/inductor-perf-test-nightly-h100.yml@refs/heads/main 2025-09-07T10:12:07.3315725Z UCC_HOME=/usr 2025-09-07T10:12:07.3315923Z TORCH_SERIALIZATION_DEBUG=1 2025-09-07T10:12:07.3316159Z VERBOSE_TEST_LOGS=False 2025-09-07T10:12:07.3316381Z GITHUB_REF=refs/heads/main 2025-09-07T10:12:07.3316602Z SHARD_NUMBER=4 2025-09-07T10:12:07.3316799Z GITHUB_REF_PROTECTED=true 2025-09-07T10:12:07.3317026Z HOME=/var/lib/jenkins 2025-09-07T10:12:07.3317247Z SCCACHE_SERVER_PORT=5231 2025-09-07T10:12:07.3317498Z GITHUB_API_URL=https://api.github.com 2025-09-07T10:12:07.3317792Z PYTORCH_TEST_RERUN_DISABLED_TESTS=0 2025-09-07T10:12:07.3318084Z UCX_COMMIT=7836b165abdbe468a2f607e7254011c07d788152 2025-09-07T10:12:07.3318367Z USE_SYSTEM_NCCL=1 2025-09-07T10:12:07.3318534Z NUM_TEST_SHARDS=7 2025-09-07T10:12:07.3318699Z UCX_HOME=/usr 2025-09-07T10:12:07.3319036Z GITHUB_STATE=/home/eve/_work/_temp/_runner_file_commands/save_state_968ff597-39bb-45d2-b1e7-83430a4990dd 2025-09-07T10:12:07.3319889Z JOB_NAME=test-weekly / test (inductor_timm_perf_cuda_h100, 4, 7, linux.aws.h100) 2025-09-07T10:12:07.3320376Z GITHUB_ENV=/home/eve/_work/_temp/_runner_file_commands/set_env_968ff597-39bb-45d2-b1e7-83430a4990dd 2025-09-07T10:12:07.3320991Z GITHUB_EVENT_PATH=/home/eve/_work/_temp/_github_workflow/event.json 2025-09-07T10:12:07.3321298Z GITHUB_EVENT_NAME=schedule 2025-09-07T10:12:07.3322194Z 
DASHBOARD_TAG=training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true 2025-09-07T10:12:07.3323101Z GITHUB_RUN_ID=17525296438 2025-09-07T10:12:07.3323296Z INSTALLED_OPENBLAS= 2025-09-07T10:12:07.3323677Z GITHUB_STEP_SUMMARY=/home/eve/_work/_temp/_runner_file_commands/step_summary_968ff597-39bb-45d2-b1e7-83430a4990dd 2025-09-07T10:12:07.3324096Z GITHUB_ACTOR=pytorchmergebot 2025-09-07T10:12:07.3324295Z PR_NUMBER= 2025-09-07T10:12:07.3324461Z DESIRED_CUDA=12.8.1 2025-09-07T10:12:07.3324634Z GITHUB_RUN_ATTEMPT=1 2025-09-07T10:12:07.3324807Z VALGRIND=ON 2025-09-07T10:12:07.3325138Z ANACONDA_PYTHON_VERSION=3.10 2025-09-07T10:12:07.3325405Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql 2025-09-07T10:12:07.3325665Z TERM=vt100 2025-09-07T10:12:07.3325819Z INSTALLED_VISION=yes 2025-09-07T10:12:07.3326003Z BRANCH=main 2025-09-07T10:12:07.3326166Z SCCACHE_REGION=us-east-1 2025-09-07T10:12:07.3326373Z OPENSSL_ROOT_DIR=/opt/openssl 2025-09-07T10:12:07.3326581Z CUDA_PATH=/usr/local/cuda 2025-09-07T10:12:07.3326894Z GITHUB_ACTION_PATH=/home/eve/_work/pytorch/pytorch/./.github/actions/setup-linux 2025-09-07T10:12:07.3327249Z GITHUB_SERVER_URL=https://github.com 2025-09-07T10:12:07.3327521Z UCC_COMMIT=430e241bf5d38cbc73fc7a6b89155397232e3f96 2025-09-07T10:12:07.3327766Z REENABLED_ISSUES= 2025-09-07T10:12:07.3327928Z DOCS= 2025-09-07T10:12:07.3328072Z SHLVL=1 2025-09-07T10:12:07.3328222Z MAX_JOBS=22 2025-09-07T10:12:07.3328375Z GITHUB_ACTOR_ID=97764156 2025-09-07T10:12:07.3328629Z GITHUB_WORKFLOW_SHA=93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T10:12:07.3328902Z GITHUB_REF_NAME=main 2025-09-07T10:12:07.3329177Z XLA_CLANG_CACHE_S3_BUCKET_NAME=ossci-compiler-clang-cache-circleci-xla 2025-09-07T10:12:07.3329479Z GITHUB_JOB=test 2025-09-07T10:12:07.3329651Z NO_TEST_TIMEOUT=False 2025-09-07T10:12:07.3329847Z TD_DISTRIBUTED=False 
2025-09-07T10:12:07.3330049Z GITHUB_REPOSITORY=pytorch/pytorch 2025-09-07T10:12:07.3330269Z GITHUB_RETENTION_DAYS=90 2025-09-07T10:12:07.3330461Z OPENSSL_DIR=/opt/openssl 2025-09-07T10:12:07.3330652Z GITHUB_ACTION_REPOSITORY= 2025-09-07T10:12:07.3331201Z PATH=/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-09-07T10:12:07.3331758Z GITHUB_BASE_REF= 2025-09-07T10:12:07.3331920Z INSTALLED_ACL= 2025-09-07T10:12:07.3332224Z ARTIFACTS_FILE_SUFFIX=test-inductor_timm_perf_cuda_h100-4-7-linux.aws.h100_49775781837 2025-09-07T10:12:07.3332571Z CI=true 2025-09-07T10:12:07.3332737Z GITHUB_REPOSITORY_OWNER=pytorch 2025-09-07T10:12:07.3332979Z RUST_LOG=sccache::server=error 2025-09-07T10:12:07.3333183Z JOB_ID=49775781837 2025-09-07T10:12:07.3333351Z GITHUB_HEAD_REF= 2025-09-07T10:12:07.3333523Z GITHUB_ACTION_REF= 2025-09-07T10:12:07.3333736Z SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2 2025-09-07T10:12:07.3333996Z TEST_SHOWLOCALS=False 2025-09-07T10:12:07.3334206Z GITHUB_WORKFLOW=inductor-perf-nightly-h100 2025-09-07T10:12:07.3334457Z DEBIAN_FRONTEND=noninteractive 2025-09-07T10:12:07.3334831Z GITHUB_OUTPUT=/home/eve/_work/_temp/_runner_file_commands/set_output_968ff597-39bb-45d2-b1e7-83430a4990dd 2025-09-07T10:12:07.3335347Z NO_TD=False 2025-09-07T10:12:07.3335525Z SKIP_SCCACHE_INITIALIZATION=1 2025-09-07T10:12:07.3335746Z NCCL_INCLUDE_DIR=/usr/local/cuda/include/ 2025-09-07T10:12:07.3335967Z _=/usr/bin/env 2025-09-07T10:12:07.3336136Z + echo 'Testing pytorch' 2025-09-07T10:12:07.3336330Z Testing pytorch 2025-09-07T10:12:07.3336691Z + export LANG=C.UTF-8 2025-09-07T10:12:07.3336862Z + LANG=C.UTF-8 2025-09-07T10:12:07.3337026Z + PR_NUMBER= 2025-09-07T10:12:07.3337229Z + [[ inductor_timm_perf_cuda_h100 == \d\e\f\a\u\l\t ]] 2025-09-07T10:12:07.3337657Z + [[ inductor_timm_perf_cuda_h100 == \d\i\s\t\r\i\b\u\t\e\d ]] 2025-09-07T10:12:07.3337951Z + [[ 
inductor_timm_perf_cuda_h100 == \s\l\o\w ]] 2025-09-07T10:12:07.3338268Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 == *slow-gradcheck* ]] 2025-09-07T10:12:07.3338610Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 == *cuda* ]] 2025-09-07T10:12:07.3338892Z + export PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda 2025-09-07T10:12:07.3339142Z + PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda 2025-09-07T10:12:07.3339408Z + [[ inductor_timm_perf_cuda_h100 == *crossref* ]] 2025-09-07T10:12:07.3339695Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 == *rocm* ]] 2025-09-07T10:12:07.3339990Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 == *xpu* ]] 2025-09-07T10:12:07.3340284Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 != *-bazel-* ]] 2025-09-07T10:12:07.3340556Z + pip_install ninja==1.10.2 2025-09-07T10:12:07.3340823Z + pip_install_pkg='python3 -m pip install --progress-bar off' 2025-09-07T10:12:07.3341158Z + python3 -m pip install --progress-bar off ninja==1.10.2 2025-09-07T10:12:08.1706291Z Collecting ninja==1.10.2 2025-09-07T10:12:08.2132841Z Downloading ninja-1.10.2-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl.metadata (5.0 kB) 2025-09-07T10:12:08.6845478Z Downloading ninja-1.10.2-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl (108 kB) 2025-09-07T10:12:10.2907936Z Installing collected packages: ninja 2025-09-07T10:12:10.2908371Z Attempting uninstall: ninja 2025-09-07T10:12:10.2916540Z Found existing installation: ninja 1.11.1.3 2025-09-07T10:12:10.2938279Z Uninstalling ninja-1.11.1.3: 2025-09-07T10:12:11.0454609Z Successfully uninstalled ninja-1.11.1.3 2025-09-07T10:12:12.0588594Z Successfully installed ninja-1.10.2 2025-09-07T10:12:12.2864234Z + export PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-09-07T10:12:12.2865977Z + 
PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-09-07T10:12:12.2866850Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 == *aarch64* ]] 2025-09-07T10:12:12.2867286Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 == *asan* ]] 2025-09-07T10:12:12.2867811Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 == *-debug* ]] 2025-09-07T10:12:12.2868291Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 != *-bazel-* ]] 2025-09-07T10:12:12.2869016Z + echo 'We are not in debug mode: linux-jammy-cuda12.8-py3.10-gcc9-sm90. Expect the assertion to pass' 2025-09-07T10:12:12.2869660Z We are not in debug mode: linux-jammy-cuda12.8-py3.10-gcc9-sm90. Expect the assertion to pass 2025-09-07T10:12:12.2871476Z + cd test 2025-09-07T10:12:12.2881829Z + python -c 'import torch; torch._C._crash_if_debug_asserts_fail(424242)' 2025-09-07T10:12:12.8461582Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T10:12:12.8463185Z import pynvml # type: ignore[import] 2025-09-07T10:12:13.8872393Z + [[ inductor_timm_perf_cuda_h100 == \n\o\g\p\u\_\N\O\_\A\V\X\2 ]] 2025-09-07T10:12:13.8872861Z + [[ inductor_timm_perf_cuda_h100 == \n\o\g\p\u\_\A\V\X\5\1\2 ]] 2025-09-07T10:12:13.8873309Z + [[ inductor_timm_perf_cuda_h100 == \l\e\g\a\c\y\_\n\v\i\d\i\a\_\d\r\i\v\e\r ]] 2025-09-07T10:12:13.8875998Z + DYNAMO_BENCHMARK_FLAGS=() 2025-09-07T10:12:13.8876582Z + [[ inductor_timm_perf_cuda_h100 == *pr_time_benchmarks* ]] 2025-09-07T10:12:13.8876989Z + [[ inductor_timm_perf_cuda_h100 == *dynamo_eager* ]] 2025-09-07T10:12:14.2321888Z + [[ inductor_timm_perf_cuda_h100 == *aot_eager* ]] 2025-09-07T10:12:14.2322262Z + [[ inductor_timm_perf_cuda_h100 == *aot_inductor* ]] 2025-09-07T10:12:14.2322659Z + [[ inductor_timm_perf_cuda_h100 == *max_autotune_inductor* ]] 2025-09-07T10:12:14.2323334Z + [[ inductor_timm_perf_cuda_h100 == *inductor* ]] 2025-09-07T10:12:14.2323685Z + [[ inductor_timm_perf_cuda_h100 != *perf* ]] 2025-09-07T10:12:14.2324026Z + [[ inductor_timm_perf_cuda_h100 == *dynamic* ]] 2025-09-07T10:12:14.2324343Z + [[ inductor_timm_perf_cuda_h100 == *cpu* ]] 2025-09-07T10:12:14.2324661Z + DYNAMO_BENCHMARK_FLAGS+=(--device cuda) 2025-09-07T10:12:14.2325232Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 == *libtorch* ]] 2025-09-07T10:12:14.2325640Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 == *-bazel-* ]] 2025-09-07T10:12:14.2325961Z + cd test 2025-09-07T10:12:14.2326241Z + python -c 'import torch; print(torch.__config__.show())' 2025-09-07T10:12:14.4169317Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T10:12:14.4170947Z import pynvml # type: ignore[import] 2025-09-07T10:12:15.9779235Z PyTorch built with: 2025-09-07T10:12:15.9779530Z - GCC 9.5 2025-09-07T10:12:15.9779742Z - C++ Version: 201703 2025-09-07T10:12:15.9780295Z - Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications 2025-09-07T10:12:15.9780981Z - Intel(R) MKL-DNN v3.7.1 (Git Hash 8d263e693366ef8db40acc569cc7d8edf644556d) 2025-09-07T10:12:15.9781477Z - OpenMP 201511 (a.k.a. OpenMP 4.5) 2025-09-07T10:12:15.9781801Z - LAPACK is enabled (usually provided by MKL) 2025-09-07T10:12:15.9782108Z - NNPACK is enabled 2025-09-07T10:12:15.9782346Z - CPU capability usage: AVX2 2025-09-07T10:12:15.9782602Z - CUDA Runtime 12.8 2025-09-07T10:12:15.9782957Z - NVCC architecture flags: -gencode;arch=compute_90,code=sm_90 2025-09-07T10:12:15.9783325Z - CuDNN 90.8 2025-09-07T10:12:15.9787806Z - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, COMMIT_SHA=93fb23d6fae7c4e82c4239a1033e522088742634, CUDA_VERSION=12.8, CUDNN_VERSION=9.8.0, CXX_COMPILER=/opt/cache/bin/c++, CXX_FLAGS= -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -DC10_NODEPRECATED -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -faligned-new -Werror -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, FORCE_FALLBACK_CUDA_MPI=1, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, TORCH_VERSION=2.9.0, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=ON, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=ON, USE_NCCL=ON, 
USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, USE_XCCL=OFF, USE_XPU=OFF, 2025-09-07T10:12:15.9792129Z 2025-09-07T10:12:16.9438026Z + cd test 2025-09-07T10:12:16.9438389Z + python -c 'import torch; print(torch.__config__.parallel_info())' 2025-09-07T10:12:17.4607099Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:12:17.4609327Z import pynvml # type: ignore[import] 2025-09-07T10:12:18.2301242Z ATen/Parallel: 2025-09-07T10:12:18.2301572Z at::get_num_threads() : 24 2025-09-07T10:12:18.2301830Z at::get_num_interop_threads() : 96 2025-09-07T10:12:18.2302517Z OpenMP 201511 (a.k.a. OpenMP 4.5) 2025-09-07T10:12:18.2302758Z omp_get_max_threads() : 24 2025-09-07T10:12:18.2303207Z Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications 2025-09-07T10:12:18.2303883Z mkl_get_max_threads() : 24 2025-09-07T10:12:18.2304200Z Intel(R) MKL-DNN v3.7.1 (Git Hash 8d263e693366ef8db40acc569cc7d8edf644556d) 2025-09-07T10:12:18.2304560Z std::thread::hardware_concurrency() : 192 2025-09-07T10:12:18.2304829Z Environment variables: 2025-09-07T10:12:18.2305209Z OMP_NUM_THREADS : [not set] 2025-09-07T10:12:18.2305441Z MKL_NUM_THREADS : [not set] 2025-09-07T10:12:18.2305675Z ATen parallel backend: OpenMP 2025-09-07T10:12:18.2305826Z 2025-09-07T10:12:18.5186091Z + [[ inductor_timm_perf_cuda_h100 == *numpy_2* ]] 2025-09-07T10:12:18.5186480Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 == *aarch64* ]] 2025-09-07T10:12:18.5186823Z + [[ inductor_timm_perf_cuda_h100 == *backward* ]] 2025-09-07T10:12:18.5187128Z + [[ inductor_timm_perf_cuda_h100 == *xla* ]] 2025-09-07T10:12:18.5187400Z + [[ inductor_timm_perf_cuda_h100 == *vllm* ]] 2025-09-07T10:12:18.5187692Z + [[ 
inductor_timm_perf_cuda_h100 == *executorch* ]] 2025-09-07T10:12:18.5188029Z + [[ inductor_timm_perf_cuda_h100 == \j\i\t\_\l\e\g\a\c\y ]] 2025-09-07T10:12:18.5188381Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 == *libtorch* ]] 2025-09-07T10:12:18.5188722Z + [[ inductor_timm_perf_cuda_h100 == distributed ]] 2025-09-07T10:12:18.5189037Z + [[ inductor_timm_perf_cuda_h100 == *operator_benchmark* ]] 2025-09-07T10:12:18.5189386Z + [[ inductor_timm_perf_cuda_h100 == *inductor_distributed* ]] 2025-09-07T10:12:18.5189732Z + [[ inductor_timm_perf_cuda_h100 == *inductor-halide* ]] 2025-09-07T10:12:18.5190081Z + [[ inductor_timm_perf_cuda_h100 == *inductor-triton-cpu* ]] 2025-09-07T10:12:18.5190477Z + [[ inductor_timm_perf_cuda_h100 == *inductor-micro-benchmark* ]] 2025-09-07T10:12:18.5190864Z + [[ inductor_timm_perf_cuda_h100 == *huggingface* ]] 2025-09-07T10:12:18.5191178Z + [[ inductor_timm_perf_cuda_h100 == *timm* ]] 2025-09-07T10:12:18.5191461Z + install_torchvision 2025-09-07T10:12:18.5191669Z + local orig_preload 2025-09-07T10:12:18.5191872Z + local commit 2025-09-07T10:12:18.5192073Z ++ get_pinned_commit vision 2025-09-07T10:12:18.5192328Z ++ cat .github/ci_commit_pins/vision.txt 2025-09-07T10:12:18.5206807Z + commit=966da7e46f65d6d49df3e31214470a4fe5cc8e66 2025-09-07T10:12:18.5207074Z + orig_preload= 2025-09-07T10:12:18.5207253Z + '[' -n '' ']' 2025-09-07T10:12:18.5207478Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 == *cuda* ]] 2025-09-07T10:12:18.5207740Z + export FORCE_CUDA=1 2025-09-07T10:12:18.5207912Z + FORCE_CUDA=1 2025-09-07T10:12:18.5208074Z + export WITH_CUDA=1 2025-09-07T10:12:18.5208246Z + WITH_CUDA=1 2025-09-07T10:12:18.5208679Z + pip_build_and_install git+https://github.com/pytorch/vision.git@966da7e46f65d6d49df3e31214470a4fe5cc8e66 dist/vision 2025-09-07T10:12:18.5209324Z + local build_target=git+https://github.com/pytorch/vision.git@966da7e46f65d6d49df3e31214470a4fe5cc8e66 2025-09-07T10:12:18.5209750Z + local wheel_dir=dist/vision 
2025-09-07T10:12:18.5209959Z + local found_whl=0 2025-09-07T10:12:18.5210154Z + for file in "${wheel_dir}"/*.whl 2025-09-07T10:12:18.5210499Z + [[ -f dist/vision/torchvision-0.22.0a0+966da7e-cp310-cp310-linux_x86_64.whl ]] 2025-09-07T10:12:18.5210829Z + found_whl=1 2025-09-07T10:12:18.5210987Z + break 2025-09-07T10:12:18.5211135Z + '[' 1 == 0 ']' 2025-09-07T10:12:18.5211305Z + for file in "${wheel_dir}"/*.whl 2025-09-07T10:12:18.5211674Z + pip_install_whl dist/vision/torchvision-0.22.0a0+966da7e-cp310-cp310-linux_x86_64.whl 2025-09-07T10:12:18.5212149Z + args=('dist/vision/torchvision-0.22.0a0+966da7e-cp310-cp310-linux_x86_64.whl') 2025-09-07T10:12:18.5212472Z + local args 2025-09-07T10:12:18.5212755Z + [[ dist/vision/torchvision-0.22.0a0+966da7e-cp310-cp310-linux_x86_64.whl == *\ * ]] 2025-09-07T10:12:18.5213106Z + for path in "${args[@]}" 2025-09-07T10:12:18.5213451Z + echo 'Installing dist/vision/torchvision-0.22.0a0+966da7e-cp310-cp310-linux_x86_64.whl' 2025-09-07T10:12:18.5214291Z Installing dist/vision/torchvision-0.22.0a0+966da7e-cp310-cp310-linux_x86_64.whl 2025-09-07T10:12:18.5215426Z + python3 -mpip install --no-index --no-deps dist/vision/torchvision-0.22.0a0+966da7e-cp310-cp310-linux_x86_64.whl 2025-09-07T10:12:18.8625241Z Processing ./dist/vision/torchvision-0.22.0a0+966da7e-cp310-cp310-linux_x86_64.whl 2025-09-07T10:12:18.8714273Z Installing collected packages: torchvision 2025-09-07T10:12:19.3055192Z Successfully installed torchvision-0.22.0a0+966da7e 2025-09-07T10:12:19.3376242Z + '[' -n '' ']' 2025-09-07T10:12:19.3376474Z + id=3 2025-09-07T10:12:19.3376699Z + test_dynamo_benchmark timm_models 3 2025-09-07T10:12:19.3382782Z ++ pwd 2025-09-07T10:12:19.3385499Z + TEST_REPORTS_DIR=/var/lib/jenkins/workspace/test/test-reports 2025-09-07T10:12:19.3385812Z + local suite=timm_models 2025-09-07T10:12:19.3386029Z + shift 2025-09-07T10:12:19.3386190Z + local shard_id=3 2025-09-07T10:12:19.3386365Z + shift 2025-09-07T10:12:19.3386595Z + [[ 
inductor_timm_perf_cuda_h100 == *perf_compare* ]] 2025-09-07T10:12:19.3386887Z + [[ inductor_timm_perf_cuda_h100 == *perf* ]] 2025-09-07T10:12:19.3387144Z + [[ inductor_timm_perf_cuda_h100 == *b200* ]] 2025-09-07T10:12:19.3387445Z + test_single_dynamo_benchmark dashboard timm_models 3 2025-09-07T10:12:19.3391136Z ++ pwd 2025-09-07T10:12:19.3393370Z + TEST_REPORTS_DIR=/var/lib/jenkins/workspace/test/test-reports 2025-09-07T10:12:19.3393779Z + mkdir -p /var/lib/jenkins/workspace/test/test-reports 2025-09-07T10:12:19.3414813Z + local name=dashboard 2025-09-07T10:12:19.3415260Z + shift 2025-09-07T10:12:19.3415421Z + local suite=timm_models 2025-09-07T10:12:19.3415613Z + shift 2025-09-07T10:12:19.3415766Z + local shard_id=3 2025-09-07T10:12:19.3415932Z + shift 2025-09-07T10:12:19.3416076Z + partition_flags=() 2025-09-07T10:12:19.3416264Z + local partition_flags 2025-09-07T10:12:19.3416450Z + [[ -n 7 ]] 2025-09-07T10:12:19.3416608Z + [[ -n 3 ]] 2025-09-07T10:12:19.3416902Z + partition_flags=(--total-partitions "$NUM_TEST_SHARDS" --partition-id "$shard_id") 2025-09-07T10:12:19.3417318Z + [[ inductor_timm_perf_cuda_h100 == *perf_compare* ]] 2025-09-07T10:12:19.3417600Z + [[ inductor_timm_perf_cuda_h100 == *perf* ]] 2025-09-07T10:12:19.3417993Z + test_perf_for_dashboard timm_models --device cuda --total-partitions 7 --partition-id 3 2025-09-07T10:12:19.3419882Z ++ pwd 2025-09-07T10:12:19.3422407Z + TEST_REPORTS_DIR=/var/lib/jenkins/workspace/test/test-reports 2025-09-07T10:12:19.3422831Z + mkdir -p /var/lib/jenkins/workspace/test/test-reports 2025-09-07T10:12:19.3439000Z + local suite=timm_models 2025-09-07T10:12:19.3439235Z + shift 2025-09-07T10:12:19.3439417Z + local backend=inductor 2025-09-07T10:12:19.3439643Z + modes=() 2025-09-07T10:12:19.3439822Z + local modes 2025-09-07T10:12:19.3440926Z + [[ 
training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *training-true* ]] 2025-09-07T10:12:19.3442102Z + modes+=(training) 2025-09-07T10:12:19.3443138Z + [[ training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *inference-true* ]] 2025-09-07T10:12:19.3444215Z + modes+=(inference) 2025-09-07T10:12:19.3444432Z + targets=('accuracy' 'performance') 2025-09-07T10:12:19.3444678Z + local targets 2025-09-07T10:12:19.3444860Z + local device=cuda 2025-09-07T10:12:19.3445235Z + [[ inductor_timm_perf_cuda_h100 == *cpu* ]] 2025-09-07T10:12:19.3445522Z + [[ inductor_timm_perf_cuda_h100 == *cuda_a10g* ]] 2025-09-07T10:12:19.3445818Z + [[ inductor_timm_perf_cuda_h100 == *h100* ]] 2025-09-07T10:12:19.3446067Z + device=cuda_h100 2025-09-07T10:12:19.3446265Z + for mode in "${modes[@]}" 2025-09-07T10:12:19.3446488Z + [[ training == \i\n\f\e\r\e\n\c\e ]] 2025-09-07T10:12:19.3446731Z + [[ training == \t\r\a\i\n\i\n\g ]] 2025-09-07T10:12:19.3447421Z + dtype=amp 2025-09-07T10:12:19.3447607Z + for target in "${targets[@]}" 2025-09-07T10:12:19.3447838Z + target_flag=('--accuracy') 2025-09-07T10:12:19.3448056Z + local target_flag 2025-09-07T10:12:19.3448446Z + [[ accuracy == \p\e\r\f\o\r\m\a\n\c\e ]] 2025-09-07T10:12:19.3448716Z + [[ accuracy == \a\c\c\u\r\a\c\y ]] 2025-09-07T10:12:19.3448977Z + target_flag+=(--no-translation-validation) 2025-09-07T10:12:19.3450080Z + [[ training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *freezing-true* ]] 2025-09-07T10:12:19.3451873Z + [[ 
training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *default-true* ]] 2025-09-07T10:12:19.3453607Z + python benchmarks/dynamo/timm_models.py --accuracy --no-translation-validation --training --amp --backend inductor --disable-cudagraphs --device cuda --total-partitions 7 --partition-id 3 --output /var/lib/jenkins/workspace/test/test-reports/inductor_no_cudagraphs_timm_models_amp_training_cuda_h100_accuracy.csv 2025-09-07T10:12:20.3597192Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:12:20.3598361Z import pynvml # type: ignore[import] 2025-09-07T10:12:25.3223771Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:12:25.3225443Z import pynvml # type: ignore[import] 2025-09-07T10:12:28.3801403Z 2025-09-07T10:12:29.3844632Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:12:29.3844913Z 2025-09-07T10:12:29.5041029Z model.safetensors: 0% 0.00/85.6M [00:00 will be ignored 2025-09-07T10:17:27.8515969Z E0907 10:17:27.850000 3549 site-packages/torch/_dynamo/utils.py:3115] RMSE (res-fp64): 0.04286, (ref-fp64): 0.01055 and shape=torch.Size([128, 128, 7, 1]). 
res.dtype: torch.float32, multiplier: 3.000000, tol: 0.040000, use_larger_multiplier_for_smaller_tensor: 1 2025-09-07T10:17:27.8519108Z E0907 10:17:27.851000 3549 site-packages/torch/_dynamo/utils.py:2976] Accuracy failed for key name Mixed_6b.branch7x7dbl_4.conv.weight.grad 2025-09-07T10:17:27.8807135Z fail_accuracy 2025-09-07T10:17:35.2869596Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:17:35.2871052Z import pynvml # type: ignore[import] 2025-09-07T10:17:38.2833801Z 2025-09-07T10:17:39.5352871Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:17:39.5353127Z 2025-09-07T10:17:39.6518397Z model.safetensors: 0% 0.00/271M [00:00 will be ignored 2025-09-07T10:19:24.8188576Z pass 2025-09-07T10:19:31.5499882Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:19:31.5501119Z import pynvml # type: ignore[import] 2025-09-07T10:19:34.5623051Z 2025-09-07T10:19:34.7569231Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:19:34.7569604Z 2025-09-07T10:19:34.7931261Z model.safetensors: 0% 0.00/7.56M [00:00 will be ignored 2025-09-07T10:21:31.4616293Z pass 2025-09-07T10:21:37.8386062Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T10:21:37.8387323Z import pynvml # type: ignore[import] 2025-09-07T10:21:40.8828143Z 2025-09-07T10:21:41.7432615Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:21:41.7432855Z 2025-09-07T10:21:41.8603123Z model.safetensors: 0% 0.00/240M [00:00 will be ignored 2025-09-07T10:22:25.6775793Z pass 2025-09-07T10:22:30.3732201Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:22:30.3734015Z import pynvml # type: ignore[import] 2025-09-07T10:22:33.4536105Z 2025-09-07T10:22:33.8125733Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:22:33.8126019Z 2025-09-07T10:22:33.9231118Z model.safetensors: 0% 0.00/29.6M [00:00 will be ignored 2025-09-07T10:24:39.1616891Z pass 2025-09-07T10:24:43.4542576Z accuracy pass_rate=87.50% 2025-09-07T10:24:43.4549749Z calls_captured gmean=969.23x mean=1248.750x 2025-09-07T10:24:43.4553858Z unique_graphs gmean=2.67x mean=2.750x 2025-09-07T10:24:43.4558260Z graph_breaks gmean=6.46x mean=6.500x 2025-09-07T10:24:43.4561809Z unique_graph_breaks gmean=4.86x mean=4.875x 2025-09-07T10:24:43.4565875Z autograd_captures gmean=0.00x mean=0.000x 2025-09-07T10:24:43.4570127Z autograd_compiles gmean=0.00x mean=0.000x 2025-09-07T10:24:43.4573555Z cudagraph_skips gmean=0.00x mean=0.000x 2025-09-07T10:24:43.4574800Z compilation_latency mean=79.462 seconds 2025-09-07T10:24:44.5924922Z + [[ training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *cudagraphs-true* ]] 2025-09-07T10:24:44.5927575Z + python benchmarks/dynamo/timm_models.py --accuracy --no-translation-validation --training --amp --backend inductor --device cuda --total-partitions 7 
--partition-id 3 --output /var/lib/jenkins/workspace/test/test-reports/inductor_with_cudagraphs_timm_models_amp_training_cuda_h100_accuracy.csv 2025-09-07T10:24:45.6268169Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:24:45.6269332Z import pynvml # type: ignore[import] 2025-09-07T10:24:50.5296516Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:24:50.5297978Z import pynvml # type: ignore[import] 2025-09-07T10:24:53.5343224Z 2025-09-07T10:24:55.6760217Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:24:55.6760555Z loading model: 0it [00:02, ?it/s] 2025-09-07T10:24:55.6760838Z cuda train hrnet_w18 2025-09-07T10:27:34.9828870Z skipping cudagraphs due to disabling cudagraphs due to incompatible op aten.index_put.default Found from File "/var/lib/jenkins/workspace/benchmarks/dynamo/timm_models.py", line 442, in torch_dynamo_resume_in_forward_and_backward_pass_at_440 2025-09-07T10:27:34.9830125Z pred = mod(*cloned_inputs) 2025-09-07T10:27:34.9830612Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/hrnet.py", line 792, in forward 2025-09-07T10:27:34.9831100Z y = self.forward_features(x) 2025-09-07T10:27:34.9831610Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/hrnet.py", line 770, in forward_features 2025-09-07T10:27:34.9832126Z yl = self.stages(x) 2025-09-07T10:27:34.9832550Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/hrnet.py", line 757, in stages 2025-09-07T10:27:34.9833014Z yl = 
self.stage4(xl) 2025-09-07T10:27:34.9833439Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/hrnet.py", line 506, in forward 2025-09-07T10:27:34.9833893Z x = module(x) 2025-09-07T10:27:34.9834291Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/hrnet.py", line 484, in forward 2025-09-07T10:27:34.9834797Z y = y + f(x[j]) 2025-09-07T10:27:34.9835930Z 2025-09-07T10:27:34.9835934Z 2025-09-07T10:27:36.5382390Z pass 2025-09-07T10:27:44.2172904Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:27:44.2175500Z import pynvml # type: ignore[import] 2025-09-07T10:27:47.2092649Z 2025-09-07T10:27:49.2235313Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:27:49.2235758Z loading model: 0it [00:02, ?it/s] 2025-09-07T10:27:49.2236135Z cuda train inception_v3 2025-09-07T10:29:12.6382649Z W0907 10:29:12.637000 23569 site-packages/torch/_logging/_internal.py:1199] [6/0] Profiler function will be ignored 2025-09-07T10:29:55.2313731Z E0907 10:29:55.230000 23569 site-packages/torch/_dynamo/utils.py:3115] RMSE (res-fp64): 0.04286, (ref-fp64): 0.01055 and shape=torch.Size([128, 128, 7, 1]). res.dtype: torch.float32, multiplier: 3.000000, tol: 0.040000, use_larger_multiplier_for_smaller_tensor: 1 2025-09-07T10:29:55.2316364Z E0907 10:29:55.231000 23569 site-packages/torch/_dynamo/utils.py:2976] Accuracy failed for key name Mixed_6b.branch7x7dbl_4.conv.weight.grad 2025-09-07T10:29:55.2891904Z fail_accuracy 2025-09-07T10:30:02.6240209Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. 
If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:30:02.6241437Z import pynvml # type: ignore[import] 2025-09-07T10:30:05.6374190Z 2025-09-07T10:30:07.8119505Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:30:07.8119853Z loading model: 0it [00:02, ?it/s] 2025-09-07T10:30:07.8120129Z cuda train jx_nest_base 2025-09-07T10:31:04.5280141Z W0907 10:31:04.527000 23828 site-packages/torch/_logging/_internal.py:1199] [6/0] Profiler function will be ignored 2025-09-07T10:31:50.6969099Z pass 2025-09-07T10:31:57.4020045Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:31:57.4022117Z import pynvml # type: ignore[import] 2025-09-07T10:32:00.3984345Z 2025-09-07T10:32:01.3660607Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:32:01.3661166Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:32:01.3661806Z cuda train lcnet_050 2025-09-07T10:32:24.4743247Z pass 2025-09-07T10:32:28.4869706Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T10:32:28.4871577Z import pynvml # type: ignore[import] 2025-09-07T10:32:31.4754727Z 2025-09-07T10:32:32.7214918Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:32:32.7215505Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:32:32.7215791Z cuda train levit_128 2025-09-07T10:33:25.4858022Z skipping cudagraphs due to disabling cudagraphs due to incompatible op aten.index_put.default Found from File "/var/lib/jenkins/workspace/benchmarks/dynamo/timm_models.py", line 442, in torch_dynamo_resume_in_forward_and_backward_pass_at_440 2025-09-07T10:33:25.4859580Z pred = mod(*cloned_inputs) 2025-09-07T10:33:25.4860300Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 718, in forward 2025-09-07T10:33:25.4860995Z x = self.forward_features(x) 2025-09-07T10:33:25.4861800Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 709, in forward_features 2025-09-07T10:33:25.4874202Z x = self.stages(x) 2025-09-07T10:33:25.4874877Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 520, in forward 2025-09-07T10:33:25.4876197Z x = self.blocks(x) 2025-09-07T10:33:25.4876841Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 458, in forward 2025-09-07T10:33:25.4877571Z x = x + self.drop_path1(self.attn(x)) 2025-09-07T10:33:25.4878277Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 237, in forward 2025-09-07T10:33:25.4879051Z attn = q @ k * self.scale + self.get_attention_biases(x.device) 2025-09-07T10:33:25.4879886Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 212, in get_attention_biases 2025-09-07T10:33:25.4880756Z return self.attention_biases[:, self.attention_bias_idxs] 2025-09-07T10:33:25.4881131Z 2025-09-07T10:33:25.4881136Z 2025-09-07T10:33:25.8800992Z W0907 10:33:25.879000 24346 site-packages/torch/_logging/_internal.py:1199] [6/0] Profiler function 
will be ignored 2025-09-07T10:33:55.1094403Z pass 2025-09-07T10:34:01.1316319Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:34:01.1317539Z import pynvml # type: ignore[import] 2025-09-07T10:34:04.1933886Z 2025-09-07T10:34:05.5145554Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:34:05.5145927Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:34:05.5146227Z cuda train mixer_b16_224 2025-09-07T10:34:26.3334205Z W0907 10:34:26.332000 24606 site-packages/torch/_logging/_internal.py:1199] [6/0] Profiler function will be ignored 2025-09-07T10:34:46.5378330Z pass 2025-09-07T10:34:51.8783803Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:34:51.8784894Z import pynvml # type: ignore[import] 2025-09-07T10:34:54.9713891Z 2025-09-07T10:34:56.3823840Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:34:56.3824215Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:34:56.3824510Z cuda train mixnet_l 2025-09-07T10:35:55.9420074Z pass 2025-09-07T10:36:01.3609714Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T10:36:01.3611583Z import pynvml # type: ignore[import] 2025-09-07T10:36:04.3487753Z 2025-09-07T10:36:05.6749996Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:36:05.6750353Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:36:05.6750695Z cuda train mnasnet_100 2025-09-07T10:36:37.3066793Z W0907 10:36:37.305000 25125 site-packages/torch/_logging/_internal.py:1199] [7/0] Profiler function will be ignored 2025-09-07T10:37:00.3752839Z pass 2025-09-07T10:37:04.1020308Z accuracy pass_rate=87.50% 2025-09-07T10:37:04.1024539Z calls_captured gmean=969.23x mean=1248.750x 2025-09-07T10:37:04.1028323Z unique_graphs gmean=2.67x mean=2.750x 2025-09-07T10:37:04.1031895Z graph_breaks gmean=6.46x mean=6.500x 2025-09-07T10:37:04.1035285Z unique_graph_breaks gmean=4.86x mean=4.875x 2025-09-07T10:37:04.1038942Z autograd_captures gmean=0.00x mean=0.000x 2025-09-07T10:37:04.1042133Z autograd_compiles gmean=0.00x mean=0.000x 2025-09-07T10:37:04.1045880Z cudagraph_skips gmean=0.00x mean=0.250x 2025-09-07T10:37:04.1046910Z compilation_latency mean=79.048 seconds 2025-09-07T10:37:05.3086616Z + [[ training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *dynamic-true* ]] 2025-09-07T10:37:05.3088997Z + python benchmarks/dynamo/timm_models.py --accuracy --no-translation-validation --training --amp --backend inductor --dynamic-shapes --dynamic-batch-only --device cuda --total-partitions 7 --partition-id 3 --output /var/lib/jenkins/workspace/test/test-reports/inductor_dynamic_timm_models_amp_training_cuda_h100_accuracy.csv 2025-09-07T10:37:06.3288046Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. 
If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:37:06.3289302Z import pynvml # type: ignore[import] 2025-09-07T10:37:11.1054481Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:37:11.1056345Z import pynvml # type: ignore[import] 2025-09-07T10:37:14.1913707Z 2025-09-07T10:37:16.2558140Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:37:16.2558507Z loading model: 0it [00:02, ?it/s] 2025-09-07T10:37:16.2558831Z cuda train hrnet_w18 2025-09-07T10:37:43.1498151Z skipping cudagraphs due to disabling cudagraphs due to incompatible op aten.index_put.default Found from File "/var/lib/jenkins/workspace/benchmarks/dynamo/timm_models.py", line 442, in torch_dynamo_resume_in_forward_and_backward_pass_at_440 2025-09-07T10:37:43.1499269Z pred = mod(*cloned_inputs) 2025-09-07T10:37:43.1499785Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/hrnet.py", line 792, in forward 2025-09-07T10:37:43.1500328Z y = self.forward_features(x) 2025-09-07T10:37:43.1500861Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/hrnet.py", line 770, in forward_features 2025-09-07T10:37:43.1501529Z yl = self.stages(x) 2025-09-07T10:37:43.1501992Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/hrnet.py", line 757, in stages 2025-09-07T10:37:43.1502544Z yl = self.stage4(xl) 2025-09-07T10:37:43.1502970Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/hrnet.py", line 506, in forward 2025-09-07T10:37:43.1503397Z x = module(x) 2025-09-07T10:37:43.1503768Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/hrnet.py", line 484, in forward 
2025-09-07T10:37:43.1504202Z y = y + f(x[j]) 2025-09-07T10:37:43.1504322Z 2025-09-07T10:37:43.1504325Z 2025-09-07T10:37:48.5821845Z pass 2025-09-07T10:37:52.5017846Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:37:52.5019087Z import pynvml # type: ignore[import] 2025-09-07T10:37:55.4945151Z 2025-09-07T10:37:57.0859842Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:37:57.0860228Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:37:57.0866207Z cuda train inception_v3 2025-09-07T10:38:09.3697510Z W0907 10:38:09.368000 25772 site-packages/torch/_logging/_internal.py:1199] [6/0] Profiler function will be ignored 2025-09-07T10:38:18.9456161Z E0907 10:38:18.944000 25772 site-packages/torch/_dynamo/utils.py:3115] RMSE (res-fp64): 0.04286, (ref-fp64): 0.01055 and shape=torch.Size([128, 128, 7, 1]). res.dtype: torch.float32, multiplier: 3.000000, tol: 0.040000, use_larger_multiplier_for_smaller_tensor: 1 2025-09-07T10:38:18.9459254Z E0907 10:38:18.945000 25772 site-packages/torch/_dynamo/utils.py:2976] Accuracy failed for key name Mixed_6b.branch7x7dbl_4.conv.weight.grad 2025-09-07T10:38:18.9746715Z fail_accuracy 2025-09-07T10:38:22.7211230Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T10:38:22.7213007Z import pynvml # type: ignore[import] 2025-09-07T10:38:25.7093397Z 2025-09-07T10:38:28.6183626Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:38:28.6184142Z loading model: 0it [00:02, ?it/s] 2025-09-07T10:38:28.6184588Z cuda train jx_nest_base 2025-09-07T10:38:39.3629393Z W0907 10:38:39.362000 26042 site-packages/torch/_logging/_internal.py:1199] [6/0] Profiler function will be ignored 2025-09-07T10:38:50.0123521Z pass 2025-09-07T10:38:53.7079634Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:38:53.7081315Z import pynvml # type: ignore[import] 2025-09-07T10:38:56.6845879Z 2025-09-07T10:38:57.6828222Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:38:57.6828523Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:38:57.6828802Z cuda train lcnet_050 2025-09-07T10:39:04.9372188Z pass 2025-09-07T10:39:08.2945481Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T10:39:08.2946927Z import pynvml # type: ignore[import] 2025-09-07T10:39:11.2809515Z 2025-09-07T10:39:12.6847695Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:39:12.6848053Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:39:12.6848360Z cuda train levit_128 2025-09-07T10:39:22.1058065Z skipping cudagraphs due to disabling cudagraphs due to incompatible op aten.index_put.default Found from File "/var/lib/jenkins/workspace/benchmarks/dynamo/timm_models.py", line 442, in torch_dynamo_resume_in_forward_and_backward_pass_at_440 2025-09-07T10:39:22.1059127Z pred = mod(*cloned_inputs) 2025-09-07T10:39:22.1059644Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 718, in forward 2025-09-07T10:39:22.1060155Z x = self.forward_features(x) 2025-09-07T10:39:22.1060729Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 709, in forward_features 2025-09-07T10:39:22.1061187Z x = self.stages(x) 2025-09-07T10:39:22.1061650Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 520, in forward 2025-09-07T10:39:22.1062054Z x = self.blocks(x) 2025-09-07T10:39:22.1062415Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 458, in forward 2025-09-07T10:39:22.1062836Z x = x + self.drop_path1(self.attn(x)) 2025-09-07T10:39:22.1063237Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 237, in forward 2025-09-07T10:39:22.1063702Z attn = q @ k * self.scale + self.get_attention_biases(x.device) 2025-09-07T10:39:22.1064209Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 212, in get_attention_biases 2025-09-07T10:39:22.1064729Z return self.attention_biases[:, self.attention_bias_idxs] 2025-09-07T10:39:22.1065719Z 2025-09-07T10:39:22.1065723Z 2025-09-07T10:39:23.1999059Z W0907 10:39:23.199000 26614 site-packages/torch/_logging/_internal.py:1199] [6/0] Profiler function 
will be ignored 2025-09-07T10:39:29.2189073Z pass 2025-09-07T10:39:32.7315867Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:39:32.7317165Z import pynvml # type: ignore[import] 2025-09-07T10:39:35.7375915Z 2025-09-07T10:39:37.3098421Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:39:37.3098786Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:39:37.3099095Z cuda train mixer_b16_224 2025-09-07T10:39:43.9777234Z W0907 10:39:43.976000 26884 site-packages/torch/_logging/_internal.py:1199] [6/0] Profiler function will be ignored 2025-09-07T10:39:48.5009298Z pass 2025-09-07T10:39:51.8968394Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:39:51.8969846Z import pynvml # type: ignore[import] 2025-09-07T10:39:54.9356252Z 2025-09-07T10:39:56.7223676Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:39:56.7224009Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:39:56.7224379Z cuda train mixnet_l 2025-09-07T10:40:09.9231388Z pass 2025-09-07T10:40:13.4239701Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T10:40:13.4241500Z import pynvml # type: ignore[import] 2025-09-07T10:40:16.4142867Z 2025-09-07T10:40:17.7860830Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:40:17.7861462Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:40:17.7861911Z cuda train mnasnet_100 2025-09-07T10:40:26.4884495Z W0907 10:40:26.487000 27424 site-packages/torch/_logging/_internal.py:1199] [7/0] Profiler function will be ignored 2025-09-07T10:40:31.7047449Z pass 2025-09-07T10:40:34.1953554Z accuracy pass_rate=87.50% 2025-09-07T10:40:34.1958318Z calls_captured gmean=969.23x mean=1248.750x 2025-09-07T10:40:34.1961965Z unique_graphs gmean=2.67x mean=2.750x 2025-09-07T10:40:34.1965805Z graph_breaks gmean=6.46x mean=6.500x 2025-09-07T10:40:34.1969142Z unique_graph_breaks gmean=4.86x mean=4.875x 2025-09-07T10:40:34.1972709Z autograd_captures gmean=0.00x mean=0.000x 2025-09-07T10:40:34.1976133Z autograd_compiles gmean=0.00x mean=0.000x 2025-09-07T10:40:34.1979379Z cudagraph_skips gmean=0.00x mean=0.250x 2025-09-07T10:40:34.1980550Z compilation_latency mean=15.025 seconds 2025-09-07T10:40:35.2710387Z + [[ training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *cppwrapper-true* ]] 2025-09-07T10:40:35.2711813Z + TORCHINDUCTOR_CPP_WRAPPER=1 2025-09-07T10:40:35.2713092Z + python benchmarks/dynamo/timm_models.py --accuracy --no-translation-validation --training --amp --backend inductor --disable-cudagraphs --device cuda --total-partitions 7 --partition-id 3 --output /var/lib/jenkins/workspace/test/test-reports/inductor_cpp_wrapper_timm_models_amp_training_cuda_h100_accuracy.csv 2025-09-07T10:40:36.2765670Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. 
If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:40:36.2767546Z import pynvml # type: ignore[import] 2025-09-07T10:40:41.1283905Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:40:41.1285231Z import pynvml # type: ignore[import] 2025-09-07T10:40:44.1402230Z 2025-09-07T10:40:46.6772304Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:40:46.6772810Z loading model: 0it [00:02, ?it/s] 2025-09-07T10:40:46.6773224Z cuda train hrnet_w18 2025-09-07T10:48:59.8030541Z pass 2025-09-07T10:49:11.5083876Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:49:11.5085284Z import pynvml # type: ignore[import] 2025-09-07T10:49:14.5349260Z 2025-09-07T10:49:16.1688525Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:49:16.1688876Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:49:16.1689155Z cuda train inception_v3 2025-09-07T10:51:28.2656662Z W0907 10:51:28.264000 29803 site-packages/torch/_logging/_internal.py:1199] [6/0] Profiler function will be ignored 2025-09-07T10:52:33.5778707Z E0907 10:52:33.577000 29803 site-packages/torch/_dynamo/utils.py:3115] RMSE (res-fp64): 0.04286, (ref-fp64): 0.01055 and shape=torch.Size([128, 128, 7, 1]). 
res.dtype: torch.float32, multiplier: 3.000000, tol: 0.040000, use_larger_multiplier_for_smaller_tensor: 1 2025-09-07T10:52:33.5780649Z E0907 10:52:33.577000 29803 site-packages/torch/_dynamo/utils.py:2976] Accuracy failed for key name Mixed_6b.branch7x7dbl_4.conv.weight.grad 2025-09-07T10:52:33.6077424Z fail_accuracy 2025-09-07T10:52:42.2665671Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:52:42.2666730Z import pynvml # type: ignore[import] 2025-09-07T10:52:45.2682912Z 2025-09-07T10:52:47.2015339Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:52:47.2015686Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:52:47.2015932Z cuda train jx_nest_base 2025-09-07T10:54:50.6396820Z W0907 10:54:50.638000 30616 site-packages/torch/_logging/_internal.py:1199] [6/0] Profiler function will be ignored 2025-09-07T10:56:01.0627897Z pass 2025-09-07T10:56:08.7543786Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:56:08.7546014Z import pynvml # type: ignore[import] 2025-09-07T10:56:11.7392229Z 2025-09-07T10:56:12.7248224Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:56:12.7248531Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:56:12.7248769Z cuda train lcnet_050 2025-09-07T10:56:51.5199946Z pass 2025-09-07T10:56:55.7652586Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. 
If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:56:55.7655799Z import pynvml # type: ignore[import] 2025-09-07T10:56:58.7671523Z 2025-09-07T10:57:00.5829229Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:57:00.5829729Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:57:00.5830972Z cuda train levit_128 2025-09-07T10:58:50.8021861Z W0907 10:58:50.801000 32242 site-packages/torch/_logging/_internal.py:1199] [6/0] Profiler function will be ignored 2025-09-07T10:59:34.8648979Z pass 2025-09-07T10:59:41.6316210Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:59:41.6318304Z import pynvml # type: ignore[import] 2025-09-07T10:59:44.6959501Z 2025-09-07T10:59:46.6265603Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:59:46.6266126Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:59:46.6266543Z cuda train mixer_b16_224 2025-09-07T11:00:27.5555476Z W0907 11:00:27.554000 33152 site-packages/torch/_logging/_internal.py:1199] [6/0] Profiler function will be ignored 2025-09-07T11:00:57.4852770Z pass 2025-09-07T11:01:02.4532254Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T11:01:02.4533532Z import pynvml # type: ignore[import] 2025-09-07T11:01:05.4656255Z 2025-09-07T11:01:07.3062717Z loading model: 0it [00:00, ?it/s] 2025-09-07T11:01:07.3063041Z loading model: 0it [00:01, ?it/s] 2025-09-07T11:01:07.3063314Z cuda train mixnet_l 2025-09-07T11:03:12.2555202Z pass 2025-09-07T11:03:18.4479390Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T11:03:18.4480764Z import pynvml # type: ignore[import] 2025-09-07T11:03:21.4374290Z 2025-09-07T11:03:22.8559203Z loading model: 0it [00:00, ?it/s] 2025-09-07T11:03:22.8559622Z loading model: 0it [00:01, ?it/s] 2025-09-07T11:03:22.8559938Z cuda train mnasnet_100 2025-09-07T11:04:26.5909900Z W0907 11:04:26.589000 34331 site-packages/torch/_logging/_internal.py:1199] [7/0] Profiler function will be ignored 2025-09-07T11:04:58.4488363Z pass 2025-09-07T11:05:03.3688159Z accuracy pass_rate=87.50% 2025-09-07T11:05:03.3693360Z calls_captured gmean=969.23x mean=1248.750x 2025-09-07T11:05:03.3697128Z unique_graphs gmean=2.67x mean=2.750x 2025-09-07T11:05:03.3700490Z graph_breaks gmean=6.46x mean=6.500x 2025-09-07T11:05:03.3704197Z unique_graph_breaks gmean=4.86x mean=4.875x 2025-09-07T11:05:03.3708018Z autograd_captures gmean=0.00x mean=0.000x 2025-09-07T11:05:03.3711454Z autograd_compiles gmean=0.00x mean=0.000x 2025-09-07T11:05:03.3714813Z cudagraph_skips gmean=0.00x mean=0.000x 2025-09-07T11:05:03.3716263Z compilation_latency mean=169.479 seconds 2025-09-07T11:05:04.3996017Z + [[ training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *freezing_cudagraphs-true* ]] 2025-09-07T11:05:04.3997104Z + [[ 
training == \i\n\f\e\r\e\n\c\e ]] 2025-09-07T11:05:04.3998131Z + [[ training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *freeze_autotune_cudagraphs-true* ]] 2025-09-07T11:05:04.3999731Z + [[ training == \i\n\f\e\r\e\n\c\e ]] 2025-09-07T11:05:04.4000905Z + [[ training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *aotinductor-true* ]] 2025-09-07T11:05:04.4001860Z + [[ training == \i\n\f\e\r\e\n\c\e ]] 2025-09-07T11:05:04.4002819Z + [[ training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *maxautotune-true* ]] 2025-09-07T11:05:04.4003775Z + TORCHINDUCTOR_MAX_AUTOTUNE=1 2025-09-07T11:05:04.4004775Z + python benchmarks/dynamo/timm_models.py --accuracy --no-translation-validation --training --amp --backend inductor --device cuda --total-partitions 7 --partition-id 3 --output /var/lib/jenkins/workspace/test/test-reports/inductor_max_autotune_timm_models_amp_training_cuda_h100_accuracy.csv 2025-09-07T11:05:05.4072712Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T11:05:05.4074003Z import pynvml # type: ignore[import] 2025-09-07T11:05:10.2726281Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. 
If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T11:05:10.2727553Z import pynvml # type: ignore[import] 2025-09-07T11:05:13.2781043Z 2025-09-07T11:05:15.6085534Z loading model: 0it [00:00, ?it/s] 2025-09-07T11:05:15.6085913Z loading model: 0it [00:02, ?it/s] 2025-09-07T11:05:15.6086260Z cuda train hrnet_w18 2025-09-07T11:06:55.7300738Z Autotune Choices Stats: 2025-09-07T11:06:55.7302050Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_2196", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8", "best_time": 0.052319999784231186, "best_triton_pos": 0} 2025-09-07T11:06:55.7307656Z AUTOTUNE convolution(8x128x56x56, 256x128x3x3) 2025-09-07T11:06:55.7308060Z strides: [401408, 3136, 56, 1], [1152, 9, 3, 1] 2025-09-07T11:06:55.7308380Z dtypes: torch.float16, torch.float16 2025-09-07T11:06:55.7309290Z triton_convolution2d_2196 0.0523 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:06:55.7310611Z triton_convolution2d_2193 0.0541 ms 96.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:06:55.7311878Z triton_convolution2d_2194 0.0613 ms 85.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:06:55.7313097Z triton_convolution2d_2195 0.0662 ms 79.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, 
STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:06:55.7314324Z triton_convolution2d_2191 0.0669 ms 78.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:06:55.7315358Z convolution 0.0892 ms 58.6% 2025-09-07T11:06:55.7316581Z triton_convolution2d_2190 0.0934 ms 56.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:06:55.7318041Z triton_convolution2d_2192 0.1829 ms 28.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:06:55.7319029Z SingleProcess AUTOTUNE benchmarking takes 0.1471 seconds and 0.0004 seconds precompiling for 8 choices 2025-09-07T11:06:56.2487024Z Autotune Choices Stats: 2025-09-07T11:06:56.2488522Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.039872001856565475, "best_triton_pos": 1, "best_triton_time": 0.09043200314044952, "best_triton_kernel": "triton_convolution2d_2229", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T11:06:56.2493190Z AUTOTUNE convolution(8x256x28x28, 512x256x3x3) 2025-09-07T11:06:56.2493630Z strides: [200704, 784, 28, 1], [2304, 9, 3, 1] 2025-09-07T11:06:56.2493912Z dtypes: torch.float16, torch.float16 2025-09-07T11:06:56.2494158Z convolution 0.0399 ms 100.0% 2025-09-07T11:06:56.2494844Z triton_convolution2d_2229 0.0904 ms 44.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, 
UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:06:56.2496313Z triton_convolution2d_2231 0.0925 ms 43.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:06:56.2497461Z triton_convolution2d_2228 0.0959 ms 41.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:06:56.2498620Z triton_convolution2d_2230 0.1203 ms 33.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:06:56.2499776Z triton_convolution2d_2226 0.1214 ms 32.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:06:56.2500850Z triton_convolution2d_2225 0.1766 ms 22.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:06:56.2501957Z triton_convolution2d_2227 0.2773 ms 14.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:06:56.2502762Z SingleProcess AUTOTUNE benchmarking takes 0.1868 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:06:56.6909521Z Autotune Choices Stats: 2025-09-07T11:06:56.6911005Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.023584000766277313, "best_triton_pos": 1, "best_triton_time": 0.0390079990029335, "best_triton_kernel": "triton_convolution2d_2270", 
"best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8"} 2025-09-07T11:06:56.6916211Z AUTOTUNE convolution(8x1024x7x7, 2048x1024x1x1) 2025-09-07T11:06:56.6916563Z strides: [50176, 49, 7, 1], [1024, 1, 1, 1] 2025-09-07T11:06:56.6916853Z dtypes: torch.float16, torch.float16 2025-09-07T11:06:56.6917124Z convolution 0.0236 ms 100.0% 2025-09-07T11:06:56.6918278Z triton_convolution2d_2270 0.0390 ms 60.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:06:56.6919265Z conv1x1_via_mm 0.0395 ms 59.7% 2025-09-07T11:06:56.6920022Z triton_convolution2d_2271 0.0415 ms 56.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:06:56.6921206Z triton_convolution2d_2272 0.0495 ms 47.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:06:56.6922349Z triton_convolution2d_2273 0.0561 ms 42.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:06:56.6923499Z triton_convolution2d_2267 0.0587 ms 40.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:06:56.6924636Z triton_convolution2d_2268 0.0683 ms 34.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, 
UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:06:56.6926062Z triton_convolution2d_2269 0.0741 ms 31.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=512, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T11:06:56.6926982Z SingleProcess AUTOTUNE benchmarking takes 0.1544 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T11:06:57.2138248Z Autotune Choices Stats: 2025-09-07T11:06:57.2139738Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.05878400057554245, "best_triton_pos": 1, "best_triton_time": 0.1363839954137802, "best_triton_kernel": "triton_convolution2d_2264", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T11:06:57.2144283Z AUTOTUNE convolution(8x512x14x14, 1024x512x3x3) 2025-09-07T11:06:57.2144572Z strides: [100352, 196, 14, 1], [4608, 9, 3, 1] 2025-09-07T11:06:57.2144835Z dtypes: torch.float16, torch.float16 2025-09-07T11:06:57.2145364Z convolution 0.0588 ms 100.0% 2025-09-07T11:06:57.2145985Z triton_convolution2d_2264 0.1364 ms 43.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:06:57.2147011Z triton_convolution2d_2263 0.1864 ms 31.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:06:57.2148049Z triton_convolution2d_2266 0.1916 ms 30.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:06:57.2149055Z 
triton_convolution2d_2265 0.2266 ms 25.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:06:57.2150111Z triton_convolution2d_2261 0.2397 ms 24.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:06:57.2151183Z triton_convolution2d_2262 0.3949 ms 14.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:06:57.2152770Z triton_convolution2d_2260 0.4878 ms 12.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:06:57.2153610Z SingleProcess AUTOTUNE benchmarking takes 0.2395 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:06:57.9834497Z Autotune Choices Stats: 2025-09-07T11:06:57.9836863Z {"num_choices": 7, "num_triton_choices": 6, "best_kernel": "triton_convolution2d_5", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8", "best_time": 0.017535999417304993, "best_triton_pos": 0} 2025-09-07T11:06:57.9841440Z AUTOTUNE convolution(8x3x224x224, 64x3x3x3) 2025-09-07T11:06:57.9841808Z strides: [150528, 50176, 224, 1], [27, 9, 3, 1] 2025-09-07T11:06:57.9842083Z dtypes: torch.float16, torch.float16 2025-09-07T11:06:57.9842809Z triton_convolution2d_5 0.0175 ms 100.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 
2025-09-07T11:06:57.9843959Z triton_convolution2d_3 0.0188 ms 93.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:06:57.9845378Z triton_convolution2d_1 0.0197 ms 89.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:06:57.9846591Z triton_convolution2d_0 0.0199 ms 88.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:06:57.9847823Z triton_convolution2d_4 0.0219 ms 80.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:06:57.9848514Z convolution 0.0346 ms 50.6% 2025-09-07T11:06:57.9849184Z triton_convolution2d_2 0.0355 ms 49.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:06:57.9850103Z SingleProcess AUTOTUNE benchmarking takes 0.0873 seconds and 0.0002 seconds precompiling for 7 choices 2025-09-07T11:06:58.0941227Z Autotune Choices Stats: 2025-09-07T11:06:58.0942441Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_9", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8", "best_time": 0.028224000707268715, "best_triton_pos": 0} 2025-09-07T11:06:58.0948031Z AUTOTUNE convolution(8x64x112x112, 64x64x3x3) 2025-09-07T11:06:58.0948474Z strides: [802816, 12544, 112, 1], [576, 9, 3, 
1] 2025-09-07T11:06:58.0948799Z dtypes: torch.float16, torch.float16 2025-09-07T11:06:58.0949633Z triton_convolution2d_9 0.0282 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:06:58.0951023Z triton_convolution2d_10 0.0297 ms 94.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:06:58.0952236Z triton_convolution2d_11 0.0298 ms 94.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:06:58.0953962Z triton_convolution2d_12 0.0303 ms 93.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:06:58.0955591Z triton_convolution2d_6 0.0316 ms 89.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:06:58.0956796Z triton_convolution2d_7 0.0383 ms 73.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:06:58.0957534Z convolution 0.0396 ms 71.2% 2025-09-07T11:06:58.0958262Z triton_convolution2d_8 0.1373 ms 20.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:06:58.0959244Z SingleProcess AUTOTUNE benchmarking takes 0.1102 seconds and 0.0002 seconds precompiling for 8 choices 
2025-09-07T11:06:58.2250501Z Autotune Choices Stats: 2025-09-07T11:06:58.2251616Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_18", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8", "best_time": 0.008895999751985073, "best_triton_pos": 0} 2025-09-07T11:06:58.2257644Z AUTOTUNE convolution(8x64x56x56, 64x64x1x1) 2025-09-07T11:06:58.2257970Z strides: [200704, 3136, 56, 1], [64, 1, 1, 1] 2025-09-07T11:06:58.2258274Z dtypes: torch.float16, torch.float16 2025-09-07T11:06:58.2259070Z triton_convolution2d_18 0.0089 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:06:58.2260373Z triton_convolution2d_17 0.0092 ms 97.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:06:58.2261536Z triton_convolution2d_16 0.0094 ms 94.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:06:58.2262192Z convolution 0.0096 ms 92.4% 2025-09-07T11:06:58.2262806Z triton_convolution2d_13 0.0099 ms 90.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:06:58.2263845Z triton_convolution2d_19 0.0107 ms 83.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:06:58.2265427Z triton_convolution2d_14 0.0116 ms 76.6% 
ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:06:58.2267063Z triton_convolution2d_15 0.0122 ms 73.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T11:06:58.2268024Z conv1x1_via_mm 0.0440 ms 20.2% 2025-09-07T11:06:58.2268634Z SingleProcess AUTOTUNE benchmarking takes 0.1305 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T11:06:58.3268551Z Autotune Choices Stats: 2025-09-07T11:06:58.3269735Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_24", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4", "best_time": 0.023104000836610794, "best_triton_pos": 0} 2025-09-07T11:06:58.3275737Z AUTOTUNE convolution(8x64x56x56, 64x64x3x3) 2025-09-07T11:06:58.3276147Z strides: [200704, 3136, 56, 1], [576, 9, 3, 1] 2025-09-07T11:06:58.3276442Z dtypes: torch.float16, torch.float16 2025-09-07T11:06:58.3277235Z triton_convolution2d_24 0.0231 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:06:58.3278481Z triton_convolution2d_26 0.0231 ms 99.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:06:58.3279710Z triton_convolution2d_23 0.0246 ms 93.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, 
num_warps=8 2025-09-07T11:06:58.3280928Z triton_convolution2d_25 0.0248 ms 93.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:06:58.3282047Z triton_convolution2d_20 0.0278 ms 83.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:06:58.3282734Z convolution 0.0300 ms 77.0% 2025-09-07T11:06:58.3283398Z triton_convolution2d_21 0.0329 ms 70.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:06:58.3284543Z triton_convolution2d_22 0.0683 ms 33.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:06:58.3285583Z SingleProcess AUTOTUNE benchmarking takes 0.1013 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:06:58.4562557Z Autotune Choices Stats: 2025-09-07T11:06:58.4563953Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.011552000418305397, "best_triton_pos": 1, "best_triton_time": 0.013024000450968742, "best_triton_kernel": "triton_convolution2d_27", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T11:06:58.4570137Z AUTOTUNE convolution(8x64x56x56, 256x64x1x1) 2025-09-07T11:06:58.4570510Z strides: [200704, 3136, 56, 1], [64, 1, 1, 1] 2025-09-07T11:06:58.4570766Z dtypes: torch.float16, torch.float16 2025-09-07T11:06:58.4571011Z convolution 0.0116 ms 100.0% 
2025-09-07T11:06:58.4571659Z triton_convolution2d_27 0.0130 ms 88.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:06:58.4572724Z triton_convolution2d_31 0.0136 ms 84.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:06:58.4573790Z triton_convolution2d_30 0.0142 ms 81.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:06:58.4574851Z triton_convolution2d_28 0.0147 ms 78.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:06:58.4576598Z triton_convolution2d_33 0.0157 ms 73.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:06:58.4577670Z triton_convolution2d_32 0.0157 ms 73.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:06:58.4578723Z triton_convolution2d_29 0.0204 ms 56.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T11:06:58.4579372Z conv1x1_via_mm 0.0834 ms 13.9% 2025-09-07T11:06:58.4579780Z SingleProcess AUTOTUNE benchmarking takes 0.1290 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T11:06:58.5894009Z Autotune Choices Stats: 
2025-09-07T11:06:58.5895739Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.013055999763309956, "best_triton_pos": 1, "best_triton_time": 0.014431999996304512, "best_triton_kernel": "triton_convolution2d_46", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8"} 2025-09-07T11:06:58.5901547Z AUTOTUNE convolution(8x256x56x56, 64x256x1x1) 2025-09-07T11:06:58.5901853Z strides: [802816, 3136, 56, 1], [256, 1, 1, 1] 2025-09-07T11:06:58.5902105Z dtypes: torch.float16, torch.float16 2025-09-07T11:06:58.5902344Z convolution 0.0131 ms 100.0% 2025-09-07T11:06:58.5902983Z triton_convolution2d_46 0.0144 ms 90.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:06:58.5904046Z triton_convolution2d_45 0.0146 ms 89.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:06:58.5905271Z triton_convolution2d_44 0.0157 ms 83.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:06:58.5906327Z triton_convolution2d_47 0.0173 ms 75.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:06:58.5907372Z triton_convolution2d_41 0.0185 ms 70.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:06:58.5908417Z 
triton_convolution2d_42 0.0228 ms 57.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:06:58.5909481Z triton_convolution2d_43 0.0257 ms 50.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T11:06:58.5910132Z conv1x1_via_mm 0.0816 ms 16.0% 2025-09-07T11:06:58.5910545Z SingleProcess AUTOTUNE benchmarking takes 0.1276 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T11:06:58.7611816Z Autotune Choices Stats: 2025-09-07T11:06:58.7613199Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.06457599997520447, "best_triton_pos": 1, "best_triton_time": 0.07568000257015228, "best_triton_kernel": "triton_convolution2d_108", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T11:06:58.7619838Z AUTOTUNE convolution(8x256x56x56, 18x256x3x3) 2025-09-07T11:06:58.7620211Z strides: [802816, 3136, 56, 1], [2304, 9, 3, 1] 2025-09-07T11:06:58.7620534Z dtypes: torch.float16, torch.float16 2025-09-07T11:06:58.7620775Z convolution 0.0646 ms 100.0% 2025-09-07T11:06:58.7621552Z triton_convolution2d_108 0.0757 ms 85.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:06:58.7622619Z triton_convolution2d_104 0.0775 ms 83.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:06:58.7623669Z triton_convolution2d_109 
0.0798 ms 80.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:06:58.7624725Z triton_convolution2d_107 0.0841 ms 76.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:06:58.7625908Z triton_convolution2d_110 0.0972 ms 66.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:06:58.7626976Z triton_convolution2d_105 0.1048 ms 61.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:06:58.7628041Z triton_convolution2d_106 0.2815 ms 22.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:06:58.7628886Z SingleProcess AUTOTUNE benchmarking takes 0.1571 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:06:58.9158236Z Autotune Choices Stats: 2025-09-07T11:06:58.9159588Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.05023999884724617, "best_triton_pos": 1, "best_triton_time": 0.05558399856090546, "best_triton_kernel": "triton_convolution2d_115", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T11:06:58.9166281Z AUTOTUNE convolution(8x256x56x56, 36x256x3x3) 2025-09-07T11:06:58.9166746Z strides: [802816, 3136, 56, 1], [2304, 9, 3, 1] 
2025-09-07T11:06:58.9167077Z dtypes: torch.float16, torch.float16 2025-09-07T11:06:58.9167379Z convolution 0.0502 ms 100.0% 2025-09-07T11:06:58.9168151Z triton_convolution2d_115 0.0556 ms 90.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:06:58.9169447Z triton_convolution2d_116 0.0564 ms 89.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:06:58.9170715Z triton_convolution2d_114 0.0656 ms 76.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:06:58.9171763Z triton_convolution2d_111 0.0810 ms 62.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:06:58.9173307Z triton_convolution2d_117 0.0901 ms 55.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:06:58.9174376Z triton_convolution2d_112 0.1173 ms 42.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:06:58.9175751Z triton_convolution2d_113 0.3484 ms 14.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:06:58.9176591Z SingleProcess AUTOTUNE benchmarking takes 0.1522 seconds and 0.0002 seconds precompiling for 8 choices 
2025-09-07T11:06:59.0167060Z Autotune Choices Stats: 2025-09-07T11:06:59.0168184Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_122", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4", "best_time": 0.01244799979031086, "best_triton_pos": 0} 2025-09-07T11:06:59.0174389Z AUTOTUNE convolution(8x18x56x56, 18x18x3x3) 2025-09-07T11:06:59.0174667Z strides: [56448, 3136, 56, 1], [162, 9, 3, 1] 2025-09-07T11:06:59.0174915Z dtypes: torch.float16, torch.float16 2025-09-07T11:06:59.0175928Z triton_convolution2d_122 0.0124 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:06:59.0177006Z triton_convolution2d_124 0.0131 ms 95.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:06:59.0178090Z triton_convolution2d_121 0.0133 ms 93.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:06:59.0179142Z triton_convolution2d_123 0.0134 ms 92.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:06:59.0180186Z triton_convolution2d_118 0.0138 ms 90.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:06:59.0181240Z triton_convolution2d_119 0.0161 ms 77.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, 
BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:06:59.0182417Z triton_convolution2d_120 0.0332 ms 37.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:06:59.0183077Z convolution 0.0437 ms 28.5% 2025-09-07T11:06:59.0183482Z SingleProcess AUTOTUNE benchmarking takes 0.0984 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:06:59.1185909Z Autotune Choices Stats: 2025-09-07T11:06:59.1187112Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_178", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4", "best_time": 0.014592000283300877, "best_triton_pos": 0} 2025-09-07T11:06:59.1193572Z AUTOTUNE convolution(8x36x28x28, 36x36x3x3) 2025-09-07T11:06:59.1194276Z strides: [28224, 784, 28, 1], [324, 9, 3, 1] 2025-09-07T11:06:59.1194562Z dtypes: torch.float16, torch.float16 2025-09-07T11:06:59.1195717Z triton_convolution2d_178 0.0146 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:06:59.1196978Z triton_convolution2d_179 0.0156 ms 93.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:06:59.1198210Z triton_convolution2d_174 0.0157 ms 92.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:06:59.1199428Z 
triton_convolution2d_177 0.0172 ms 84.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:06:59.1200666Z triton_convolution2d_180 0.0198 ms 73.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:06:59.1201848Z triton_convolution2d_175 0.0224 ms 65.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:06:59.1202542Z convolution 0.0365 ms 40.0% 2025-09-07T11:06:59.1203223Z triton_convolution2d_176 0.0391 ms 37.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:06:59.1204132Z SingleProcess AUTOTUNE benchmarking takes 0.0992 seconds and 0.0001 seconds precompiling for 8 choices 2025-09-07T11:06:59.2491952Z Autotune Choices Stats: 2025-09-07T11:06:59.2493062Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_235", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8", "best_time": 0.007040000054985285, "best_triton_pos": 0} 2025-09-07T11:06:59.2500596Z AUTOTUNE convolution(8x36x28x28, 18x36x1x1) 2025-09-07T11:06:59.2500879Z strides: [28224, 784, 28, 1], [36, 1, 1, 1] 2025-09-07T11:06:59.2501123Z dtypes: torch.float16, torch.float16 2025-09-07T11:06:59.2501871Z triton_convolution2d_235 0.0070 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, 
STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:06:59.2502925Z triton_convolution2d_234 0.0073 ms 96.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:06:59.2503989Z triton_convolution2d_233 0.0073 ms 96.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:06:59.2505286Z triton_convolution2d_236 0.0079 ms 89.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:06:59.2506350Z triton_convolution2d_230 0.0080 ms 88.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:06:59.2507000Z convolution 0.0080 ms 88.0% 2025-09-07T11:06:59.2507623Z triton_convolution2d_231 0.0086 ms 81.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:06:59.2509065Z triton_convolution2d_232 0.0096 ms 73.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T11:06:59.2509740Z conv1x1_via_mm 0.0176 ms 39.9% 2025-09-07T11:06:59.2510159Z SingleProcess AUTOTUNE benchmarking takes 0.1279 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T11:06:59.3492958Z Autotune Choices Stats: 2025-09-07T11:06:59.3494062Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_242", "best_kernel_desc": "ALLOW_TF32=False, 
BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8", "best_time": 0.009920000098645687, "best_triton_pos": 0} 2025-09-07T11:06:59.3501171Z AUTOTUNE convolution(8x18x56x56, 36x18x3x3) 2025-09-07T11:06:59.3501632Z strides: [56448, 3136, 56, 1], [162, 9, 3, 1] 2025-09-07T11:06:59.3501928Z dtypes: torch.float16, torch.float16 2025-09-07T11:06:59.3502707Z triton_convolution2d_242 0.0099 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:06:59.3503953Z triton_convolution2d_241 0.0110 ms 90.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:06:59.3505453Z triton_convolution2d_240 0.0120 ms 82.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:06:59.3506700Z triton_convolution2d_237 0.0141 ms 70.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:06:59.3507927Z triton_convolution2d_243 0.0144 ms 69.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:06:59.3509141Z triton_convolution2d_238 0.0180 ms 55.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:06:59.3509881Z convolution 0.0367 ms 27.1% 
2025-09-07T11:06:59.3510617Z triton_convolution2d_239 0.0380 ms 26.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:06:59.3511527Z SingleProcess AUTOTUNE benchmarking takes 0.0996 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:06:59.4496499Z Autotune Choices Stats: 2025-09-07T11:06:59.4497588Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_248", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4", "best_time": 0.016575999557971954, "best_triton_pos": 0} 2025-09-07T11:06:59.4505501Z AUTOTUNE convolution(8x36x28x28, 72x36x3x3) 2025-09-07T11:06:59.4505808Z strides: [28224, 784, 28, 1], [324, 9, 3, 1] 2025-09-07T11:06:59.4506098Z dtypes: torch.float16, torch.float16 2025-09-07T11:06:59.4506853Z triton_convolution2d_248 0.0166 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:06:59.4508376Z triton_convolution2d_249 0.0168 ms 98.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:06:59.4509757Z triton_convolution2d_244 0.0183 ms 90.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:06:59.4510986Z triton_convolution2d_247 0.0206 ms 80.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, 
num_stages=2, num_warps=8 2025-09-07T11:06:59.4512215Z triton_convolution2d_250 0.0224 ms 74.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:06:59.4513437Z triton_convolution2d_245 0.0234 ms 70.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:06:59.4514192Z convolution 0.0281 ms 59.0% 2025-09-07T11:06:59.4514925Z triton_convolution2d_246 0.0516 ms 32.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:06:59.4516058Z SingleProcess AUTOTUNE benchmarking takes 0.0999 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:06:59.5557942Z Autotune Choices Stats: 2025-09-07T11:06:59.5559027Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_368", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8", "best_time": 0.021407999098300934, "best_triton_pos": 0} 2025-09-07T11:06:59.5567061Z AUTOTUNE convolution(8x72x14x14, 72x72x3x3) 2025-09-07T11:06:59.5567392Z strides: [14112, 196, 14, 1], [648, 9, 3, 1] 2025-09-07T11:06:59.5567697Z dtypes: torch.float16, torch.float16 2025-09-07T11:06:59.5568480Z triton_convolution2d_368 0.0214 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:06:59.5569740Z triton_convolution2d_367 0.0219 ms 97.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, 
KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:06:59.5570961Z triton_convolution2d_363 0.0257 ms 83.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:06:59.5571631Z convolution 0.0261 ms 82.1% 2025-09-07T11:06:59.5572287Z triton_convolution2d_366 0.0270 ms 79.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:06:59.5573359Z triton_convolution2d_369 0.0280 ms 76.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:06:59.5574417Z triton_convolution2d_364 0.0337 ms 63.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:06:59.5575743Z triton_convolution2d_365 0.0605 ms 35.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:06:59.5576783Z SingleProcess AUTOTUNE benchmarking takes 0.1000 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:06:59.6878583Z Autotune Choices Stats: 2025-09-07T11:06:59.6880165Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.007968000136315823, "best_triton_pos": 1, "best_triton_time": 0.008063999935984612, "best_triton_kernel": "triton_convolution2d_430", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, 
STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T11:06:59.6887676Z AUTOTUNE convolution(8x72x14x14, 18x72x1x1) 2025-09-07T11:06:59.6888043Z strides: [14112, 196, 14, 1], [72, 1, 1, 1] 2025-09-07T11:06:59.6888302Z dtypes: torch.float16, torch.float16 2025-09-07T11:06:59.6888542Z convolution 0.0080 ms 100.0% 2025-09-07T11:06:59.6889200Z triton_convolution2d_430 0.0081 ms 98.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:06:59.6890279Z triton_convolution2d_431 0.0081 ms 98.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:06:59.6891345Z triton_convolution2d_429 0.0086 ms 92.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:06:59.6892392Z triton_convolution2d_426 0.0088 ms 90.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:06:59.6893438Z triton_convolution2d_432 0.0095 ms 83.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:06:59.6894496Z triton_convolution2d_427 0.0099 ms 80.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:06:59.6896158Z triton_convolution2d_428 0.0125 ms 63.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, 
STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T11:06:59.6896826Z conv1x1_via_mm 0.0180 ms 44.1% 2025-09-07T11:06:59.6897243Z SingleProcess AUTOTUNE benchmarking takes 0.1292 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T11:06:59.8203916Z Autotune Choices Stats: 2025-09-07T11:06:59.8205937Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.008031999692320824, "best_triton_pos": 1, "best_triton_time": 0.008287999778985977, "best_triton_kernel": "triton_convolution2d_444", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T11:06:59.8213013Z AUTOTUNE convolution(8x72x14x14, 36x72x1x1) 2025-09-07T11:06:59.8213431Z strides: [14112, 196, 14, 1], [72, 1, 1, 1] 2025-09-07T11:06:59.8213688Z dtypes: torch.float16, torch.float16 2025-09-07T11:06:59.8213930Z convolution 0.0080 ms 100.0% 2025-09-07T11:06:59.8214591Z triton_convolution2d_444 0.0083 ms 96.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:06:59.8215824Z triton_convolution2d_445 0.0084 ms 95.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:06:59.8217396Z triton_convolution2d_443 0.0088 ms 91.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:06:59.8218460Z triton_convolution2d_440 0.0092 ms 86.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, 
num_stages=2, num_warps=4 2025-09-07T11:06:59.8219498Z triton_convolution2d_446 0.0102 ms 78.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:06:59.8220556Z triton_convolution2d_441 0.0109 ms 73.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:06:59.8221739Z triton_convolution2d_442 0.0124 ms 65.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T11:06:59.8222393Z conv1x1_via_mm 0.0180 ms 44.6% 2025-09-07T11:06:59.8222809Z SingleProcess AUTOTUNE benchmarking takes 0.1290 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T11:06:59.9224407Z Autotune Choices Stats: 2025-09-07T11:06:59.9225882Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_452", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8", "best_time": 0.009920000098645687, "best_triton_pos": 0} 2025-09-07T11:06:59.9233572Z AUTOTUNE convolution(8x18x56x56, 18x18x3x3) 2025-09-07T11:06:59.9234008Z strides: [56448, 3136, 56, 1], [162, 9, 3, 1] 2025-09-07T11:06:59.9234297Z dtypes: torch.float16, torch.float16 2025-09-07T11:06:59.9235420Z triton_convolution2d_452 0.0099 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:06:59.9236667Z triton_convolution2d_451 0.0107 ms 92.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, 
KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:06:59.9237905Z triton_convolution2d_450 0.0118 ms 84.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:06:59.9239123Z triton_convolution2d_447 0.0129 ms 76.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:06:59.9240362Z triton_convolution2d_453 0.0149 ms 66.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:06:59.9241568Z triton_convolution2d_448 0.0175 ms 56.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:06:59.9242374Z convolution 0.0371 ms 26.7% 2025-09-07T11:06:59.9243060Z triton_convolution2d_449 0.0380 ms 26.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:06:59.9243968Z SingleProcess AUTOTUNE benchmarking takes 0.0994 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:07:00.0206018Z Autotune Choices Stats: 2025-09-07T11:07:00.0207347Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_458", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4", "best_time": 0.010048000141978264, "best_triton_pos": 0} 2025-09-07T11:07:00.0215459Z 
AUTOTUNE convolution(8x18x28x28, 72x18x3x3) 2025-09-07T11:07:00.0215773Z strides: [14112, 784, 28, 1], [162, 9, 3, 1] 2025-09-07T11:07:00.0216050Z dtypes: torch.float16, torch.float16 2025-09-07T11:07:00.0216773Z triton_convolution2d_458 0.0100 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:07:00.0217927Z triton_convolution2d_459 0.0111 ms 90.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:07:00.0219091Z triton_convolution2d_457 0.0129 ms 77.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:07:00.0220245Z triton_convolution2d_454 0.0138 ms 72.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:07:00.0221503Z triton_convolution2d_460 0.0141 ms 71.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:07:00.0222576Z triton_convolution2d_455 0.0177 ms 56.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:07:00.0223228Z convolution 0.0258 ms 39.0% 2025-09-07T11:07:00.0223869Z triton_convolution2d_456 0.0330 ms 30.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 
2025-09-07T11:07:00.0224717Z SingleProcess AUTOTUNE benchmarking takes 0.0977 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:07:00.1760564Z Autotune Choices Stats: 2025-09-07T11:07:00.1761627Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_1123", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4", "best_time": 0.02223999984562397, "best_triton_pos": 0} 2025-09-07T11:07:00.1770393Z AUTOTUNE convolution(8x72x14x14, 144x72x3x3) 2025-09-07T11:07:00.1770824Z strides: [14112, 196, 14, 1], [648, 9, 3, 1] 2025-09-07T11:07:00.1771093Z dtypes: torch.float16, torch.float16 2025-09-07T11:07:00.1771809Z triton_convolution2d_1123 0.0222 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:07:00.1772513Z convolution 0.0284 ms 78.4% 2025-09-07T11:07:00.1773181Z triton_convolution2d_1124 0.0341 ms 65.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:07:00.1774317Z triton_convolution2d_1122 0.0348 ms 64.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:07:00.1775952Z triton_convolution2d_1125 0.0357 ms 62.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:07:00.1777270Z triton_convolution2d_1120 0.0417 ms 53.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, 
PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:07:00.1778407Z triton_convolution2d_1119 0.0467 ms 47.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:07:00.1779543Z triton_convolution2d_1121 0.0682 ms 32.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:07:00.1780445Z SingleProcess AUTOTUNE benchmarking takes 0.1077 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:07:00.3097304Z Autotune Choices Stats: 2025-09-07T11:07:00.3098733Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.022911999374628067, "best_triton_pos": 1, "best_triton_time": 0.04307200014591217, "best_triton_kernel": "triton_convolution2d_1298", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T11:07:00.3107046Z AUTOTUNE convolution(8x144x7x7, 144x144x3x3) 2025-09-07T11:07:00.3107326Z strides: [7056, 49, 7, 1], [1296, 9, 3, 1] 2025-09-07T11:07:00.3107581Z dtypes: torch.float16, torch.float16 2025-09-07T11:07:00.3107821Z convolution 0.0229 ms 100.0% 2025-09-07T11:07:00.3108457Z triton_convolution2d_1298 0.0431 ms 53.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:07:00.3109540Z triton_convolution2d_1300 0.0511 ms 44.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, 
num_warps=8 2025-09-07T11:07:00.3110707Z triton_convolution2d_1297 0.0558 ms 41.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:07:00.3111940Z triton_convolution2d_1299 0.0562 ms 40.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:07:00.3113166Z triton_convolution2d_1295 0.0655 ms 35.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:07:00.3114394Z triton_convolution2d_1296 0.0734 ms 31.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=512, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:07:00.3115886Z triton_convolution2d_1294 0.0767 ms 29.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:07:00.3116854Z SingleProcess AUTOTUNE benchmarking takes 0.1213 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:07:00.4614538Z Autotune Choices Stats: 2025-09-07T11:07:00.4616084Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.00863999966531992, "best_triton_pos": 1, "best_triton_time": 0.009535999968647957, "best_triton_kernel": "triton_convolution2d_1369", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8"} 2025-09-07T11:07:00.4624649Z AUTOTUNE convolution(8x144x7x7, 18x144x1x1) 
2025-09-07T11:07:00.4625059Z strides: [7056, 49, 7, 1], [144, 1, 1, 1] 2025-09-07T11:07:00.4625306Z dtypes: torch.float16, torch.float16 2025-09-07T11:07:00.4625550Z convolution 0.0086 ms 100.0% 2025-09-07T11:07:00.4626149Z triton_convolution2d_1369 0.0095 ms 90.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:07:00.4627162Z triton_convolution2d_1368 0.0109 ms 79.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:07:00.4628180Z triton_convolution2d_1367 0.0134 ms 64.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:07:00.4629180Z triton_convolution2d_1364 0.0138 ms 62.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:07:00.4630180Z triton_convolution2d_1365 0.0145 ms 59.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:07:00.4631406Z triton_convolution2d_1370 0.0150 ms 57.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:07:00.4632623Z triton_convolution2d_1366 0.0159 ms 54.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=512, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T11:07:00.4633370Z conv1x1_via_mm 0.0192 ms 45.1% 
2025-09-07T11:07:00.4633830Z SingleProcess AUTOTUNE benchmarking takes 0.1285 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T11:07:00.5969081Z Autotune Choices Stats: 2025-09-07T11:07:00.5970453Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.00886400043964386, "best_triton_pos": 1, "best_triton_time": 0.009631999768316746, "best_triton_kernel": "triton_convolution2d_1390", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8"} 2025-09-07T11:07:00.5978967Z AUTOTUNE convolution(8x144x7x7, 36x144x1x1) 2025-09-07T11:07:00.5979300Z strides: [7056, 49, 7, 1], [144, 1, 1, 1] 2025-09-07T11:07:00.5979613Z dtypes: torch.float16, torch.float16 2025-09-07T11:07:00.5979889Z convolution 0.0089 ms 100.0% 2025-09-07T11:07:00.5980673Z triton_convolution2d_1390 0.0096 ms 92.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:07:00.5981876Z triton_convolution2d_1385 0.0140 ms 63.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:07:00.5982932Z triton_convolution2d_1391 0.0147 ms 60.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:07:00.5983993Z triton_convolution2d_1388 0.0157 ms 56.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:07:00.5985708Z triton_convolution2d_1386 0.0159 ms 55.7% ALLOW_TF32=False, 
BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:07:00.5986813Z triton_convolution2d_1387 0.0162 ms 54.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=512, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T11:07:00.5987874Z triton_convolution2d_1389 0.0173 ms 51.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:07:00.5988522Z conv1x1_via_mm 0.0196 ms 45.2% 2025-09-07T11:07:00.5988932Z SingleProcess AUTOTUNE benchmarking takes 0.1290 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T11:07:00.7312695Z Autotune Choices Stats: 2025-09-07T11:07:00.7314062Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.009344000369310379, "best_triton_pos": 1, "best_triton_time": 0.010239999741315842, "best_triton_kernel": "triton_convolution2d_1418", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8"} 2025-09-07T11:07:00.7323083Z AUTOTUNE convolution(8x144x7x7, 72x144x1x1) 2025-09-07T11:07:00.7323384Z strides: [7056, 49, 7, 1], [144, 1, 1, 1] 2025-09-07T11:07:00.7323643Z dtypes: torch.float16, torch.float16 2025-09-07T11:07:00.7323902Z convolution 0.0093 ms 100.0% 2025-09-07T11:07:00.7324598Z triton_convolution2d_1418 0.0102 ms 91.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:07:00.7325990Z triton_convolution2d_1417 0.0113 ms 83.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, 
BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:07:00.7327141Z triton_convolution2d_1416 0.0118 ms 79.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:07:00.7328288Z triton_convolution2d_1413 0.0123 ms 76.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:07:00.7329419Z triton_convolution2d_1414 0.0161 ms 58.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:07:00.7330602Z triton_convolution2d_1415 0.0161 ms 58.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=512, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T11:07:00.7331825Z triton_convolution2d_1419 0.0164 ms 57.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:07:00.7332574Z conv1x1_via_mm 0.0177 ms 52.7% 2025-09-07T11:07:00.7333047Z SingleProcess AUTOTUNE benchmarking takes 0.1302 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T11:07:00.8335757Z Autotune Choices Stats: 2025-09-07T11:07:00.8336859Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_1432", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8", "best_time": 0.009088000282645226, "best_triton_pos": 
0} 2025-09-07T11:07:00.8345969Z AUTOTUNE convolution(8x18x28x28, 18x18x3x3) 2025-09-07T11:07:00.8346593Z strides: [14112, 784, 28, 1], [162, 9, 3, 1] 2025-09-07T11:07:00.8346877Z dtypes: torch.float16, torch.float16 2025-09-07T11:07:00.8347567Z triton_convolution2d_1432 0.0091 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:07:00.8348671Z triton_convolution2d_1431 0.0097 ms 93.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:07:00.8349738Z triton_convolution2d_1430 0.0108 ms 84.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:07:00.8350930Z triton_convolution2d_1427 0.0118 ms 77.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:07:00.8352163Z triton_convolution2d_1433 0.0128 ms 71.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:07:00.8353382Z triton_convolution2d_1428 0.0162 ms 56.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:07:00.8354610Z triton_convolution2d_1429 0.0329 ms 27.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:07:00.8355649Z 
convolution 0.0355 ms 25.6% 2025-09-07T11:07:00.8356125Z SingleProcess AUTOTUNE benchmarking takes 0.0987 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:07:00.9311736Z Autotune Choices Stats: 2025-09-07T11:07:00.9312803Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_1438", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4", "best_time": 0.010111999697983265, "best_triton_pos": 0} 2025-09-07T11:07:00.9322448Z AUTOTUNE convolution(8x18x14x14, 144x18x3x3) 2025-09-07T11:07:00.9322764Z strides: [3528, 196, 14, 1], [162, 9, 3, 1] 2025-09-07T11:07:00.9323046Z dtypes: torch.float16, torch.float16 2025-09-07T11:07:00.9323817Z triton_convolution2d_1438 0.0101 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:07:00.9325456Z triton_convolution2d_1439 0.0135 ms 74.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:07:00.9326683Z triton_convolution2d_1440 0.0143 ms 70.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:07:00.9327911Z triton_convolution2d_1437 0.0143 ms 70.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:07:00.9329155Z triton_convolution2d_1434 0.0180 ms 56.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, 
STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:07:00.9330752Z triton_convolution2d_1435 0.0188 ms 53.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:07:00.9331497Z convolution 0.0235 ms 43.1% 2025-09-07T11:07:00.9332187Z triton_convolution2d_1436 0.0341 ms 29.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:07:00.9333086Z SingleProcess AUTOTUNE benchmarking takes 0.0972 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:07:01.0320215Z Autotune Choices Stats: 2025-09-07T11:07:01.0321538Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_1446", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8", "best_time": 0.014816000126302242, "best_triton_pos": 0} 2025-09-07T11:07:01.0330977Z AUTOTUNE convolution(8x36x28x28, 36x36x3x3) 2025-09-07T11:07:01.0331286Z strides: [28224, 784, 28, 1], [324, 9, 3, 1] 2025-09-07T11:07:01.0331537Z dtypes: torch.float16, torch.float16 2025-09-07T11:07:01.0332211Z triton_convolution2d_1446 0.0148 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:07:01.0333282Z triton_convolution2d_1445 0.0150 ms 98.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:07:01.0334336Z triton_convolution2d_1441 0.0176 ms 84.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, 
BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:07:01.0335665Z triton_convolution2d_1444 0.0179 ms 82.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:07:01.0336724Z triton_convolution2d_1447 0.0213 ms 69.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:07:01.0337777Z triton_convolution2d_1442 0.0230 ms 64.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:07:01.0338424Z convolution 0.0366 ms 40.4% 2025-09-07T11:07:01.0339060Z triton_convolution2d_1443 0.0512 ms 28.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:07:01.0339903Z SingleProcess AUTOTUNE benchmarking takes 0.1004 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:07:01.1324498Z Autotune Choices Stats: 2025-09-07T11:07:01.1325923Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_1452", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4", "best_time": 0.015039999969303608, "best_triton_pos": 0} 2025-09-07T11:07:01.1335195Z AUTOTUNE convolution(8x36x14x14, 144x36x3x3) 2025-09-07T11:07:01.1335484Z strides: [7056, 196, 14, 1], [324, 9, 3, 1] 2025-09-07T11:07:01.1335734Z dtypes: torch.float16, torch.float16 2025-09-07T11:07:01.1336691Z 
triton_convolution2d_1452 0.0150 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:07:01.1337921Z triton_convolution2d_1453 0.0206 ms 73.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:07:01.1339015Z triton_convolution2d_1451 0.0218 ms 69.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:07:01.1340091Z triton_convolution2d_1454 0.0227 ms 66.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:07:01.1341301Z triton_convolution2d_1448 0.0242 ms 62.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:07:01.1342632Z triton_convolution2d_1449 0.0243 ms 62.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:07:01.1343380Z convolution 0.0294 ms 51.1% 2025-09-07T11:07:01.1344103Z triton_convolution2d_1450 0.0455 ms 33.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:07:01.1345216Z SingleProcess AUTOTUNE benchmarking takes 0.1000 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:07:01.3728257Z Autotune Choices Stats: 2025-09-07T11:07:01.3729394Z 
{"num_choices": 9, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_2137", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8", "best_time": 0.007296000141650438, "best_triton_pos": 0} 2025-09-07T11:07:01.3739321Z AUTOTUNE convolution(8x18x56x56, 32x18x1x1) 2025-09-07T11:07:01.3739777Z strides: [56448, 3136, 56, 1], [18, 1, 1, 1] 2025-09-07T11:07:01.3740039Z dtypes: torch.float16, torch.float16 2025-09-07T11:07:01.3740707Z triton_convolution2d_2137 0.0073 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:07:01.3741900Z triton_convolution2d_2139 0.0073 ms 99.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:07:01.3742953Z triton_convolution2d_2138 0.0076 ms 96.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:07:01.3743998Z triton_convolution2d_2140 0.0076 ms 95.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:07:01.3745322Z triton_convolution2d_2134 0.0082 ms 89.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:07:01.3746366Z triton_convolution2d_2135 0.0085 ms 85.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, 
STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:07:01.3747258Z convolution 0.0087 ms 84.1% 2025-09-07T11:07:01.3748026Z triton_convolution2d_2136 0.0091 ms 80.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T11:07:01.3748691Z conv1x1_via_mm 0.0234 ms 31.1% 2025-09-07T11:07:01.3749106Z SingleProcess AUTOTUNE benchmarking takes 0.1293 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T11:07:01.4719659Z Autotune Choices Stats: 2025-09-07T11:07:01.4720765Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_2141", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4", "best_time": 0.013151999562978745, "best_triton_pos": 0} 2025-09-07T11:07:01.4730726Z AUTOTUNE convolution(8x32x56x56, 32x32x3x3) 2025-09-07T11:07:01.4731115Z strides: [100352, 3136, 56, 1], [288, 9, 3, 1] 2025-09-07T11:07:01.4731465Z dtypes: torch.float16, torch.float16 2025-09-07T11:07:01.4732229Z triton_convolution2d_2141 0.0132 ms 100.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:07:01.4733459Z triton_convolution2d_2145 0.0146 ms 90.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:07:01.4734677Z triton_convolution2d_2146 0.0152 ms 86.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:07:01.4736053Z triton_convolution2d_2144 
0.0156 ms 84.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:07:01.4737284Z triton_convolution2d_2147 0.0163 ms 80.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:07:01.4738506Z triton_convolution2d_2142 0.0171 ms 76.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:07:01.4739269Z convolution 0.0236 ms 55.8% 2025-09-07T11:07:01.4740012Z triton_convolution2d_2143 0.0385 ms 34.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:07:01.4740995Z SingleProcess AUTOTUNE benchmarking takes 0.0987 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:07:01.6034474Z Autotune Choices Stats: 2025-09-07T11:07:01.6035702Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_2152", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4", "best_time": 0.008799999952316284, "best_triton_pos": 0} 2025-09-07T11:07:01.6046688Z AUTOTUNE convolution(8x32x56x56, 128x32x1x1) 2025-09-07T11:07:01.6046973Z strides: [100352, 3136, 56, 1], [32, 1, 1, 1] 2025-09-07T11:07:01.6047309Z dtypes: torch.float16, torch.float16 2025-09-07T11:07:01.6048023Z triton_convolution2d_2152 0.0088 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, 
num_stages=2, num_warps=4 2025-09-07T11:07:01.6049173Z triton_convolution2d_2153 0.0091 ms 97.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:07:01.6050695Z triton_convolution2d_2151 0.0092 ms 95.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:07:01.6051844Z triton_convolution2d_2154 0.0094 ms 93.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:07:01.6052569Z convolution 0.0100 ms 88.1% 2025-09-07T11:07:01.6053257Z triton_convolution2d_2149 0.0102 ms 85.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:07:01.6054416Z triton_convolution2d_2150 0.0108 ms 81.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T11:07:01.6055735Z triton_convolution2d_2148 0.0110 ms 79.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:07:01.6056426Z conv1x1_via_mm 0.0516 ms 17.1% 2025-09-07T11:07:01.6056862Z SingleProcess AUTOTUNE benchmarking takes 0.1311 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T11:07:01.7328841Z Autotune Choices Stats: 2025-09-07T11:07:01.7330045Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_2160", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, 
BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8", "best_time": 0.008063999935984612, "best_triton_pos": 0} 2025-09-07T11:07:01.7340926Z AUTOTUNE convolution(8x18x56x56, 128x18x1x1) 2025-09-07T11:07:01.7341346Z strides: [56448, 3136, 56, 1], [18, 1, 1, 1] 2025-09-07T11:07:01.7341751Z dtypes: torch.float16, torch.float16 2025-09-07T11:07:01.7342534Z triton_convolution2d_2160 0.0081 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:07:01.7343785Z triton_convolution2d_2159 0.0082 ms 98.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:07:01.7345282Z triton_convolution2d_2161 0.0088 ms 91.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:07:01.7346511Z triton_convolution2d_2158 0.0088 ms 91.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:07:01.7347730Z triton_convolution2d_2155 0.0091 ms 88.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:07:01.7348953Z triton_convolution2d_2156 0.0098 ms 82.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:07:01.7350167Z triton_convolution2d_2157 0.0102 ms 79.2% 
ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T11:07:01.7350926Z convolution 0.0109 ms 74.1% 2025-09-07T11:07:01.7351441Z conv1x1_via_mm 0.0506 ms 15.9% 2025-09-07T11:07:01.7351845Z SingleProcess AUTOTUNE benchmarking takes 0.1290 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T11:07:01.8626132Z Autotune Choices Stats: 2025-09-07T11:07:01.8627502Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_2167", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8", "best_time": 0.007135999854654074, "best_triton_pos": 0} 2025-09-07T11:07:01.8637642Z AUTOTUNE convolution(8x36x28x28, 64x36x1x1) 2025-09-07T11:07:01.8638031Z strides: [28224, 784, 28, 1], [36, 1, 1, 1] 2025-09-07T11:07:01.8638313Z dtypes: torch.float16, torch.float16 2025-09-07T11:07:01.8639057Z triton_convolution2d_2167 0.0071 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:07:01.8640326Z triton_convolution2d_2166 0.0076 ms 94.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:07:01.8641584Z triton_convolution2d_2165 0.0079 ms 89.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:07:01.8642315Z convolution 0.0080 ms 89.2% 2025-09-07T11:07:01.8642986Z triton_convolution2d_2162 0.0081 ms 87.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, 
GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:07:01.8644126Z triton_convolution2d_2168 0.0090 ms 79.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:07:01.8645546Z triton_convolution2d_2163 0.0096 ms 74.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:07:01.8646697Z triton_convolution2d_2164 0.0099 ms 72.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T11:07:01.8647390Z conv1x1_via_mm 0.0208 ms 34.4% 2025-09-07T11:07:01.8647824Z SingleProcess AUTOTUNE benchmarking takes 0.1293 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T11:07:01.9636151Z Autotune Choices Stats: 2025-09-07T11:07:01.9637224Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_2173", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4", "best_time": 0.015647999942302704, "best_triton_pos": 0} 2025-09-07T11:07:01.9647882Z AUTOTUNE convolution(8x64x28x28, 64x64x3x3) 2025-09-07T11:07:01.9648212Z strides: [50176, 784, 28, 1], [576, 9, 3, 1] 2025-09-07T11:07:01.9648476Z dtypes: torch.float16, torch.float16 2025-09-07T11:07:01.9649192Z triton_convolution2d_2173 0.0156 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:07:01.9650335Z 
triton_convolution2d_2174 0.0159 ms 98.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:07:01.9651547Z triton_convolution2d_2172 0.0173 ms 90.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:07:01.9653207Z triton_convolution2d_2169 0.0203 ms 77.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:07:01.9653953Z convolution 0.0212 ms 74.0% 2025-09-07T11:07:01.9654666Z triton_convolution2d_2175 0.0216 ms 72.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:07:01.9656341Z triton_convolution2d_2170 0.0309 ms 50.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:07:01.9657566Z triton_convolution2d_2171 0.0535 ms 29.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:07:01.9658546Z SingleProcess AUTOTUNE benchmarking takes 0.1006 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:07:02.0960214Z Autotune Choices Stats: 2025-09-07T11:07:02.0961818Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.008320000022649765, "best_triton_pos": 1, "best_triton_time": 0.008415999822318554, "best_triton_kernel": "triton_convolution2d_2180", "best_triton_kernel_desc": 
"ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T11:07:02.0972628Z AUTOTUNE convolution(8x64x28x28, 256x64x1x1) 2025-09-07T11:07:02.0972923Z strides: [50176, 784, 28, 1], [64, 1, 1, 1] 2025-09-07T11:07:02.0973196Z dtypes: torch.float16, torch.float16 2025-09-07T11:07:02.0973435Z convolution 0.0083 ms 100.0% 2025-09-07T11:07:02.0974091Z triton_convolution2d_2180 0.0084 ms 98.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:07:02.0975489Z triton_convolution2d_2179 0.0094 ms 88.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:07:02.0976564Z triton_convolution2d_2182 0.0094 ms 88.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:07:02.0977629Z triton_convolution2d_2181 0.0095 ms 87.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:07:02.0978864Z triton_convolution2d_2177 0.0105 ms 79.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:07:02.0979927Z triton_convolution2d_2176 0.0107 ms 77.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:07:02.0980992Z triton_convolution2d_2178 
0.0111 ms 74.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T11:07:02.0981761Z conv1x1_via_mm 0.0368 ms 22.6% 2025-09-07T11:07:02.0982175Z SingleProcess AUTOTUNE benchmarking takes 0.1320 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T11:07:02.2292196Z Autotune Choices Stats: 2025-09-07T11:07:02.2293657Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_2187", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4", "best_time": 0.008224000222980976, "best_triton_pos": 0} 2025-09-07T11:07:02.2304658Z AUTOTUNE convolution(8x36x28x28, 256x36x1x1) 2025-09-07T11:07:02.2305105Z strides: [28224, 784, 28, 1], [36, 1, 1, 1] 2025-09-07T11:07:02.2305354Z dtypes: torch.float16, torch.float16 2025-09-07T11:07:02.2306026Z triton_convolution2d_2187 0.0082 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:07:02.2307115Z triton_convolution2d_2186 0.0090 ms 91.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:07:02.2308207Z triton_convolution2d_2188 0.0090 ms 91.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:07:02.2309289Z triton_convolution2d_2189 0.0090 ms 91.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, 
num_stages=2, num_warps=8 2025-09-07T11:07:02.2309955Z convolution 0.0091 ms 90.2% 2025-09-07T11:07:02.2310591Z triton_convolution2d_2184 0.0097 ms 84.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:07:02.2311808Z triton_convolution2d_2185 0.0097 ms 84.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T11:07:02.2313038Z triton_convolution2d_2183 0.0098 ms 83.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:07:02.2313785Z conv1x1_via_mm 0.0356 ms 23.1% 2025-09-07T11:07:02.2314253Z SingleProcess AUTOTUNE benchmarking takes 0.1291 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T11:07:02.3647759Z Autotune Choices Stats: 2025-09-07T11:07:02.3649142Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.008224000222980976, "best_triton_pos": 1, "best_triton_time": 0.008511999621987343, "best_triton_kernel": "triton_convolution2d_2201", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T11:07:02.3661192Z AUTOTUNE convolution(8x72x14x14, 128x72x1x1) 2025-09-07T11:07:02.3661580Z strides: [14112, 196, 14, 1], [72, 1, 1, 1] 2025-09-07T11:07:02.3661845Z dtypes: torch.float16, torch.float16 2025-09-07T11:07:02.3662084Z convolution 0.0082 ms 100.0% 2025-09-07T11:07:02.3662733Z triton_convolution2d_2201 0.0085 ms 96.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, 
STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:07:02.3663837Z triton_convolution2d_2202 0.0093 ms 88.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:07:02.3664922Z triton_convolution2d_2197 0.0101 ms 81.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:07:02.3666705Z triton_convolution2d_2200 0.0105 ms 78.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:07:02.3667792Z triton_convolution2d_2198 0.0108 ms 76.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:07:02.3668870Z triton_convolution2d_2203 0.0109 ms 75.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:07:02.3669936Z triton_convolution2d_2199 0.0124 ms 66.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T11:07:02.3670626Z conv1x1_via_mm 0.0203 ms 40.5% 2025-09-07T11:07:02.3678535Z SingleProcess AUTOTUNE benchmarking takes 0.1352 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T11:07:02.4872575Z Autotune Choices Stats: 2025-09-07T11:07:02.4873931Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.02425600029528141, "best_triton_pos": 1, "best_triton_time": 
0.03340800106525421, "best_triton_kernel": "triton_convolution2d_2208", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T11:07:02.4885571Z AUTOTUNE convolution(8x128x14x14, 128x128x3x3) 2025-09-07T11:07:02.4886019Z strides: [25088, 196, 14, 1], [1152, 9, 3, 1] 2025-09-07T11:07:02.4886296Z dtypes: torch.float16, torch.float16 2025-09-07T11:07:02.4886574Z convolution 0.0243 ms 100.0% 2025-09-07T11:07:02.4887277Z triton_convolution2d_2208 0.0334 ms 72.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:07:02.4888422Z triton_convolution2d_2209 0.0414 ms 58.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:07:02.4889541Z triton_convolution2d_2210 0.0453 ms 53.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:07:02.4890676Z triton_convolution2d_2207 0.0505 ms 48.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:07:02.4891913Z triton_convolution2d_2204 0.0513 ms 47.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:07:02.4893133Z triton_convolution2d_2205 0.0595 ms 40.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, 
STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:07:02.4894367Z triton_convolution2d_2206 0.1097 ms 22.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:07:02.4895480Z SingleProcess AUTOTUNE benchmarking takes 0.1187 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:07:02.6221728Z Autotune Choices Stats: 2025-09-07T11:07:02.6223107Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_2215", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4", "best_time": 0.009664000011980534, "best_triton_pos": 0} 2025-09-07T11:07:02.6234515Z AUTOTUNE convolution(8x128x14x14, 512x128x1x1) 2025-09-07T11:07:02.6235140Z strides: [25088, 196, 14, 1], [128, 1, 1, 1] 2025-09-07T11:07:02.6235434Z dtypes: torch.float16, torch.float16 2025-09-07T11:07:02.6236204Z triton_convolution2d_2215 0.0097 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:07:02.6236960Z convolution 0.0098 ms 99.0% 2025-09-07T11:07:02.6237698Z triton_convolution2d_2214 0.0112 ms 86.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:07:02.6238946Z triton_convolution2d_2217 0.0114 ms 84.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:07:02.6240173Z triton_convolution2d_2216 0.0117 ms 82.7% ALLOW_TF32=False, BLOCK_K=32, 
BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:07:02.6241411Z triton_convolution2d_2212 0.0125 ms 77.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:07:02.6242600Z triton_convolution2d_2211 0.0138 ms 70.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:07:02.6243736Z triton_convolution2d_2213 0.0154 ms 62.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T11:07:02.6244432Z conv1x1_via_mm 0.0223 ms 43.4% 2025-09-07T11:07:02.6244874Z SingleProcess AUTOTUNE benchmarking takes 0.1314 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T11:07:02.7519959Z Autotune Choices Stats: 2025-09-07T11:07:02.7521039Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_2222", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4", "best_time": 0.009088000282645226, "best_triton_pos": 0} 2025-09-07T11:07:02.7533621Z AUTOTUNE convolution(8x72x14x14, 512x72x1x1) 2025-09-07T11:07:02.7533952Z strides: [14112, 196, 14, 1], [72, 1, 1, 1] 2025-09-07T11:07:02.7534229Z dtypes: torch.float16, torch.float16 2025-09-07T11:07:02.7535300Z triton_convolution2d_2222 0.0091 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 
2025-09-07T11:07:02.7536024Z convolution 0.0095 ms 95.3% 2025-09-07T11:07:02.7536725Z triton_convolution2d_2221 0.0104 ms 87.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:07:02.7537889Z triton_convolution2d_2224 0.0105 ms 86.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:07:02.7539015Z triton_convolution2d_2219 0.0106 ms 85.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:07:02.7540570Z triton_convolution2d_2223 0.0108 ms 84.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:07:02.7541837Z triton_convolution2d_2218 0.0119 ms 76.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:07:02.7542913Z triton_convolution2d_2220 0.0123 ms 74.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T11:07:02.7543568Z conv1x1_via_mm 0.0256 ms 35.5% 2025-09-07T11:07:02.7543982Z SingleProcess AUTOTUNE benchmarking takes 0.1294 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T11:07:02.8869307Z Autotune Choices Stats: 2025-09-07T11:07:02.8870418Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_2235", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, 
GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8", "best_time": 0.01244799979031086, "best_triton_pos": 0} 2025-09-07T11:07:02.8882954Z AUTOTUNE convolution(8x144x7x7, 256x144x1x1) 2025-09-07T11:07:02.8883955Z strides: [7056, 49, 7, 1], [144, 1, 1, 1] 2025-09-07T11:07:02.8884317Z dtypes: torch.float16, torch.float16 2025-09-07T11:07:02.8885411Z triton_convolution2d_2235 0.0124 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:07:02.8886623Z triton_convolution2d_2236 0.0140 ms 88.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:07:02.8887834Z triton_convolution2d_2232 0.0151 ms 82.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:07:02.8888995Z triton_convolution2d_2234 0.0160 ms 78.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=512, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T11:07:02.8890134Z triton_convolution2d_2238 0.0162 ms 76.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:07:02.8891275Z triton_convolution2d_2233 0.0164 ms 75.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:07:02.8892383Z triton_convolution2d_2237 0.0166 ms 75.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, 
BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:07:02.8893033Z convolution 0.0168 ms 74.2% 2025-09-07T11:07:02.8893246Z conv1x1_via_mm 0.0176 ms 70.7% 2025-09-07T11:07:02.8893659Z SingleProcess AUTOTUNE benchmarking takes 0.1312 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T11:07:03.0561789Z Autotune Choices Stats: 2025-09-07T11:07:03.0563292Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.026016000658273697, "best_triton_pos": 1, "best_triton_time": 0.06390400230884552, "best_triton_kernel": "triton_convolution2d_2243", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T11:07:03.0575351Z AUTOTUNE convolution(8x256x7x7, 256x256x3x3) 2025-09-07T11:07:03.0576214Z strides: [12544, 49, 7, 1], [2304, 9, 3, 1] 2025-09-07T11:07:03.0576557Z dtypes: torch.float16, torch.float16 2025-09-07T11:07:03.0576827Z convolution 0.0260 ms 100.0% 2025-09-07T11:07:03.0577518Z triton_convolution2d_2243 0.0639 ms 40.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:07:03.0578693Z triton_convolution2d_2245 0.0868 ms 30.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:07:03.0579841Z triton_convolution2d_2242 0.0943 ms 27.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:07:03.0580997Z triton_convolution2d_2244 0.1102 ms 23.6% 
ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:07:03.0582247Z triton_convolution2d_2240 0.1159 ms 22.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:07:03.0583370Z triton_convolution2d_2241 0.1454 ms 17.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=512, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:07:03.0584494Z triton_convolution2d_2239 0.2737 ms 9.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:07:03.0585562Z SingleProcess AUTOTUNE benchmarking takes 0.1688 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:07:03.1922794Z Autotune Choices Stats: 2025-09-07T11:07:03.1923906Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_2250", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4", "best_time": 0.014783999882638454, "best_triton_pos": 0} 2025-09-07T11:07:03.1936896Z AUTOTUNE convolution(8x256x7x7, 1024x256x1x1) 2025-09-07T11:07:03.1937234Z strides: [12544, 49, 7, 1], [256, 1, 1, 1] 2025-09-07T11:07:03.1937525Z dtypes: torch.float16, torch.float16 2025-09-07T11:07:03.1938311Z triton_convolution2d_2250 0.0148 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:07:03.1939559Z 
triton_convolution2d_2249 0.0151 ms 97.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:07:03.1940323Z convolution 0.0168 ms 88.2% 2025-09-07T11:07:03.1941063Z triton_convolution2d_2251 0.0180 ms 81.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:07:03.1942279Z triton_convolution2d_2252 0.0196 ms 75.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:07:03.1943328Z triton_convolution2d_2246 0.0205 ms 72.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:07:03.1944204Z conv1x1_via_mm 0.0220 ms 67.2% 2025-09-07T11:07:03.1945167Z triton_convolution2d_2247 0.0226 ms 65.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:07:03.1946280Z triton_convolution2d_2248 0.0239 ms 61.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=512, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T11:07:03.1947122Z SingleProcess AUTOTUNE benchmarking takes 0.1325 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T11:07:03.3226434Z Autotune Choices Stats: 2025-09-07T11:07:03.3227395Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_2256", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, 
PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8", "best_time": 0.01228800043463707, "best_triton_pos": 0} 2025-09-07T11:07:03.3240462Z AUTOTUNE convolution(8x144x7x7, 1024x144x1x1) 2025-09-07T11:07:03.3240797Z strides: [7056, 49, 7, 1], [144, 1, 1, 1] 2025-09-07T11:07:03.3241073Z dtypes: torch.float16, torch.float16 2025-09-07T11:07:03.3242073Z triton_convolution2d_2256 0.0123 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:07:03.3243419Z triton_convolution2d_2257 0.0140 ms 87.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:07:03.3244205Z convolution 0.0149 ms 82.2% 2025-09-07T11:07:03.3245274Z triton_convolution2d_2253 0.0150 ms 81.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:07:03.3246538Z triton_convolution2d_2259 0.0155 ms 79.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:07:03.3247778Z triton_convolution2d_2254 0.0158 ms 77.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T11:07:03.3249014Z triton_convolution2d_2258 0.0160 ms 76.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T11:07:03.3250261Z triton_convolution2d_2255 0.0162 ms 75.9% ALLOW_TF32=False, 
BLOCK_K=16, BLOCK_M=512, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T11:07:03.3251023Z conv1x1_via_mm 0.0201 ms 61.0% 2025-09-07T11:07:03.3251525Z SingleProcess AUTOTUNE benchmarking takes 0.1299 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T11:07:03.5817244Z Autotune Choices Stats: 2025-09-07T11:07:03.5818238Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_2278", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.010688000358641148, "best_triton_pos": 0} 2025-09-07T11:07:03.5832328Z AUTOTUNE addmm(8x1000, 8x2048, 2048x1000) 2025-09-07T11:07:03.5832715Z strides: [0, 1], [2048, 1], [1, 2048] 2025-09-07T11:07:03.5833389Z dtypes: torch.float16, torch.float16, torch.float16 2025-09-07T11:07:03.5834233Z triton_mm_2278 0.0107 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:07:03.5834884Z bias_addmm 0.0112 ms 95.7% 2025-09-07T11:07:03.5835644Z triton_mm_2282 0.0116 ms 92.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:07:03.5836624Z triton_mm_2286 0.0139 ms 77.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:07:03.5837240Z addmm 0.0148 ms 72.5% 2025-09-07T11:07:03.5837806Z triton_mm_2290 0.0152 ms 70.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:07:03.5838768Z triton_mm_2277 0.0169 ms 63.1% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:07:03.5839740Z triton_mm_2276 0.0180 ms 59.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:07:03.5840699Z triton_mm_2275 0.0185 ms 57.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T11:07:03.5841661Z triton_mm_2281 0.0185 ms 57.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:03.5842495Z SingleProcess AUTOTUNE benchmarking takes 0.2524 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T11:07:59.6524046Z Autotune Choices Stats: 2025-09-07T11:07:59.6525497Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_2314", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.007199999876320362, "best_triton_pos": 0} 2025-09-07T11:07:59.6542245Z AUTOTUNE mm(1000x8, 8x2048) 2025-09-07T11:07:59.6542542Z strides: [1, 1000], [2048, 1] 2025-09-07T11:07:59.6542780Z dtypes: torch.float16, torch.float16 2025-09-07T11:07:59.6543375Z triton_mm_2314 0.0072 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:07:59.6544273Z triton_mm_2317 0.0073 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:07:59.6545293Z triton_mm_2320 0.0073 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, 
BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:07:59.6546195Z triton_mm_2315 0.0073 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:07:59.6547077Z triton_mm_2316 0.0073 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:59.6548062Z triton_mm_2318 0.0073 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:07:59.6549030Z triton_mm_2319 0.0073 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:59.6550653Z triton_mm_2313 0.0073 ms 98.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:07:59.6551901Z triton_mm_2321 0.0074 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T11:07:59.6552918Z triton_mm_2322 0.0075 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:59.6553783Z SingleProcess AUTOTUNE benchmarking takes 0.1658 seconds and 0.0004 seconds precompiling for 17 choices 2025-09-07T11:08:00.4886016Z Autotune Choices Stats: 2025-09-07T11:08:00.4887306Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "mm", "best_time": 0.009920000098645687, "best_triton_pos": 1, "best_triton_time": 0.01027199998497963, "best_triton_kernel": "triton_mm_2299", 
"best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T11:08:00.4901499Z AUTOTUNE mm(8x1000, 1000x2048) 2025-09-07T11:08:00.4901765Z strides: [1000, 1], [2048, 1] 2025-09-07T11:08:00.4901983Z dtypes: torch.float16, torch.float16 2025-09-07T11:08:00.4902218Z mm 0.0099 ms 100.0% 2025-09-07T11:08:00.4902734Z triton_mm_2299 0.0103 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:08:00.4903603Z triton_mm_2295 0.0108 ms 92.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:08:00.4904471Z triton_mm_2303 0.0108 ms 92.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:08:00.4905514Z triton_mm_2293 0.0119 ms 83.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:08:00.4906350Z triton_mm_2307 0.0121 ms 82.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:08:00.4907194Z triton_mm_2294 0.0124 ms 79.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:08:00.4908158Z triton_mm_2298 0.0125 ms 79.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:08:00.4909113Z triton_mm_2305 0.0135 ms 73.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, 
BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:08:00.4910089Z triton_mm_2302 0.0136 ms 72.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:08:00.4910935Z SingleProcess AUTOTUNE benchmarking takes 0.1854 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T11:08:21.8420018Z skipping cudagraphs due to disabling cudagraphs due to incompatible op aten.index_put.default Found from File "/var/lib/jenkins/workspace/benchmarks/dynamo/timm_models.py", line 442, in torch_dynamo_resume_in_forward_and_backward_pass_at_440 2025-09-07T11:08:21.8421114Z pred = mod(*cloned_inputs) 2025-09-07T11:08:21.8421726Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/hrnet.py", line 792, in forward 2025-09-07T11:08:21.8422239Z y = self.forward_features(x) 2025-09-07T11:08:21.8422748Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/hrnet.py", line 770, in forward_features 2025-09-07T11:08:21.8423744Z yl = self.stages(x) 2025-09-07T11:08:21.8424179Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/hrnet.py", line 757, in stages 2025-09-07T11:08:21.8424791Z yl = self.stage4(xl) 2025-09-07T11:08:21.8425681Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/hrnet.py", line 506, in forward 2025-09-07T11:08:21.8426091Z x = module(x) 2025-09-07T11:08:21.8426445Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/hrnet.py", line 484, in forward 2025-09-07T11:08:21.8426842Z y = y + f(x[j]) 2025-09-07T11:08:21.8426947Z 2025-09-07T11:08:21.8426950Z 2025-09-07T11:08:26.0768481Z pass 2025-09-07T11:08:35.6470068Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. 
If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T11:08:35.6471858Z import pynvml # type: ignore[import] 2025-09-07T11:08:38.6860665Z 2025-09-07T11:08:40.6397637Z loading model: 0it [00:00, ?it/s] 2025-09-07T11:08:40.6398162Z loading model: 0it [00:01, ?it/s] 2025-09-07T11:08:40.6398586Z cuda train inception_v3 2025-09-07T11:09:26.1089641Z Autotune Choices Stats: 2025-09-07T11:09:26.1091286Z {"num_choices": 7, "num_triton_choices": 6, "best_kernel": "convolution", "best_time": 0.020896000787615776, "best_triton_pos": 1, "best_triton_time": 0.03376000002026558, "best_triton_kernel": "triton_convolution2d_4", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8"} 2025-09-07T11:09:26.1105843Z AUTOTUNE convolution(8x3x299x299, 32x3x3x3) 2025-09-07T11:09:26.1106205Z strides: [268203, 1, 897, 3], [27, 1, 9, 3] 2025-09-07T11:09:26.1106600Z dtypes: torch.float16, torch.float16 2025-09-07T11:09:26.1106903Z convolution 0.0209 ms 100.0% 2025-09-07T11:09:26.1107671Z triton_convolution2d_4 0.0338 ms 61.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:26.1108795Z triton_convolution2d_0 0.0358 ms 58.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:26.1109918Z triton_convolution2d_2 0.0363 ms 57.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:09:26.1111012Z triton_convolution2d_3 0.0399 ms 52.3% 
ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:26.1112250Z triton_convolution2d_5 0.0484 ms 43.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:26.1113522Z triton_convolution2d_1 0.0626 ms 33.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:26.1114547Z SingleProcess AUTOTUNE benchmarking takes 0.1023 seconds and 0.0003 seconds precompiling for 7 choices 2025-09-07T11:09:26.2233587Z Autotune Choices Stats: 2025-09-07T11:09:26.2234813Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_9", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8", "best_time": 0.028991999104619026, "best_triton_pos": 0} 2025-09-07T11:09:26.2249546Z AUTOTUNE convolution(8x32x149x149, 32x32x3x3) 2025-09-07T11:09:26.2250170Z strides: [710432, 1, 4768, 32], [288, 1, 96, 32] 2025-09-07T11:09:26.2250542Z dtypes: torch.float16, torch.float16 2025-09-07T11:09:26.2251370Z triton_convolution2d_9 0.0290 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:26.2252727Z triton_convolution2d_12 0.0301 ms 96.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:26.2254066Z triton_convolution2d_11 
0.0315 ms 92.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:26.2255567Z triton_convolution2d_10 0.0321 ms 90.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:26.2256829Z triton_convolution2d_6 0.0394 ms 73.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:26.2258114Z triton_convolution2d_7 0.0409 ms 70.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:26.2259416Z triton_convolution2d_8 0.0636 ms 45.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:09:26.2260255Z convolution 0.1601 ms 18.1% 2025-09-07T11:09:26.2260847Z SingleProcess AUTOTUNE benchmarking takes 0.1130 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:09:26.3453411Z Autotune Choices Stats: 2025-09-07T11:09:26.3455246Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.0315839983522892, "best_triton_pos": 1, "best_triton_time": 0.035679999738931656, "best_triton_kernel": "triton_convolution2d_16", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8"} 2025-09-07T11:09:26.3469379Z AUTOTUNE convolution(8x32x147x147, 64x32x3x3) 2025-09-07T11:09:26.3469751Z strides: 
[691488, 1, 4704, 32], [288, 1, 96, 32] 2025-09-07T11:09:26.3470078Z dtypes: torch.float16, torch.float16 2025-09-07T11:09:26.3470383Z convolution 0.0316 ms 100.0% 2025-09-07T11:09:26.3471114Z triton_convolution2d_16 0.0357 ms 88.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:26.3472386Z triton_convolution2d_19 0.0376 ms 83.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:26.3473662Z triton_convolution2d_17 0.0390 ms 81.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:26.3475089Z triton_convolution2d_18 0.0415 ms 76.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:26.3476635Z triton_convolution2d_14 0.0538 ms 58.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:26.3478080Z triton_convolution2d_13 0.0806 ms 39.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:26.3479365Z triton_convolution2d_15 0.1184 ms 26.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:09:26.3480402Z SingleProcess AUTOTUNE benchmarking takes 0.1215 seconds and 0.0002 
seconds precompiling for 8 choices 2025-09-07T11:09:26.5757529Z Autotune Choices Stats: 2025-09-07T11:09:26.5758602Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_31", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.010463999584317207, "best_triton_pos": 0} 2025-09-07T11:09:26.5773511Z AUTOTUNE mm(42632x64, 64x80) 2025-09-07T11:09:26.5773813Z strides: [64, 1], [1, 64] 2025-09-07T11:09:26.5774078Z dtypes: torch.float16, torch.float16 2025-09-07T11:09:26.5774733Z triton_mm_31 0.0105 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:26.5775942Z triton_mm_32 0.0105 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:09:26.5776547Z mm 0.0108 ms 96.5% 2025-09-07T11:09:26.5777115Z triton_mm_36 0.0113 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:26.5778047Z triton_mm_27 0.0122 ms 86.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:26.5778913Z triton_mm_38 0.0122 ms 85.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:26.5779801Z triton_mm_28 0.0124 ms 84.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:26.5780684Z triton_mm_33 0.0124 ms 84.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, 
BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:26.5781642Z triton_mm_37 0.0126 ms 83.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:26.5782561Z triton_mm_34 0.0126 ms 83.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:26.5783366Z SingleProcess AUTOTUNE benchmarking takes 0.2291 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:09:26.7219588Z Autotune Choices Stats: 2025-09-07T11:09:26.7221086Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.04495999962091446, "best_triton_pos": 1, "best_triton_time": 0.062272001057863235, "best_triton_kernel": "triton_convolution2d_45", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8"} 2025-09-07T11:09:26.7235619Z AUTOTUNE convolution(8x80x73x73, 192x80x3x3) 2025-09-07T11:09:26.7236243Z strides: [426320, 1, 5840, 80], [720, 1, 240, 80] 2025-09-07T11:09:26.7236612Z dtypes: torch.float16, torch.float16 2025-09-07T11:09:26.7236948Z convolution 0.0450 ms 100.0% 2025-09-07T11:09:26.7237948Z triton_convolution2d_45 0.0623 ms 72.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:26.7239277Z triton_convolution2d_44 0.0745 ms 60.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:26.7240561Z triton_convolution2d_42 
0.0746 ms 60.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:26.7241870Z triton_convolution2d_43 0.0747 ms 60.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:26.7243158Z triton_convolution2d_39 0.0750 ms 59.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:26.7244346Z triton_convolution2d_40 0.0816 ms 55.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:26.7245703Z triton_convolution2d_41 0.1871 ms 24.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:09:26.7246670Z SingleProcess AUTOTUNE benchmarking takes 0.1457 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:09:26.9450020Z Autotune Choices Stats: 2025-09-07T11:09:26.9451111Z {"num_choices": 19, "num_triton_choices": 18, "best_kernel": "triton_mm_57", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.008704000152647495, "best_triton_pos": 0} 2025-09-07T11:09:26.9467056Z AUTOTUNE mm(9800x192, 192x64) 2025-09-07T11:09:26.9467723Z strides: [192, 1], [1, 192] 2025-09-07T11:09:26.9468140Z dtypes: torch.float16, torch.float16 2025-09-07T11:09:26.9469090Z triton_mm_57 0.0087 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, 
BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:26.9470093Z triton_mm_53 0.0090 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:26.9471074Z triton_mm_56 0.0092 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:26.9472095Z triton_mm_63 0.0093 ms 93.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:26.9473131Z triton_mm_62 0.0093 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:26.9474155Z triton_mm_60 0.0094 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:26.9475482Z triton_mm_55 0.0095 ms 91.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:26.9476441Z mm 0.0095 ms 91.3% 2025-09-07T11:09:26.9477256Z triton_mm_59 0.0096 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:26.9478256Z triton_mm_49 0.0097 ms 89.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:26.9479172Z SingleProcess AUTOTUNE benchmarking takes 0.2217 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T11:09:27.1666091Z Autotune Choices 
Stats: 2025-09-07T11:09:27.1667205Z {"num_choices": 19, "num_triton_choices": 18, "best_kernel": "triton_mm_75", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.008671999908983707, "best_triton_pos": 0} 2025-09-07T11:09:27.1682294Z AUTOTUNE mm(9800x192, 192x48) 2025-09-07T11:09:27.1682610Z strides: [192, 1], [1, 192] 2025-09-07T11:09:27.1682932Z dtypes: torch.float16, torch.float16 2025-09-07T11:09:27.1683597Z triton_mm_75 0.0087 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:27.1684234Z mm 0.0088 ms 98.2% 2025-09-07T11:09:27.1684829Z triton_mm_81 0.0089 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:27.1686078Z triton_mm_80 0.0090 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:27.1687034Z triton_mm_71 0.0090 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:27.1687971Z triton_mm_78 0.0091 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:27.1688924Z triton_mm_74 0.0092 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:27.1689867Z triton_mm_73 0.0095 ms 91.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=3, num_warps=4 2025-09-07T11:09:27.1690801Z triton_mm_67 0.0096 ms 90.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:27.1691740Z triton_mm_77 0.0096 ms 90.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:27.1692605Z SingleProcess AUTOTUNE benchmarking takes 0.2210 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T11:09:27.2844309Z Autotune Choices Stats: 2025-09-07T11:09:27.2846043Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.022592000663280487, "best_triton_pos": 1, "best_triton_time": 0.038176000118255615, "best_triton_kernel": "triton_convolution2d_86", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=5, KERNEL_W=5, PADDING_H=2, PADDING_W=2, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T11:09:27.2861191Z AUTOTUNE convolution(8x48x35x35, 64x48x5x5) 2025-09-07T11:09:27.2861594Z strides: [58800, 1, 1680, 48], [1200, 1, 240, 48] 2025-09-07T11:09:27.2861926Z dtypes: torch.float16, torch.float16 2025-09-07T11:09:27.2862248Z convolution 0.0226 ms 100.0% 2025-09-07T11:09:27.2863294Z triton_convolution2d_86 0.0382 ms 59.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=5, KERNEL_W=5, PADDING_H=2, PADDING_W=2, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:27.2864648Z triton_convolution2d_87 0.0405 ms 55.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=5, KERNEL_W=5, PADDING_H=2, PADDING_W=2, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:27.2866005Z triton_convolution2d_85 0.0445 ms 50.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, GROUPS=1, 
KERNEL_H=5, KERNEL_W=5, PADDING_H=2, PADDING_W=2, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:27.2867195Z triton_convolution2d_82 0.0521 ms 43.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=5, KERNEL_W=5, PADDING_H=2, PADDING_W=2, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:27.2868376Z triton_convolution2d_88 0.0524 ms 43.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=5, KERNEL_W=5, PADDING_H=2, PADDING_W=2, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:27.2869584Z triton_convolution2d_83 0.0589 ms 38.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=5, KERNEL_W=5, PADDING_H=2, PADDING_W=2, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:27.2870776Z triton_convolution2d_84 0.0848 ms 26.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=5, KERNEL_W=5, PADDING_H=2, PADDING_W=2, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:09:27.2871732Z SingleProcess AUTOTUNE benchmarking takes 0.1174 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:09:27.3883749Z Autotune Choices Stats: 2025-09-07T11:09:27.3885525Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.015104000456631184, "best_triton_pos": 1, "best_triton_time": 0.016672000288963318, "best_triton_kernel": "triton_convolution2d_111", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T11:09:27.3900857Z AUTOTUNE convolution(8x64x35x35, 96x64x3x3) 2025-09-07T11:09:27.3901232Z strides: [78400, 1, 2240, 64], [576, 1, 192, 64] 2025-09-07T11:09:27.3901668Z dtypes: torch.float16, torch.float16 2025-09-07T11:09:27.3902022Z 
convolution 0.0151 ms 100.0% 2025-09-07T11:09:27.3902887Z triton_convolution2d_111 0.0167 ms 90.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:27.3904283Z triton_convolution2d_112 0.0172 ms 87.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:27.3905802Z triton_convolution2d_110 0.0179 ms 84.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:27.3907121Z triton_convolution2d_107 0.0228 ms 66.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:27.3908425Z triton_convolution2d_113 0.0233 ms 64.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:27.3909696Z triton_convolution2d_108 0.0261 ms 57.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:27.3911526Z triton_convolution2d_109 0.0499 ms 30.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:09:27.3912687Z SingleProcess AUTOTUNE benchmarking takes 0.1019 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:09:27.4933148Z Autotune Choices Stats: 2025-09-07T11:09:27.4934600Z 
{"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.0163199994713068, "best_triton_pos": 1, "best_triton_time": 0.023455999791622162, "best_triton_kernel": "triton_convolution2d_119", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8"} 2025-09-07T11:09:27.4950560Z AUTOTUNE convolution(8x96x35x35, 96x96x3x3) 2025-09-07T11:09:27.4950916Z strides: [117600, 1, 3360, 96], [864, 1, 288, 96] 2025-09-07T11:09:27.4951218Z dtypes: torch.float16, torch.float16 2025-09-07T11:09:27.4951518Z convolution 0.0163 ms 100.0% 2025-09-07T11:09:27.4952305Z triton_convolution2d_119 0.0235 ms 69.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:27.4953615Z triton_convolution2d_118 0.0239 ms 68.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:27.4954901Z triton_convolution2d_117 0.0292 ms 55.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:27.4956342Z triton_convolution2d_120 0.0300 ms 54.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:27.4957636Z triton_convolution2d_114 0.0346 ms 47.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:27.4958931Z 
triton_convolution2d_115 0.0372 ms 43.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:27.4960228Z triton_convolution2d_116 0.0607 ms 26.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:09:27.4961265Z SingleProcess AUTOTUNE benchmarking takes 0.1036 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:09:27.7050973Z Autotune Choices Stats: 2025-09-07T11:09:27.7052090Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_131", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.008832000195980072, "best_triton_pos": 0} 2025-09-07T11:09:27.7068877Z AUTOTUNE mm(9800x192, 192x32) 2025-09-07T11:09:27.7069168Z strides: [192, 1], [1, 192] 2025-09-07T11:09:27.7069474Z dtypes: torch.float16, torch.float16 2025-09-07T11:09:27.7070118Z triton_mm_131 0.0088 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:27.7071017Z triton_mm_128 0.0089 ms 98.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:27.7072216Z triton_mm_132 0.0090 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:09:27.7073411Z triton_mm_130 0.0092 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, 
num_warps=8 2025-09-07T11:09:27.7074090Z mm 0.0092 ms 95.8% 2025-09-07T11:09:27.7074732Z triton_mm_123 0.0092 ms 95.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:27.7076035Z triton_mm_124 0.0092 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:27.7077034Z triton_mm_137 0.0093 ms 95.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:27.7078080Z triton_mm_136 0.0093 ms 94.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:27.7079096Z triton_mm_129 0.0094 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:27.7079996Z SingleProcess AUTOTUNE benchmarking takes 0.2106 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T11:09:27.9303762Z Autotune Choices Stats: 2025-09-07T11:09:27.9304859Z {"num_choices": 19, "num_triton_choices": 18, "best_kernel": "triton_mm_149", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.009535999968647957, "best_triton_pos": 0} 2025-09-07T11:09:27.9322823Z AUTOTUNE mm(9800x256, 256x64) 2025-09-07T11:09:27.9323112Z strides: [256, 1], [1, 256] 2025-09-07T11:09:27.9323426Z dtypes: torch.float16, torch.float16 2025-09-07T11:09:27.9324111Z triton_mm_149 0.0095 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=3, num_warps=4 2025-09-07T11:09:27.9324763Z mm 0.0099 ms 96.8% 2025-09-07T11:09:27.9325485Z triton_mm_155 0.0099 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:27.9326417Z triton_mm_145 0.0099 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:27.9327368Z triton_mm_139 0.0101 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:09:27.9328325Z triton_mm_148 0.0101 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:27.9329270Z triton_mm_147 0.0103 ms 92.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:27.9330219Z triton_mm_154 0.0105 ms 91.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:27.9331165Z triton_mm_152 0.0105 ms 90.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:27.9332113Z triton_mm_141 0.0107 ms 89.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:27.9333195Z SingleProcess AUTOTUNE benchmarking takes 0.2248 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T11:09:28.1523730Z Autotune Choices Stats: 2025-09-07T11:09:28.1524862Z {"num_choices": 19, "num_triton_choices": 
18, "best_kernel": "triton_mm_167", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.009631999768316746, "best_triton_pos": 0} 2025-09-07T11:09:28.1542125Z AUTOTUNE mm(9800x256, 256x48) 2025-09-07T11:09:28.1542403Z strides: [256, 1], [1, 256] 2025-09-07T11:09:28.1542772Z dtypes: torch.float16, torch.float16 2025-09-07T11:09:28.1543586Z triton_mm_167 0.0096 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:28.1544708Z triton_mm_173 0.0098 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:28.1545646Z mm 0.0099 ms 97.4% 2025-09-07T11:09:28.1546327Z triton_mm_163 0.0099 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:28.1547370Z triton_mm_157 0.0101 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:09:28.1548416Z triton_mm_172 0.0103 ms 93.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:28.1549470Z triton_mm_166 0.0104 ms 92.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:28.1550523Z triton_mm_159 0.0105 ms 91.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:28.1551560Z triton_mm_170 
0.0106 ms 90.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:28.1552750Z triton_mm_165 0.0107 ms 89.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:28.1553630Z SingleProcess AUTOTUNE benchmarking takes 0.2213 seconds and 0.0003 seconds precompiling for 19 choices 2025-09-07T11:09:28.3809570Z Autotune Choices Stats: 2025-09-07T11:09:28.3810699Z {"num_choices": 19, "num_triton_choices": 18, "best_kernel": "triton_mm_248", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8", "best_time": 0.01033599954098463, "best_triton_pos": 0} 2025-09-07T11:09:28.3828025Z AUTOTUNE mm(9800x288, 288x64) 2025-09-07T11:09:28.3828366Z strides: [288, 1], [1, 288] 2025-09-07T11:09:28.3828664Z dtypes: torch.float16, torch.float16 2025-09-07T11:09:28.3829306Z triton_mm_248 0.0103 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:28.3830295Z triton_mm_238 0.0104 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:28.3831253Z triton_mm_241 0.0104 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:28.3832503Z triton_mm_242 0.0105 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:28.3833171Z mm 0.0105 ms 98.5% 2025-09-07T11:09:28.3833975Z 
triton_mm_245 0.0108 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:28.3835305Z triton_mm_247 0.0109 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:28.3836307Z triton_mm_234 0.0111 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:28.3837330Z triton_mm_240 0.0111 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:28.3838352Z triton_mm_244 0.0115 ms 90.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:28.3839253Z SingleProcess AUTOTUNE benchmarking takes 0.2221 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T11:09:28.6022029Z Autotune Choices Stats: 2025-09-07T11:09:28.6023405Z {"num_choices": 19, "num_triton_choices": 18, "best_kernel": "mm", "best_time": 0.010048000141978264, "best_triton_pos": 1, "best_triton_time": 0.01017600018531084, "best_triton_kernel": "triton_mm_266", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T11:09:28.6041291Z AUTOTUNE mm(9800x288, 288x48) 2025-09-07T11:09:28.6041628Z strides: [288, 1], [1, 288] 2025-09-07T11:09:28.6041956Z dtypes: torch.float16, torch.float16 2025-09-07T11:09:28.6042256Z mm 0.0100 ms 100.0% 2025-09-07T11:09:28.6042928Z triton_mm_266 0.0102 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:28.6043977Z triton_mm_263 0.0104 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:28.6045084Z triton_mm_260 0.0105 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:28.6046043Z triton_mm_259 0.0106 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:28.6046994Z triton_mm_256 0.0108 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:28.6047919Z triton_mm_252 0.0109 ms 91.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:28.6048874Z triton_mm_265 0.0110 ms 91.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:28.6049821Z triton_mm_258 0.0111 ms 90.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:28.6050761Z triton_mm_262 0.0114 ms 88.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:28.6051606Z SingleProcess AUTOTUNE benchmarking takes 0.2208 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T11:09:28.7673365Z Autotune Choices Stats: 2025-09-07T11:09:28.7675254Z {"num_choices": 8, "num_triton_choices": 7, 
"best_kernel": "convolution", "best_time": 0.023520000278949738, "best_triton_pos": 1, "best_triton_time": 0.06412799656391144, "best_triton_kernel": "triton_convolution2d_328", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T11:09:28.7693287Z AUTOTUNE convolution(8x288x35x35, 384x288x3x3) 2025-09-07T11:09:28.7693749Z strides: [352800, 1, 10080, 288], [2592, 1, 864, 288] 2025-09-07T11:09:28.7694163Z dtypes: torch.float16, torch.float16 2025-09-07T11:09:28.7694541Z convolution 0.0235 ms 100.0% 2025-09-07T11:09:28.7695540Z triton_convolution2d_328 0.0641 ms 36.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:28.7696834Z triton_convolution2d_327 0.0772 ms 30.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:28.7698127Z triton_convolution2d_330 0.0805 ms 29.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:28.7699421Z triton_convolution2d_329 0.0816 ms 28.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:28.7700710Z triton_convolution2d_325 0.1028 ms 22.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:28.7702111Z triton_convolution2d_324 0.1152 ms 20.4% 
ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:28.7703410Z triton_convolution2d_326 0.2213 ms 10.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:09:28.7704373Z SingleProcess AUTOTUNE benchmarking takes 0.1578 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:09:28.8771983Z Autotune Choices Stats: 2025-09-07T11:09:28.8773448Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.013535999692976475, "best_triton_pos": 1, "best_triton_time": 0.02287999913096428, "best_triton_kernel": "triton_convolution2d_361", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8"} 2025-09-07T11:09:28.8791694Z AUTOTUNE convolution(8x96x35x35, 96x96x3x3) 2025-09-07T11:09:28.8792038Z strides: [117600, 1, 3360, 96], [864, 1, 288, 96] 2025-09-07T11:09:28.8792395Z dtypes: torch.float16, torch.float16 2025-09-07T11:09:28.8792733Z convolution 0.0135 ms 100.0% 2025-09-07T11:09:28.8793533Z triton_convolution2d_361 0.0229 ms 59.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:28.8794797Z triton_convolution2d_360 0.0229 ms 59.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:28.8796235Z triton_convolution2d_359 0.0284 ms 47.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, 
KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:28.8797920Z triton_convolution2d_362 0.0299 ms 45.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:28.8799229Z triton_convolution2d_356 0.0338 ms 40.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:28.8800512Z triton_convolution2d_357 0.0387 ms 35.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:28.8801807Z triton_convolution2d_358 0.0796 ms 17.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:09:28.8802853Z SingleProcess AUTOTUNE benchmarking takes 0.1062 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:09:29.1132679Z Autotune Choices Stats: 2025-09-07T11:09:29.1133765Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_371", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.009312000125646591, "best_triton_pos": 0} 2025-09-07T11:09:29.1152667Z AUTOTUNE mm(2312x768, 768x192) 2025-09-07T11:09:29.1152991Z strides: [768, 1], [1, 768] 2025-09-07T11:09:29.1153278Z dtypes: torch.float16, torch.float16 2025-09-07T11:09:29.1154050Z triton_mm_371 0.0093 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=5, num_warps=4 2025-09-07T11:09:29.1154761Z mm 0.0094 ms 99.0% 2025-09-07T11:09:29.1155537Z triton_mm_375 0.0104 ms 89.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:09:29.1156575Z triton_mm_370 0.0112 ms 82.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:29.1157600Z triton_mm_367 0.0114 ms 81.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:29.1158625Z triton_mm_381 0.0117 ms 79.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:29.1159623Z triton_mm_374 0.0118 ms 78.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:29.1160646Z triton_mm_366 0.0122 ms 76.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:29.1161665Z triton_mm_364 0.0126 ms 74.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:09:29.1162688Z triton_mm_373 0.0128 ms 72.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:29.1163591Z SingleProcess AUTOTUNE benchmarking takes 0.2347 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:09:29.3506629Z Autotune Choices Stats: 2025-09-07T11:09:29.3507977Z {"num_choices": 20, 
"num_triton_choices": 19, "best_kernel": "triton_mm_390", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.009184000082314014, "best_triton_pos": 0} 2025-09-07T11:09:29.3527175Z AUTOTUNE mm(2312x768, 768x128) 2025-09-07T11:09:29.3527474Z strides: [768, 1], [1, 768] 2025-09-07T11:09:29.3527778Z dtypes: torch.float16, torch.float16 2025-09-07T11:09:29.3528456Z triton_mm_390 0.0092 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:29.3529094Z mm 0.0092 ms 99.7% 2025-09-07T11:09:29.3529654Z triton_mm_394 0.0104 ms 88.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:09:29.3530626Z triton_mm_386 0.0112 ms 81.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:29.3531600Z triton_mm_389 0.0112 ms 81.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:29.3532553Z triton_mm_400 0.0116 ms 79.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:29.3533519Z triton_mm_393 0.0117 ms 78.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:29.3534473Z triton_mm_385 0.0121 ms 75.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 
2025-09-07T11:09:29.3535712Z triton_mm_383 0.0122 ms 75.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:09:29.3536714Z triton_mm_399 0.0128 ms 71.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:29.3537571Z SingleProcess AUTOTUNE benchmarking takes 0.2369 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:09:29.4569929Z Autotune Choices Stats: 2025-09-07T11:09:29.4571478Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.012640000320971012, "best_triton_pos": 1, "best_triton_time": 0.024224000051617622, "best_triton_kernel": "triton_convolution2d_405", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T11:09:29.4590728Z AUTOTUNE convolution(8x128x17x17, 128x128x1x7) 2025-09-07T11:09:29.4591090Z strides: [36992, 1, 2176, 128], [896, 1, 896, 128] 2025-09-07T11:09:29.4591454Z dtypes: torch.float16, torch.float16 2025-09-07T11:09:29.4591773Z convolution 0.0126 ms 100.0% 2025-09-07T11:09:29.4592499Z triton_convolution2d_405 0.0242 ms 52.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:29.4593790Z triton_convolution2d_406 0.0263 ms 48.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:29.4595366Z triton_convolution2d_404 0.0298 ms 42.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, 
KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:29.4596888Z triton_convolution2d_407 0.0301 ms 42.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:29.4598319Z triton_convolution2d_401 0.0340 ms 37.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:29.4599613Z triton_convolution2d_402 0.0363 ms 34.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:29.4600906Z triton_convolution2d_403 0.0724 ms 17.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:09:29.4601939Z SingleProcess AUTOTUNE benchmarking takes 0.1059 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:09:29.5639733Z Autotune Choices Stats: 2025-09-07T11:09:29.5641167Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.012608000077307224, "best_triton_pos": 1, "best_triton_time": 0.023840000852942467, "best_triton_kernel": "triton_convolution2d_412", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T11:09:29.5660418Z AUTOTUNE convolution(8x128x17x17, 192x128x7x1) 2025-09-07T11:09:29.5660760Z strides: [36992, 1, 2176, 128], [896, 1, 128, 128] 2025-09-07T11:09:29.5661092Z dtypes: torch.float16, torch.float16 2025-09-07T11:09:29.5661459Z 
convolution 0.0126 ms 100.0% 2025-09-07T11:09:29.5662155Z triton_convolution2d_412 0.0238 ms 52.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:29.5663437Z triton_convolution2d_411 0.0281 ms 44.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:29.5664760Z triton_convolution2d_413 0.0289 ms 43.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:29.5666360Z triton_convolution2d_414 0.0295 ms 42.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:29.5667664Z triton_convolution2d_409 0.0349 ms 36.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:29.5668973Z triton_convolution2d_408 0.0362 ms 34.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:29.5670301Z triton_convolution2d_410 0.0708 ms 17.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:09:29.5671359Z SingleProcess AUTOTUNE benchmarking takes 0.1054 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:09:29.6712827Z Autotune Choices Stats: 2025-09-07T11:09:29.6714349Z 
{"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.012543999589979649, "best_triton_pos": 1, "best_triton_time": 0.024224000051617622, "best_triton_kernel": "triton_convolution2d_438", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T11:09:29.6733939Z AUTOTUNE convolution(8x128x17x17, 128x128x7x1) 2025-09-07T11:09:29.6734380Z strides: [36992, 1, 2176, 128], [896, 1, 128, 128] 2025-09-07T11:09:29.6734779Z dtypes: torch.float16, torch.float16 2025-09-07T11:09:29.6735352Z convolution 0.0125 ms 100.0% 2025-09-07T11:09:29.6736174Z triton_convolution2d_438 0.0242 ms 51.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:29.6737524Z triton_convolution2d_439 0.0261 ms 48.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:29.6738858Z triton_convolution2d_437 0.0285 ms 44.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:29.6740179Z triton_convolution2d_440 0.0296 ms 42.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:29.6741575Z triton_convolution2d_434 0.0328 ms 38.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:29.6742967Z 
triton_convolution2d_435 0.0354 ms 35.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:29.6744116Z triton_convolution2d_436 0.0719 ms 17.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:09:29.6745143Z SingleProcess AUTOTUNE benchmarking takes 0.1052 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:09:29.7819872Z Autotune Choices Stats: 2025-09-07T11:09:29.7821323Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.013407999649643898, "best_triton_pos": 1, "best_triton_time": 0.024032000452280045, "best_triton_kernel": "triton_convolution2d_459", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T11:09:29.7844840Z AUTOTUNE convolution(8x128x17x17, 192x128x1x7) 2025-09-07T11:09:29.7845502Z strides: [36992, 1, 2176, 128], [896, 1, 896, 128] 2025-09-07T11:09:29.7845917Z dtypes: torch.float16, torch.float16 2025-09-07T11:09:29.7846295Z convolution 0.0134 ms 100.0% 2025-09-07T11:09:29.7847115Z triton_convolution2d_459 0.0240 ms 55.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:29.7848446Z triton_convolution2d_458 0.0292 ms 46.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:29.7849757Z triton_convolution2d_460 0.0295 ms 45.4% ALLOW_TF32=False, BLOCK_K=32, 
BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:29.7851071Z triton_convolution2d_461 0.0295 ms 45.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:29.7852819Z triton_convolution2d_456 0.0360 ms 37.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:29.7854188Z triton_convolution2d_455 0.0373 ms 36.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:29.7855622Z triton_convolution2d_457 0.0704 ms 19.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:09:29.7856656Z SingleProcess AUTOTUNE benchmarking takes 0.1061 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:09:30.0211599Z Autotune Choices Stats: 2025-09-07T11:09:30.0212786Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.00940799992531538, "best_triton_pos": 1, "best_triton_time": 0.009440000168979168, "best_triton_kernel": "triton_mm_508", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T11:09:30.0232861Z AUTOTUNE mm(2312x768, 768x160) 2025-09-07T11:09:30.0233176Z strides: [768, 1], [1, 768] 2025-09-07T11:09:30.0233507Z dtypes: torch.float16, torch.float16 2025-09-07T11:09:30.0233813Z mm 0.0094 ms 100.0% 
2025-09-07T11:09:30.0234459Z triton_mm_508 0.0094 ms 99.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:30.0235721Z triton_mm_512 0.0105 ms 89.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:09:30.0236756Z triton_mm_507 0.0114 ms 82.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:30.0237779Z triton_mm_504 0.0115 ms 82.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:30.0238813Z triton_mm_518 0.0117 ms 80.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:30.0239834Z triton_mm_511 0.0120 ms 78.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:30.0240817Z triton_mm_503 0.0123 ms 76.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:30.0241839Z triton_mm_501 0.0124 ms 76.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:09:30.0242864Z triton_mm_517 0.0128 ms 73.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:30.0243770Z SingleProcess AUTOTUNE benchmarking takes 0.2362 seconds and 0.0002 seconds precompiling 
for 20 choices 2025-09-07T11:09:30.1340338Z Autotune Choices Stats: 2025-09-07T11:09:30.1341883Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.014303999952971935, "best_triton_pos": 1, "best_triton_time": 0.028255999088287354, "best_triton_kernel": "triton_convolution2d_523", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T11:09:30.1361826Z AUTOTUNE convolution(8x160x17x17, 160x160x1x7) 2025-09-07T11:09:30.1362391Z strides: [46240, 1, 2720, 160], [1120, 1, 1120, 160] 2025-09-07T11:09:30.1362776Z dtypes: torch.float16, torch.float16 2025-09-07T11:09:30.1363158Z convolution 0.0143 ms 100.0% 2025-09-07T11:09:30.1364146Z triton_convolution2d_523 0.0283 ms 50.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:30.1365795Z triton_convolution2d_522 0.0341 ms 41.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:30.1367137Z triton_convolution2d_525 0.0345 ms 41.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:30.1368468Z triton_convolution2d_524 0.0350 ms 40.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:30.1369783Z triton_convolution2d_520 0.0443 ms 32.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, 
STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:30.1371132Z triton_convolution2d_519 0.0456 ms 31.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:30.1372478Z triton_convolution2d_521 0.0902 ms 15.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:09:30.1373570Z SingleProcess AUTOTUNE benchmarking takes 0.1114 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:09:30.2463514Z Autotune Choices Stats: 2025-09-07T11:09:30.2465262Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.01568000018596649, "best_triton_pos": 1, "best_triton_time": 0.02800000086426735, "best_triton_kernel": "triton_convolution2d_530", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T11:09:30.2485198Z AUTOTUNE convolution(8x160x17x17, 192x160x7x1) 2025-09-07T11:09:30.2485668Z strides: [46240, 1, 2720, 160], [1120, 1, 160, 160] 2025-09-07T11:09:30.2486078Z dtypes: torch.float16, torch.float16 2025-09-07T11:09:30.2486432Z convolution 0.0157 ms 100.0% 2025-09-07T11:09:30.2487274Z triton_convolution2d_530 0.0280 ms 56.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:30.2488592Z triton_convolution2d_532 0.0337 ms 46.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 
2025-09-07T11:09:30.2489891Z triton_convolution2d_531 0.0352 ms 44.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:30.2491215Z triton_convolution2d_529 0.0352 ms 44.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:30.2493035Z triton_convolution2d_527 0.0434 ms 36.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:30.2494407Z triton_convolution2d_526 0.0471 ms 33.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:30.2495836Z triton_convolution2d_528 0.0853 ms 18.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:09:30.2496841Z SingleProcess AUTOTUNE benchmarking takes 0.1109 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:09:30.3602279Z Autotune Choices Stats: 2025-09-07T11:09:30.3603907Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.015231999568641186, "best_triton_pos": 1, "best_triton_time": 0.028031999245285988, "best_triton_kernel": "triton_convolution2d_556", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T11:09:30.3623873Z AUTOTUNE convolution(8x160x17x17, 160x160x7x1) 2025-09-07T11:09:30.3624322Z 
strides: [46240, 1, 2720, 160], [1120, 1, 160, 160] 2025-09-07T11:09:30.3624736Z dtypes: torch.float16, torch.float16 2025-09-07T11:09:30.3625207Z convolution 0.0152 ms 100.0% 2025-09-07T11:09:30.3626037Z triton_convolution2d_556 0.0280 ms 54.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:30.3627350Z triton_convolution2d_555 0.0340 ms 44.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:30.3628653Z triton_convolution2d_558 0.0340 ms 44.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:30.3629939Z triton_convolution2d_557 0.0341 ms 44.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:30.3631241Z triton_convolution2d_553 0.0426 ms 35.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:30.3632532Z triton_convolution2d_552 0.0449 ms 33.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:30.3634011Z triton_convolution2d_554 0.0883 ms 17.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:09:30.3635164Z SingleProcess AUTOTUNE benchmarking takes 0.1108 
seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:09:30.4770882Z Autotune Choices Stats: 2025-09-07T11:09:30.4772304Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.0144640002399683, "best_triton_pos": 1, "best_triton_time": 0.028704000636935234, "best_triton_kernel": "triton_convolution2d_577", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T11:09:30.4792726Z AUTOTUNE convolution(8x160x17x17, 192x160x1x7) 2025-09-07T11:09:30.4793133Z strides: [46240, 1, 2720, 160], [1120, 1, 1120, 160] 2025-09-07T11:09:30.4793505Z dtypes: torch.float16, torch.float16 2025-09-07T11:09:30.4793972Z convolution 0.0145 ms 100.0% 2025-09-07T11:09:30.4794766Z triton_convolution2d_577 0.0287 ms 50.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:30.4796331Z triton_convolution2d_576 0.0352 ms 41.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:30.4797622Z triton_convolution2d_578 0.0355 ms 40.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:30.4798901Z triton_convolution2d_579 0.0372 ms 38.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:30.4800175Z triton_convolution2d_574 0.0451 ms 32.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, 
KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:30.4801443Z triton_convolution2d_573 0.0476 ms 30.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:30.4802691Z triton_convolution2d_575 0.0886 ms 16.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:09:30.4803733Z SingleProcess AUTOTUNE benchmarking takes 0.1120 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:09:30.6139276Z Autotune Choices Stats: 2025-09-07T11:09:30.6140751Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.01500799972563982, "best_triton_pos": 1, "best_triton_time": 0.03433600068092346, "best_triton_kernel": "triton_convolution2d_759", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T11:09:30.6161593Z AUTOTUNE convolution(8x192x17x17, 192x192x1x7) 2025-09-07T11:09:30.6161990Z strides: [55488, 1, 3264, 192], [1344, 1, 1344, 192] 2025-09-07T11:09:30.6162370Z dtypes: torch.float16, torch.float16 2025-09-07T11:09:30.6162706Z convolution 0.0150 ms 100.0% 2025-09-07T11:09:30.6163656Z triton_convolution2d_759 0.0343 ms 43.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:30.6165386Z triton_convolution2d_758 0.0412 ms 36.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, 
num_stages=2, num_warps=8 2025-09-07T11:09:30.6166725Z triton_convolution2d_760 0.0419 ms 35.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:30.6168038Z triton_convolution2d_761 0.0428 ms 35.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:30.6169340Z triton_convolution2d_756 0.0530 ms 28.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:30.6171041Z triton_convolution2d_755 0.0544 ms 27.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:30.6172351Z triton_convolution2d_757 0.1068 ms 14.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:09:30.6173440Z SingleProcess AUTOTUNE benchmarking takes 0.1181 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:09:30.7325909Z Autotune Choices Stats: 2025-09-07T11:09:30.7327265Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.014879999682307243, "best_triton_pos": 1, "best_triton_time": 0.03296000137925148, "best_triton_kernel": "triton_convolution2d_766", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T11:09:30.7347811Z AUTOTUNE convolution(8x192x17x17, 192x192x7x1) 
2025-09-07T11:09:30.7348243Z strides: [55488, 1, 3264, 192], [1344, 1, 192, 192] 2025-09-07T11:09:30.7348640Z dtypes: torch.float16, torch.float16 2025-09-07T11:09:30.7348970Z convolution 0.0149 ms 100.0% 2025-09-07T11:09:30.7349820Z triton_convolution2d_766 0.0330 ms 45.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:30.7351145Z triton_convolution2d_768 0.0399 ms 37.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:30.7352513Z triton_convolution2d_765 0.0407 ms 36.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:30.7354002Z triton_convolution2d_767 0.0418 ms 35.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:30.7355576Z triton_convolution2d_763 0.0528 ms 28.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:30.7356864Z triton_convolution2d_762 0.0535 ms 27.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:30.7358171Z triton_convolution2d_764 0.1044 ms 14.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:09:30.7359209Z SingleProcess 
AUTOTUNE benchmarking takes 0.1171 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:09:30.8733869Z Autotune Choices Stats: 2025-09-07T11:09:30.8735722Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.012736000120639801, "best_triton_pos": 1, "best_triton_time": 0.04416000097990036, "best_triton_kernel": "triton_convolution2d_858", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T11:09:30.8756460Z AUTOTUNE convolution(8x192x17x17, 320x192x3x3) 2025-09-07T11:09:30.8757102Z strides: [55488, 1, 3264, 192], [1728, 1, 576, 192] 2025-09-07T11:09:30.8757468Z dtypes: torch.float16, torch.float16 2025-09-07T11:09:30.8757778Z convolution 0.0127 ms 100.0% 2025-09-07T11:09:30.8758733Z triton_convolution2d_858 0.0442 ms 28.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:30.8760061Z triton_convolution2d_857 0.0535 ms 23.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:30.8761387Z triton_convolution2d_860 0.0537 ms 23.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:30.8762680Z triton_convolution2d_859 0.0549 ms 23.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:30.8763973Z triton_convolution2d_855 0.0758 ms 16.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, 
BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:30.8765348Z triton_convolution2d_854 0.0799 ms 15.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:30.8766545Z triton_convolution2d_856 0.1224 ms 10.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:09:30.8777792Z SingleProcess AUTOTUNE benchmarking takes 0.1298 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:09:31.0083951Z Autotune Choices Stats: 2025-09-07T11:09:31.0085790Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.01244799979031086, "best_triton_pos": 1, "best_triton_time": 0.04291199892759323, "best_triton_kernel": "triton_convolution2d_898", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T11:09:31.0106483Z AUTOTUNE convolution(8x192x17x17, 192x192x3x3) 2025-09-07T11:09:31.0106915Z strides: [55488, 1, 3264, 192], [1728, 1, 576, 192] 2025-09-07T11:09:31.0107313Z dtypes: torch.float16, torch.float16 2025-09-07T11:09:31.0107691Z convolution 0.0124 ms 100.0% 2025-09-07T11:09:31.0108519Z triton_convolution2d_898 0.0429 ms 29.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:31.0109858Z triton_convolution2d_897 0.0534 ms 23.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, 
STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:31.0111201Z triton_convolution2d_899 0.0541 ms 23.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:31.0112517Z triton_convolution2d_900 0.0541 ms 23.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:31.0113971Z triton_convolution2d_894 0.0748 ms 16.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:31.0115810Z triton_convolution2d_895 0.0766 ms 16.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:31.0117116Z triton_convolution2d_896 0.1207 ms 10.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:09:31.0118158Z SingleProcess AUTOTUNE benchmarking takes 0.1290 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:09:31.2494516Z Autotune Choices Stats: 2025-09-07T11:09:31.2495801Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_905", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.009151999838650227, "best_triton_pos": 0} 2025-09-07T11:09:31.2517832Z AUTOTUNE mm(512x1280, 1280x320) 2025-09-07T11:09:31.2518204Z strides: [1280, 1], [1, 1280] 2025-09-07T11:09:31.2518503Z dtypes: 
torch.float16, torch.float16 2025-09-07T11:09:31.2519227Z triton_mm_905 0.0092 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:31.2519931Z mm 0.0093 ms 97.9% 2025-09-07T11:09:31.2520564Z triton_mm_909 0.0098 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:31.2521596Z triton_mm_913 0.0112 ms 81.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:09:31.2522615Z triton_mm_904 0.0128 ms 71.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:31.2523651Z triton_mm_919 0.0133 ms 68.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:31.2524618Z triton_mm_908 0.0134 ms 68.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:31.2525698Z triton_mm_903 0.0134 ms 68.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:31.2526642Z triton_mm_912 0.0139 ms 65.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:31.2527590Z triton_mm_902 0.0142 ms 64.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:09:31.2528438Z 
SingleProcess AUTOTUNE benchmarking takes 0.2396 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:09:31.4871482Z Autotune Choices Stats: 2025-09-07T11:09:31.4872547Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_924", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.009247999638319016, "best_triton_pos": 0} 2025-09-07T11:09:31.4894445Z AUTOTUNE mm(512x1280, 1280x384) 2025-09-07T11:09:31.4894873Z strides: [1280, 1], [1, 1280] 2025-09-07T11:09:31.4895578Z dtypes: torch.float16, torch.float16 2025-09-07T11:09:31.4896355Z triton_mm_924 0.0092 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:31.4897347Z mm 0.0096 ms 96.7% 2025-09-07T11:09:31.4898196Z triton_mm_928 0.0098 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:31.4899262Z triton_mm_932 0.0111 ms 83.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:09:31.4900307Z triton_mm_923 0.0130 ms 71.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:31.4901418Z triton_mm_927 0.0133 ms 69.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:31.4902476Z triton_mm_938 0.0134 ms 68.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 
2025-09-07T11:09:31.4903561Z triton_mm_922 0.0135 ms 68.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:31.4904490Z triton_mm_931 0.0142 ms 65.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:31.4905526Z triton_mm_921 0.0143 ms 64.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:09:31.4906323Z SingleProcess AUTOTUNE benchmarking takes 0.2371 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:09:31.5997997Z Autotune Choices Stats: 2025-09-07T11:09:31.5999446Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.01228800043463707, "best_triton_pos": 1, "best_triton_time": 0.02940800040960312, "best_triton_kernel": "triton_convolution2d_943", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=3, PADDING_H=0, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T11:09:31.6021211Z AUTOTUNE convolution(8x384x8x8, 384x384x1x3) 2025-09-07T11:09:31.6021624Z strides: [24576, 1, 3072, 384], [1152, 1, 1152, 384] 2025-09-07T11:09:31.6021935Z dtypes: torch.float16, torch.float16 2025-09-07T11:09:31.6022241Z convolution 0.0123 ms 100.0% 2025-09-07T11:09:31.6022947Z triton_convolution2d_943 0.0294 ms 41.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=3, PADDING_H=0, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:31.6024191Z triton_convolution2d_942 0.0363 ms 33.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=3, PADDING_H=0, PADDING_W=1, STRIDE_H=1, 
STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:31.6025931Z triton_convolution2d_945 0.0382 ms 32.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=3, PADDING_H=0, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:31.6027296Z triton_convolution2d_944 0.0388 ms 31.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=3, PADDING_H=0, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:31.6029073Z triton_convolution2d_940 0.0515 ms 23.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=3, PADDING_H=0, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:31.6030385Z triton_convolution2d_939 0.0558 ms 22.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=3, PADDING_H=0, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:31.6032157Z triton_convolution2d_941 0.0712 ms 17.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=512, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=3, PADDING_H=0, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:09:31.6033235Z SingleProcess AUTOTUNE benchmarking takes 0.1121 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:09:31.7121700Z Autotune Choices Stats: 2025-09-07T11:09:31.7123121Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.012032000347971916, "best_triton_pos": 1, "best_triton_time": 0.028991999104619026, "best_triton_kernel": "triton_convolution2d_950", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=1, PADDING_H=1, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T11:09:31.7145376Z AUTOTUNE 
convolution(8x384x8x8, 384x384x3x1) 2025-09-07T11:09:31.7145836Z strides: [24576, 1, 3072, 384], [1152, 1, 384, 384] 2025-09-07T11:09:31.7146252Z dtypes: torch.float16, torch.float16 2025-09-07T11:09:31.7146626Z convolution 0.0120 ms 100.0% 2025-09-07T11:09:31.7147467Z triton_convolution2d_950 0.0290 ms 41.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=1, PADDING_H=1, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:31.7148748Z triton_convolution2d_949 0.0368 ms 32.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=1, PADDING_H=1, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:31.7150071Z triton_convolution2d_952 0.0368 ms 32.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=1, PADDING_H=1, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:31.7151401Z triton_convolution2d_951 0.0385 ms 31.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=1, PADDING_H=1, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:31.7152743Z triton_convolution2d_947 0.0518 ms 23.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=1, PADDING_H=1, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:31.7154199Z triton_convolution2d_946 0.0525 ms 22.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=1, PADDING_H=1, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:31.7155663Z triton_convolution2d_948 0.0679 ms 17.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=512, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=1, PADDING_H=1, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 
2025-09-07T11:09:31.7156710Z SingleProcess AUTOTUNE benchmarking takes 0.1106 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:09:31.9484118Z Autotune Choices Stats: 2025-09-07T11:09:31.9485605Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_957", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.009279999881982803, "best_triton_pos": 0} 2025-09-07T11:09:31.9507962Z AUTOTUNE mm(512x1280, 1280x448) 2025-09-07T11:09:31.9508324Z strides: [1280, 1], [1, 1280] 2025-09-07T11:09:31.9508649Z dtypes: torch.float16, torch.float16 2025-09-07T11:09:31.9509362Z triton_mm_957 0.0093 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:31.9510055Z mm 0.0095 ms 97.3% 2025-09-07T11:09:31.9510657Z triton_mm_961 0.0098 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:31.9512110Z triton_mm_965 0.0113 ms 82.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:09:31.9513139Z triton_mm_956 0.0131 ms 70.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:31.9514176Z triton_mm_960 0.0134 ms 69.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:31.9515369Z triton_mm_971 0.0135 ms 68.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, 
num_warps=8 2025-09-07T11:09:31.9516398Z triton_mm_955 0.0135 ms 68.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:31.9517431Z triton_mm_964 0.0140 ms 66.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:31.9518431Z triton_mm_954 0.0143 ms 64.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:09:31.9519350Z SingleProcess AUTOTUNE benchmarking takes 0.2346 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:09:32.1369427Z Autotune Choices Stats: 2025-09-07T11:09:32.1370895Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.016543999314308167, "best_triton_pos": 1, "best_triton_time": 0.09679999947547913, "best_triton_kernel": "triton_convolution2d_976", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T11:09:32.1393454Z AUTOTUNE convolution(8x448x8x8, 384x448x3x3) 2025-09-07T11:09:32.1393863Z strides: [28672, 1, 3584, 448], [4032, 1, 1344, 448] 2025-09-07T11:09:32.1394207Z dtypes: torch.float16, torch.float16 2025-09-07T11:09:32.1394554Z convolution 0.0165 ms 100.0% 2025-09-07T11:09:32.1395498Z triton_convolution2d_976 0.0968 ms 17.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:32.1396799Z triton_convolution2d_975 0.1180 ms 14.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, 
STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:32.1398093Z triton_convolution2d_977 0.1219 ms 13.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:32.1399400Z triton_convolution2d_978 0.1253 ms 13.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:09:32.1400655Z triton_convolution2d_973 0.1720 ms 9.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:32.1401936Z triton_convolution2d_972 0.1819 ms 9.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:09:32.1403216Z triton_convolution2d_974 0.2236 ms 7.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=512, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:09:32.1404610Z SingleProcess AUTOTUNE benchmarking takes 0.1880 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:09:32.3787198Z Autotune Choices Stats: 2025-09-07T11:09:32.3788560Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.009151999838650227, "best_triton_pos": 1, "best_triton_time": 0.009184000082314014, "best_triton_kernel": "triton_mm_997", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T11:09:32.3811589Z AUTOTUNE mm(512x1280, 1280x192) 2025-09-07T11:09:32.3811873Z 
strides: [1280, 1], [1, 1280] 2025-09-07T11:09:32.3812180Z dtypes: torch.float16, torch.float16 2025-09-07T11:09:32.3812492Z mm 0.0092 ms 100.0% 2025-09-07T11:09:32.3813131Z triton_mm_997 0.0092 ms 99.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:32.3814224Z triton_mm_1001 0.0096 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:32.3815667Z triton_mm_1005 0.0109 ms 83.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:09:32.3816711Z triton_mm_996 0.0126 ms 72.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:32.3817759Z triton_mm_1011 0.0132 ms 69.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:32.3818849Z triton_mm_1000 0.0132 ms 69.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:32.3819896Z triton_mm_995 0.0133 ms 68.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:32.3820941Z triton_mm_1004 0.0137 ms 67.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:32.3822067Z triton_mm_994 0.0139 ms 65.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:09:32.3823000Z SingleProcess AUTOTUNE benchmarking takes 0.2362 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:09:32.6223757Z Autotune Choices Stats: 2025-09-07T11:09:32.6225380Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.010208000428974628, "best_triton_pos": 1, "best_triton_time": 0.010975999757647514, "best_triton_kernel": "triton_mm_1016", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T11:09:32.6248506Z AUTOTUNE mm(512x2048, 2048x320) 2025-09-07T11:09:32.6248879Z strides: [2048, 1], [1, 2048] 2025-09-07T11:09:32.6249230Z dtypes: torch.float16, torch.float16 2025-09-07T11:09:32.6249586Z mm 0.0102 ms 100.0% 2025-09-07T11:09:32.6250238Z triton_mm_1016 0.0110 ms 93.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:32.6251302Z triton_mm_1020 0.0118 ms 86.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:32.6253004Z triton_mm_1024 0.0132 ms 77.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:09:32.6254174Z triton_mm_1030 0.0175 ms 58.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:32.6255300Z triton_mm_1014 0.0177 ms 57.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:32.6256263Z triton_mm_1015 0.0178 ms 57.4% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:32.6257184Z triton_mm_1013 0.0187 ms 54.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:09:32.6258128Z triton_mm_1019 0.0187 ms 54.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:32.6259088Z triton_mm_1023 0.0189 ms 54.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:32.6259931Z SingleProcess AUTOTUNE benchmarking takes 0.2421 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:09:32.8655947Z Autotune Choices Stats: 2025-09-07T11:09:32.8657260Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.010816000401973724, "best_triton_pos": 1, "best_triton_time": 0.010944000445306301, "best_triton_kernel": "triton_mm_1035", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T11:09:32.8681159Z AUTOTUNE mm(512x2048, 2048x384) 2025-09-07T11:09:32.8681439Z strides: [2048, 1], [1, 2048] 2025-09-07T11:09:32.8681770Z dtypes: torch.float16, torch.float16 2025-09-07T11:09:32.8682101Z mm 0.0108 ms 100.0% 2025-09-07T11:09:32.8682758Z triton_mm_1035 0.0109 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:32.8683823Z triton_mm_1039 0.0120 ms 89.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:32.8684866Z triton_mm_1043 0.0136 ms 79.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:09:32.8686085Z triton_mm_1049 0.0176 ms 61.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:32.8687062Z triton_mm_1034 0.0178 ms 60.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:32.8688019Z triton_mm_1033 0.0179 ms 60.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:32.8688947Z triton_mm_1032 0.0188 ms 57.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:09:32.8689894Z triton_mm_1038 0.0188 ms 57.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:32.8691076Z triton_mm_1042 0.0192 ms 56.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:32.8692065Z SingleProcess AUTOTUNE benchmarking takes 0.2416 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:09:33.1152745Z Autotune Choices Stats: 2025-09-07T11:09:33.1153894Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.010912000201642513, "best_triton_pos": 1, "best_triton_time": 0.010975999757647514, "best_triton_kernel": "triton_mm_1068", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T11:09:33.1179457Z AUTOTUNE mm(512x2048, 2048x448) 2025-09-07T11:09:33.1179854Z strides: [2048, 1], [1, 2048] 2025-09-07T11:09:33.1180176Z dtypes: torch.float16, torch.float16 2025-09-07T11:09:33.1180561Z mm 0.0109 ms 100.0% 2025-09-07T11:09:33.1181260Z triton_mm_1068 0.0110 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:33.1182418Z triton_mm_1072 0.0120 ms 90.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:33.1183499Z triton_mm_1076 0.0137 ms 79.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:09:33.1184512Z triton_mm_1082 0.0176 ms 62.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:33.1185632Z triton_mm_1067 0.0178 ms 61.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:33.1186588Z triton_mm_1066 0.0183 ms 59.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:33.1187537Z triton_mm_1071 0.0190 ms 57.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:33.1188501Z triton_mm_1075 0.0191 ms 57.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:33.1189449Z triton_mm_1065 0.0192 ms 56.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:09:33.1190299Z SingleProcess AUTOTUNE benchmarking takes 0.2441 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:09:33.3656314Z Autotune Choices Stats: 2025-09-07T11:09:33.3657645Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.01033599954098463, "best_triton_pos": 1, "best_triton_time": 0.010816000401973724, "best_triton_kernel": "triton_mm_1108", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T11:09:33.3681999Z AUTOTUNE mm(512x2048, 2048x192) 2025-09-07T11:09:33.3682422Z strides: [2048, 1], [1, 2048] 2025-09-07T11:09:33.3682778Z dtypes: torch.float16, torch.float16 2025-09-07T11:09:33.3683080Z mm 0.0103 ms 100.0% 2025-09-07T11:09:33.3683745Z triton_mm_1108 0.0108 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:33.3685353Z triton_mm_1112 0.0117 ms 88.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:33.3686989Z triton_mm_1116 0.0131 ms 79.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:09:33.3688067Z triton_mm_1107 0.0171 ms 60.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:33.3689121Z triton_mm_1122 0.0174 ms 59.5% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:33.3690204Z triton_mm_1106 0.0175 ms 59.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:33.3691240Z triton_mm_1105 0.0185 ms 56.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:09:33.3692297Z triton_mm_1111 0.0185 ms 55.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:33.3693346Z triton_mm_1115 0.0187 ms 55.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:33.3694287Z SingleProcess AUTOTUNE benchmarking takes 0.2427 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:10:35.9848598Z W0907 11:10:35.983000 43059 site-packages/torch/_logging/_internal.py:1199] [6/0] Profiler function will be ignored 2025-09-07T11:11:22.8602019Z pass 2025-09-07T11:11:31.0181822Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T11:11:31.0183142Z import pynvml # type: ignore[import] 2025-09-07T11:11:34.0303872Z 2025-09-07T11:11:35.8863610Z loading model: 0it [00:00, ?it/s] 2025-09-07T11:11:35.8863981Z loading model: 0it [00:01, ?it/s] 2025-09-07T11:11:35.8864281Z cuda train jx_nest_base 2025-09-07T11:12:13.5230223Z Autotune Choices Stats: 2025-09-07T11:12:13.5231551Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.023520000278949738, "best_triton_pos": 1, "best_triton_time": 0.024960000067949295, "best_triton_kernel": "triton_mm_87", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T11:12:13.5257785Z AUTOTUNE addmm(25088x512, 25088x128, 128x512) 2025-09-07T11:12:13.5258125Z strides: [0, 1], [128, 1], [1, 128] 2025-09-07T11:12:13.5258449Z dtypes: torch.float16, torch.float16, torch.float16 2025-09-07T11:12:13.5258825Z bias_addmm 0.0235 ms 100.0% 2025-09-07T11:12:13.5259475Z triton_mm_87 0.0250 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:13.5260465Z triton_mm_95 0.0251 ms 93.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:13.5261527Z triton_mm_94 0.0252 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:13.5262507Z triton_mm_91 0.0252 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:13.5264155Z triton_mm_89 0.0259 ms 90.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:13.5265497Z triton_mm_88 0.0263 ms 89.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:12:13.5266462Z triton_mm_92 0.0263 ms 89.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:12:13.5267412Z triton_mm_84 0.0267 ms 88.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:12:13.5268366Z triton_mm_90 0.0268 ms 87.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:12:13.5269241Z SingleProcess AUTOTUNE benchmarking takes 0.2862 seconds and 0.0003 seconds precompiling for 21 choices 2025-09-07T11:12:14.1088257Z Autotune Choices Stats: 2025-09-07T11:12:14.1089624Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.01583999954164028, "best_triton_pos": 1, "best_triton_time": 0.017920000478625298, "best_triton_kernel": "triton_mm_314", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T11:12:14.1116238Z AUTOTUNE addmm(6272x1024, 6272x256, 256x1024) 2025-09-07T11:12:14.1116527Z strides: [0, 1], [256, 1], [1, 256] 2025-09-07T11:12:14.1116827Z dtypes: torch.float16, torch.float16, torch.float16 2025-09-07T11:12:14.1117141Z bias_addmm 0.0158 ms 100.0% 2025-09-07T11:12:14.1117762Z triton_mm_314 0.0179 ms 88.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 
2025-09-07T11:12:14.1118744Z triton_mm_320 0.0181 ms 87.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:14.1119826Z triton_mm_319 0.0193 ms 82.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:14.1120941Z triton_mm_317 0.0203 ms 78.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:12:14.1121899Z triton_mm_316 0.0204 ms 77.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:14.1122861Z triton_mm_313 0.0206 ms 77.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:12:14.1123959Z triton_mm_312 0.0208 ms 76.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:14.1124930Z triton_mm_321 0.0211 ms 75.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:12:14.1126286Z triton_mm_310 0.0219 ms 72.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:12:14.1127151Z SingleProcess AUTOTUNE benchmarking takes 0.2801 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T11:12:14.4789814Z Autotune Choices Stats: 2025-09-07T11:12:14.4818079Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_6", "best_kernel_desc": 
"ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=4, KERNEL_W=4, PADDING_H=0, PADDING_W=0, STRIDE_H=4, STRIDE_W=4, UNROLL=False, num_stages=2, num_warps=8", "best_time": 0.01724799908697605, "best_triton_pos": 0} 2025-09-07T11:12:14.4819291Z AUTOTUNE convolution(8x3x224x224, 128x3x4x4) 2025-09-07T11:12:14.4819643Z strides: [150528, 50176, 224, 1], [48, 16, 4, 1] 2025-09-07T11:12:14.4819959Z dtypes: torch.float16, torch.float16 2025-09-07T11:12:14.4820743Z triton_convolution2d_6 0.0172 ms 100.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=4, KERNEL_W=4, PADDING_H=0, PADDING_W=0, STRIDE_H=4, STRIDE_W=4, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:12:14.4822105Z triton_convolution2d_1 0.0175 ms 98.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=4, KERNEL_W=4, PADDING_H=0, PADDING_W=0, STRIDE_H=4, STRIDE_W=4, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:12:14.4823362Z triton_convolution2d_0 0.0177 ms 97.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=4, KERNEL_W=4, PADDING_H=0, PADDING_W=0, STRIDE_H=4, STRIDE_W=4, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:12:14.4824594Z triton_convolution2d_3 0.0179 ms 96.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=4, KERNEL_W=4, PADDING_H=0, PADDING_W=0, STRIDE_H=4, STRIDE_W=4, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:12:14.4826219Z triton_convolution2d_5 0.0188 ms 91.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=4, KERNEL_W=4, PADDING_H=0, PADDING_W=0, STRIDE_H=4, STRIDE_W=4, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:12:14.4827441Z triton_convolution2d_4 0.0191 ms 90.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=4, KERNEL_W=4, PADDING_H=0, PADDING_W=0, STRIDE_H=4, STRIDE_W=4, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:12:14.4828197Z convolution 0.0273 ms 63.3% 
2025-09-07T11:12:14.4828943Z triton_convolution2d_2 0.0596 ms 28.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=4, KERNEL_W=4, PADDING_H=0, PADDING_W=0, STRIDE_H=4, STRIDE_W=4, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:12:14.4829948Z SingleProcess AUTOTUNE benchmarking takes 0.1043 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:12:15.0362170Z Autotune Choices Stats: 2025-09-07T11:12:15.0363553Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.013856000266969204, "best_triton_pos": 1, "best_triton_time": 0.014592000283300877, "best_triton_kernel": "triton_mm_545", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T11:12:15.0393279Z AUTOTUNE addmm(1568x2048, 1568x512, 512x2048) 2025-09-07T11:12:15.0393616Z strides: [0, 1], [512, 1], [1, 512] 2025-09-07T11:12:15.0393922Z dtypes: torch.float16, torch.float16, torch.float16 2025-09-07T11:12:15.0394237Z bias_addmm 0.0139 ms 100.0% 2025-09-07T11:12:15.0394877Z triton_mm_545 0.0146 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:15.0396003Z triton_mm_539 0.0160 ms 86.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:15.0396980Z triton_mm_544 0.0163 ms 85.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:15.0397968Z triton_mm_537 0.0164 ms 84.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 
2025-09-07T11:12:15.0399286Z triton_mm_541 0.0176 ms 78.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:15.0400479Z triton_mm_546 0.0180 ms 77.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:12:15.0401400Z triton_mm_535 0.0182 ms 76.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:12:15.0402283Z triton_mm_534 0.0194 ms 71.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:12:15.0403165Z triton_mm_538 0.0196 ms 70.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:12:15.0403947Z SingleProcess AUTOTUNE benchmarking takes 0.2818 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T11:12:18.4799900Z Autotune Choices Stats: 2025-09-07T11:12:18.4801787Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.015615999698638916, "best_triton_pos": 1, "best_triton_time": 0.018432000651955605, "best_triton_kernel": "triton_mm_16", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T11:12:18.4832545Z AUTOTUNE mm(25088x128, 128x384) 2025-09-07T11:12:18.4832961Z strides: [128, 1], [1, 128] 2025-09-07T11:12:18.4833340Z dtypes: torch.float16, torch.float16 2025-09-07T11:12:18.4833723Z mm 0.0156 ms 100.0% 2025-09-07T11:12:18.4834575Z triton_mm_16 0.0184 ms 84.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, 
BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:18.4836661Z triton_mm_23 0.0185 ms 84.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:18.4838084Z triton_mm_20 0.0187 ms 83.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:18.4839472Z triton_mm_18 0.0199 ms 78.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:18.4840847Z triton_mm_17 0.0207 ms 75.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:12:18.4842217Z triton_mm_24 0.0207 ms 75.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:18.4843618Z triton_mm_21 0.0209 ms 74.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:12:18.4845224Z triton_mm_19 0.0220 ms 71.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:12:18.4846615Z triton_mm_13 0.0223 ms 69.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:12:18.4847845Z SingleProcess AUTOTUNE benchmarking takes 0.2624 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:12:18.8152572Z Autotune Choices Stats: 2025-09-07T11:12:18.8153681Z {"num_choices": 18, 
"num_triton_choices": 17, "best_kernel": "triton_bmm_32", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.0724480003118515, "best_triton_pos": 0} 2025-09-07T11:12:18.8182868Z AUTOTUNE bmm(512x196x32, 512x32x196) 2025-09-07T11:12:18.8183163Z strides: [6272, 32, 1], [6272, 196, 1] 2025-09-07T11:12:18.8183444Z dtypes: torch.float32, torch.float32 2025-09-07T11:12:18.8184079Z triton_bmm_32 0.0724 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:12:18.8185220Z triton_bmm_34 0.0726 ms 99.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:12:18.8186197Z triton_bmm_27 0.0784 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:12:18.8187165Z triton_bmm_31 0.0810 ms 89.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:12:18.8188122Z triton_bmm_33 0.0822 ms 88.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:12:18.8189077Z triton_bmm_36 0.0826 ms 87.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:12:18.8190031Z triton_bmm_29 0.0849 ms 85.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:12:18.8191028Z triton_bmm_28 0.0850 ms 
85.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:12:18.8192070Z triton_bmm_39 0.0850 ms 85.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:12:18.8193118Z triton_bmm_30 0.0863 ms 83.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:12:18.8194029Z SingleProcess AUTOTUNE benchmarking takes 0.3334 seconds and 0.0003 seconds precompiling for 18 choices 2025-09-07T11:12:19.1407931Z Autotune Choices Stats: 2025-09-07T11:12:19.1409041Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "bmm", "best_time": 0.06905599683523178, "best_triton_pos": 1, "best_triton_time": 0.0764480009675026, "best_triton_kernel": "triton_bmm_50", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T11:12:19.1438691Z AUTOTUNE bmm(512x196x196, 512x196x32) 2025-09-07T11:12:19.1438971Z strides: [38416, 196, 1], [6272, 32, 1] 2025-09-07T11:12:19.1439254Z dtypes: torch.float32, torch.float32 2025-09-07T11:12:19.1439507Z bmm 0.0691 ms 100.0% 2025-09-07T11:12:19.1440064Z triton_bmm_50 0.0764 ms 90.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:19.1440971Z triton_bmm_45 0.0765 ms 90.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:12:19.1441885Z triton_bmm_43 0.0778 ms 88.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T11:12:19.1442772Z triton_bmm_48 0.0783 ms 88.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:12:19.1444227Z triton_bmm_54 0.0786 ms 87.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:19.1445595Z triton_bmm_44 0.0803 ms 86.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:12:19.1446517Z triton_bmm_55 0.0809 ms 85.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:12:19.1447414Z triton_bmm_56 0.0812 ms 85.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T11:12:19.1448319Z triton_bmm_51 0.0812 ms 85.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:12:19.1449109Z SingleProcess AUTOTUNE benchmarking takes 0.3249 seconds and 0.0003 seconds precompiling for 17 choices 2025-09-07T11:12:19.4023128Z Autotune Choices Stats: 2025-09-07T11:12:19.4024248Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_70", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.01065600011497736, "best_triton_pos": 0} 2025-09-07T11:12:19.4056955Z AUTOTUNE mm(25088x128, 128x128) 2025-09-07T11:12:19.4057261Z strides: [128, 1], [1, 128] 2025-09-07T11:12:19.4057525Z dtypes: 
torch.float16, torch.float16 2025-09-07T11:12:19.4058224Z triton_mm_70 0.0107 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:19.4059344Z triton_mm_72 0.0114 ms 93.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:19.4060472Z triton_mm_75 0.0114 ms 93.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:19.4061133Z mm 0.0114 ms 93.3% 2025-09-07T11:12:19.4061877Z triton_mm_76 0.0115 ms 92.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:19.4062906Z triton_mm_69 0.0116 ms 91.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:12:19.4063929Z triton_mm_65 0.0117 ms 91.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:12:19.4065151Z triton_mm_73 0.0117 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:12:19.4066184Z triton_mm_68 0.0120 ms 88.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:19.4067215Z triton_mm_71 0.0123 ms 86.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:12:19.4068124Z SingleProcess 
AUTOTUNE benchmarking takes 0.2603 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:12:19.6699413Z Autotune Choices Stats: 2025-09-07T11:12:19.6701214Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_108", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.019999999552965164, "best_triton_pos": 0} 2025-09-07T11:12:19.6731199Z AUTOTUNE mm(25088x512, 512x128) 2025-09-07T11:12:19.6731590Z strides: [512, 1], [1, 512] 2025-09-07T11:12:19.6731938Z dtypes: torch.float16, torch.float16 2025-09-07T11:12:19.6732864Z triton_mm_108 0.0200 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:19.6733774Z mm 0.0206 ms 96.9% 2025-09-07T11:12:19.6734594Z triton_mm_114 0.0207 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:19.6736444Z triton_mm_115 0.0228 ms 87.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:12:19.6737878Z triton_mm_109 0.0231 ms 86.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:12:19.6739307Z triton_mm_104 0.0236 ms 84.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:12:19.6740694Z triton_mm_113 0.0241 ms 82.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:19.6742191Z 
triton_mm_107 0.0242 ms 82.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:12:19.6743586Z triton_mm_106 0.0249 ms 80.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:19.6745144Z triton_mm_110 0.0249 ms 80.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:19.6746363Z SingleProcess AUTOTUNE benchmarking takes 0.2659 seconds and 0.0004 seconds precompiling for 20 choices 2025-09-07T11:12:19.8379013Z Autotune Choices Stats: 2025-09-07T11:12:19.8380516Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.03177599981427193, "best_triton_pos": 1, "best_triton_time": 0.06345599889755249, "best_triton_kernel": "triton_convolution2d_228", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8"} 2025-09-07T11:12:19.8409328Z AUTOTUNE convolution(8x128x56x56, 256x128x3x3) 2025-09-07T11:12:19.8409686Z strides: [401408, 1, 7168, 128], [1152, 1, 384, 128] 2025-09-07T11:12:19.8410013Z dtypes: torch.float16, torch.float16 2025-09-07T11:12:19.8410316Z convolution 0.0318 ms 100.0% 2025-09-07T11:12:19.8411132Z triton_convolution2d_228 0.0635 ms 50.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:12:19.8412436Z triton_convolution2d_225 0.0640 ms 49.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, 
num_stages=2, num_warps=4 2025-09-07T11:12:19.8413676Z triton_convolution2d_229 0.0741 ms 42.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:12:19.8416236Z triton_convolution2d_230 0.0753 ms 42.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:12:19.8417473Z triton_convolution2d_231 0.0804 ms 39.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:12:19.8418711Z triton_convolution2d_226 0.1066 ms 29.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:12:19.8419935Z triton_convolution2d_227 0.3465 ms 9.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:12:19.8420929Z SingleProcess AUTOTUNE benchmarking takes 0.1606 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:12:20.0967244Z Autotune Choices Stats: 2025-09-07T11:12:20.0968543Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.012959999963641167, "best_triton_pos": 1, "best_triton_time": 0.01398400031030178, "best_triton_kernel": "triton_mm_248", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T11:12:20.0999544Z AUTOTUNE mm(6272x256, 256x768) 2025-09-07T11:12:20.0999824Z strides: [256, 1], [1, 256] 
2025-09-07T11:12:20.1000088Z dtypes: torch.float16, torch.float16 2025-09-07T11:12:20.1000373Z mm 0.0130 ms 100.0% 2025-09-07T11:12:20.1001010Z triton_mm_248 0.0140 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:20.1002338Z triton_mm_243 0.0150 ms 86.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:20.1003335Z triton_mm_241 0.0155 ms 83.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:20.1004323Z triton_mm_249 0.0157 ms 82.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:20.1005802Z triton_mm_245 0.0159 ms 81.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:20.1006818Z triton_mm_242 0.0160 ms 80.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:12:20.1007805Z triton_mm_246 0.0162 ms 79.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:12:20.1008785Z triton_mm_239 0.0174 ms 74.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:12:20.1009805Z triton_mm_250 0.0180 ms 72.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 
2025-09-07T11:12:20.1010696Z SingleProcess AUTOTUNE benchmarking takes 0.2575 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:12:20.3040257Z Autotune Choices Stats: 2025-09-07T11:12:20.3041349Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_bmm_257", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.04054399952292442, "best_triton_pos": 0} 2025-09-07T11:12:20.3072798Z AUTOTUNE bmm(256x196x32, 256x32x196) 2025-09-07T11:12:20.3073309Z strides: [6272, 32, 1], [6272, 196, 1] 2025-09-07T11:12:20.3073591Z dtypes: torch.float32, torch.float32 2025-09-07T11:12:20.3074221Z triton_bmm_257 0.0405 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:12:20.3075561Z triton_bmm_259 0.0409 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:12:20.3076529Z triton_bmm_252 0.0422 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:12:20.3077490Z triton_bmm_258 0.0448 ms 90.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:12:20.3078447Z triton_bmm_256 0.0451 ms 90.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:12:20.3079399Z triton_bmm_254 0.0456 ms 89.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 
2025-09-07T11:12:20.3080350Z triton_bmm_253 0.0456 ms 88.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:12:20.3081317Z triton_bmm_261 0.0458 ms 88.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:12:20.3082268Z triton_bmm_255 0.0464 ms 87.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:12:20.3083168Z triton_bmm_264 0.0477 ms 85.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:12:20.3083953Z SingleProcess AUTOTUNE benchmarking takes 0.2058 seconds and 0.0003 seconds precompiling for 18 choices 2025-09-07T11:12:20.5081270Z Autotune Choices Stats: 2025-09-07T11:12:20.5083086Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "bmm", "best_time": 0.04310400038957596, "best_triton_pos": 1, "best_triton_time": 0.04390399903059006, "best_triton_kernel": "triton_bmm_268", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2"} 2025-09-07T11:12:20.5115225Z AUTOTUNE bmm(256x196x196, 256x196x32) 2025-09-07T11:12:20.5115486Z strides: [38416, 196, 1], [6272, 32, 1] 2025-09-07T11:12:20.5115721Z dtypes: torch.float32, torch.float32 2025-09-07T11:12:20.5115972Z bmm 0.0431 ms 100.0% 2025-09-07T11:12:20.5116481Z triton_bmm_268 0.0439 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T11:12:20.5117273Z triton_bmm_279 0.0447 ms 96.5% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:20.5118111Z triton_bmm_276 0.0455 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:12:20.5118891Z triton_bmm_270 0.0465 ms 92.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:12:20.5120334Z triton_bmm_273 0.0467 ms 92.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:12:20.5121131Z triton_bmm_269 0.0468 ms 92.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:12:20.5121930Z triton_bmm_281 0.0478 ms 90.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T11:12:20.5122713Z triton_bmm_271 0.0482 ms 89.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:12:20.5123484Z triton_bmm_272 0.0499 ms 86.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:12:20.5124174Z SingleProcess AUTOTUNE benchmarking takes 0.2035 seconds and 0.0002 seconds precompiling for 17 choices 2025-09-07T11:12:20.7563779Z Autotune Choices Stats: 2025-09-07T11:12:20.7565479Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_295", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.009727999567985535, "best_triton_pos": 0} 2025-09-07T11:12:20.7596880Z AUTOTUNE mm(6272x256, 256x256) 2025-09-07T11:12:20.7597160Z strides: [256, 1], [1, 256] 2025-09-07T11:12:20.7597379Z dtypes: torch.float16, torch.float16 2025-09-07T11:12:20.7597938Z triton_mm_295 0.0097 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:20.7598799Z triton_mm_291 0.0103 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:12:20.7599589Z triton_mm_298 0.0103 ms 94.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:12:20.7600358Z triton_mm_294 0.0104 ms 93.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:12:20.7600845Z mm 0.0105 ms 92.4% 2025-09-07T11:12:20.7601310Z triton_mm_302 0.0105 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:12:20.7602089Z triton_mm_301 0.0107 ms 90.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:20.7602864Z triton_mm_293 0.0108 ms 89.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:20.7603710Z triton_mm_297 0.0108 ms 89.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:20.7604496Z triton_mm_300 0.0111 ms 87.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:20.7605422Z SingleProcess AUTOTUNE benchmarking takes 0.2467 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:12:21.0108421Z Autotune Choices Stats: 2025-09-07T11:12:21.0110199Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.01369599997997284, "best_triton_pos": 1, "best_triton_time": 0.015647999942302704, "best_triton_kernel": "triton_mm_340", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T11:12:21.0141366Z AUTOTUNE mm(6272x1024, 1024x256) 2025-09-07T11:12:21.0141742Z strides: [1024, 1], [1, 1024] 2025-09-07T11:12:21.0142017Z dtypes: torch.float16, torch.float16 2025-09-07T11:12:21.0142296Z mm 0.0137 ms 100.0% 2025-09-07T11:12:21.0142910Z triton_mm_340 0.0156 ms 87.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:12:21.0143950Z triton_mm_333 0.0165 ms 82.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:21.0145407Z triton_mm_329 0.0171 ms 80.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:12:21.0146387Z triton_mm_339 0.0178 ms 76.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:21.0147383Z triton_mm_334 0.0188 ms 73.0% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:12:21.0148370Z triton_mm_332 0.0193 ms 70.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:12:21.0149325Z triton_mm_336 0.0194 ms 70.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:12:21.0150282Z triton_mm_330 0.0198 ms 69.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:12:21.0151310Z triton_mm_331 0.0222 ms 61.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:21.0152321Z SingleProcess AUTOTUNE benchmarking takes 0.2530 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:12:21.1874891Z Autotune Choices Stats: 2025-09-07T11:12:21.1877092Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.03356799855828285, "best_triton_pos": 1, "best_triton_time": 0.06022400036454201, "best_triton_kernel": "triton_convolution2d_454", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T11:12:21.1913385Z AUTOTUNE convolution(8x256x28x28, 512x256x3x3) 2025-09-07T11:12:21.1913914Z strides: [200704, 1, 7168, 256], [2304, 1, 768, 256] 2025-09-07T11:12:21.1914337Z dtypes: torch.float16, torch.float16 2025-09-07T11:12:21.1914762Z convolution 0.0336 ms 100.0% 2025-09-07T11:12:21.1916049Z triton_convolution2d_454 0.0602 ms 55.7% 
ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:12:21.1917813Z triton_convolution2d_453 0.0708 ms 47.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:12:21.1919579Z triton_convolution2d_455 0.0761 ms 44.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:12:21.1922350Z triton_convolution2d_456 0.0771 ms 43.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:12:21.1924115Z triton_convolution2d_451 0.1021 ms 32.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:12:21.1926085Z triton_convolution2d_450 0.1060 ms 31.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:12:21.1927838Z triton_convolution2d_452 0.3703 ms 9.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:12:21.1929206Z SingleProcess AUTOTUNE benchmarking takes 0.1681 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:12:21.4487021Z Autotune Choices Stats: 2025-09-07T11:12:21.4488396Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.012160000391304493, 
"best_triton_pos": 1, "best_triton_time": 0.012671999633312225, "best_triton_kernel": "triton_mm_468", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T11:12:21.4752692Z AUTOTUNE mm(1568x512, 512x1536) 2025-09-07T11:12:21.4753133Z strides: [512, 1], [1, 512] 2025-09-07T11:12:21.4753489Z dtypes: torch.float16, torch.float16 2025-09-07T11:12:21.4753849Z mm 0.0122 ms 100.0% 2025-09-07T11:12:21.4754679Z triton_mm_468 0.0127 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:21.4756785Z triton_mm_474 0.0136 ms 89.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:21.4758387Z triton_mm_466 0.0144 ms 84.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:21.4760045Z triton_mm_470 0.0146 ms 83.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:21.4761796Z triton_mm_467 0.0148 ms 81.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:12:21.4763383Z triton_mm_471 0.0149 ms 81.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:12:21.4764753Z triton_mm_473 0.0150 ms 81.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 
2025-09-07T11:12:21.4766321Z triton_mm_464 0.0153 ms 79.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:12:21.4767579Z triton_mm_475 0.0161 ms 75.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:12:21.4768755Z SingleProcess AUTOTUNE benchmarking takes 0.2821 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:12:21.6477533Z Autotune Choices Stats: 2025-09-07T11:12:21.6478663Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_bmm_484", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.024831999093294144, "best_triton_pos": 0} 2025-09-07T11:12:21.6606816Z AUTOTUNE bmm(128x196x32, 128x32x196) 2025-09-07T11:12:21.6607292Z strides: [6272, 32, 1], [6272, 196, 1] 2025-09-07T11:12:21.6607707Z dtypes: torch.float32, torch.float32 2025-09-07T11:12:21.6608643Z triton_bmm_484 0.0248 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:12:21.6610113Z triton_bmm_477 0.0249 ms 99.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:12:21.6611536Z triton_bmm_482 0.0249 ms 99.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:12:21.6612983Z triton_bmm_483 0.0263 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 
2025-09-07T11:12:21.6614431Z triton_bmm_479 0.0265 ms 93.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:12:21.6616357Z triton_bmm_478 0.0267 ms 93.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:12:21.6617766Z triton_bmm_480 0.0271 ms 91.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:12:21.6619185Z triton_bmm_486 0.0271 ms 91.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:12:21.6620619Z triton_bmm_481 0.0276 ms 90.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:12:21.6622128Z triton_bmm_476 0.0277 ms 89.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T11:12:21.6623415Z SingleProcess AUTOTUNE benchmarking takes 0.1837 seconds and 0.0004 seconds precompiling for 18 choices 2025-09-07T11:12:21.8874238Z Autotune Choices Stats: 2025-09-07T11:12:21.8876077Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "bmm", "best_time": 0.025087999179959297, "best_triton_pos": 1, "best_triton_time": 0.025087999179959297, "best_triton_kernel": "triton_bmm_504", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T11:12:21.8909145Z AUTOTUNE bmm(128x196x196, 128x196x32) 2025-09-07T11:12:21.8909452Z strides: [38432, 196, 1], [6272, 32, 1] 
2025-09-07T11:12:21.8909753Z dtypes: torch.float32, torch.float32 2025-09-07T11:12:21.8910016Z bmm 0.0251 ms 100.0% 2025-09-07T11:12:21.8910635Z triton_bmm_504 0.0251 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:21.8911653Z triton_bmm_500 0.0253 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:21.8912725Z triton_bmm_498 0.0266 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:12:21.8913813Z triton_bmm_505 0.0267 ms 94.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:12:21.8915707Z triton_bmm_501 0.0282 ms 88.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:12:21.8916757Z triton_bmm_493 0.0285 ms 87.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T11:12:21.8917810Z triton_bmm_506 0.0286 ms 87.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T11:12:21.8918878Z triton_bmm_497 0.0293 ms 85.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:12:21.8919933Z triton_bmm_495 0.0300 ms 83.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, 
num_warps=4 2025-09-07T11:12:21.8920849Z SingleProcess AUTOTUNE benchmarking takes 0.2296 seconds and 0.0003 seconds precompiling for 17 choices 2025-09-07T11:12:22.1387573Z Autotune Choices Stats: 2025-09-07T11:12:22.1389055Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_521", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4", "best_time": 0.009279999881982803, "best_triton_pos": 0} 2025-09-07T11:12:22.1422101Z AUTOTUNE mm(1568x512, 512x512) 2025-09-07T11:12:22.1422507Z strides: [512, 1], [1, 512] 2025-09-07T11:12:22.1422872Z dtypes: torch.float16, torch.float16 2025-09-07T11:12:22.1423821Z triton_mm_521 0.0093 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:12:22.1424782Z mm 0.0094 ms 99.0% 2025-09-07T11:12:22.1426100Z triton_mm_520 0.0101 ms 91.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:22.1427504Z triton_mm_516 0.0102 ms 90.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:12:22.1428916Z triton_mm_527 0.0105 ms 88.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:12:22.1430322Z triton_mm_519 0.0108 ms 86.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:12:22.1431737Z triton_mm_526 0.0109 ms 85.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=3, num_warps=4 2025-09-07T11:12:22.1433165Z triton_mm_523 0.0109 ms 84.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:12:22.1434573Z triton_mm_517 0.0112 ms 82.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:12:22.1436150Z triton_mm_518 0.0123 ms 75.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:22.1437370Z SingleProcess AUTOTUNE benchmarking takes 0.2498 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:12:22.3960903Z Autotune Choices Stats: 2025-09-07T11:12:22.3997903Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.014240000396966934, "best_triton_pos": 1, "best_triton_time": 0.014688000082969666, "best_triton_kernel": "triton_mm_559", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4"} 2025-09-07T11:12:22.3999260Z AUTOTUNE mm(1568x2048, 2048x512) 2025-09-07T11:12:22.3999535Z strides: [2048, 1], [1, 2048] 2025-09-07T11:12:22.3999807Z dtypes: torch.float16, torch.float16 2025-09-07T11:12:22.4000093Z mm 0.0142 ms 100.0% 2025-09-07T11:12:22.4000759Z triton_mm_559 0.0147 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:12:22.4001843Z triton_mm_565 0.0183 ms 77.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:12:22.4002905Z triton_mm_555 0.0194 ms 73.6% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:12:22.4003869Z triton_mm_558 0.0207 ms 68.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:22.4004830Z triton_mm_554 0.0209 ms 68.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:12:22.4006229Z triton_mm_564 0.0225 ms 63.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:22.4007196Z triton_mm_557 0.0241 ms 59.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:12:22.4008200Z triton_mm_561 0.0242 ms 58.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:12:22.4009170Z triton_mm_551 0.0270 ms 52.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:12:22.4010014Z SingleProcess AUTOTUNE benchmarking takes 0.2559 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:12:22.8110081Z Autotune Choices Stats: 2025-09-07T11:12:22.8111129Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_2641", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.00800000037997961, "best_triton_pos": 0} 2025-09-07T11:12:22.8147784Z AUTOTUNE addmm(8x1000, 8x512, 512x1000) 
2025-09-07T11:12:22.8148075Z strides: [0, 1], [512, 1], [1, 512] 2025-09-07T11:12:22.8148391Z dtypes: torch.float16, torch.float16, torch.float16 2025-09-07T11:12:22.8149123Z triton_mm_2641 0.0080 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:12:22.8150165Z triton_mm_2645 0.0081 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:12:22.8151154Z triton_mm_2653 0.0090 ms 89.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:12:22.8152264Z triton_mm_2640 0.0090 ms 88.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:12:22.8153555Z triton_mm_2639 0.0092 ms 86.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:12:22.8154717Z triton_mm_2644 0.0092 ms 86.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:22.8156123Z triton_mm_2649 0.0095 ms 83.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:12:22.8157092Z triton_mm_2638 0.0096 ms 83.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T11:12:22.8158066Z triton_mm_2651 0.0100 ms 80.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:12:22.8159036Z triton_mm_2648 0.0100 ms 79.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:22.8159885Z SingleProcess AUTOTUNE benchmarking takes 0.2548 seconds and 0.0003 seconds precompiling for 19 choices 2025-09-07T11:12:59.0107882Z Autotune Choices Stats: 2025-09-07T11:12:59.0109233Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.01894400082528591, "best_triton_pos": 1, "best_triton_time": 0.021695999428629875, "best_triton_kernel": "triton_mm_7448", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T11:12:59.0142691Z AUTOTUNE mm(25088x128, 128x512) 2025-09-07T11:12:59.0143032Z strides: [128, 1], [512, 1] 2025-09-07T11:12:59.0143304Z dtypes: torch.float16, torch.float16 2025-09-07T11:12:59.0143608Z mm 0.0189 ms 100.0% 2025-09-07T11:12:59.0144275Z triton_mm_7448 0.0217 ms 87.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:59.0147164Z triton_mm_7455 0.0220 ms 85.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:59.0148190Z triton_mm_7456 0.0226 ms 84.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:59.0149170Z triton_mm_7452 0.0227 ms 83.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:59.0150173Z triton_mm_7450 0.0231 ms 82.1% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:59.0151156Z triton_mm_7449 0.0239 ms 79.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:12:59.0152203Z triton_mm_7445 0.0243 ms 78.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:12:59.0153266Z triton_mm_7453 0.0243 ms 78.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:12:59.0154240Z triton_mm_7451 0.0245 ms 77.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:12:59.0155223Z SingleProcess AUTOTUNE benchmarking takes 0.2369 seconds and 0.0006 seconds precompiling for 20 choices 2025-09-07T11:12:59.5699398Z Autotune Choices Stats: 2025-09-07T11:12:59.5701179Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.013919999822974205, "best_triton_pos": 1, "best_triton_time": 0.015552000142633915, "best_triton_kernel": "triton_mm_7024", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T11:12:59.5735247Z AUTOTUNE mm(6272x256, 256x1024) 2025-09-07T11:12:59.5735535Z strides: [256, 1], [1024, 1] 2025-09-07T11:12:59.5735807Z dtypes: torch.float16, torch.float16 2025-09-07T11:12:59.5736074Z mm 0.0139 ms 100.0% 2025-09-07T11:12:59.5736694Z triton_mm_7024 0.0156 ms 89.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:59.5737697Z triton_mm_7018 0.0158 ms 87.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:59.5738707Z triton_mm_7023 0.0172 ms 80.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:59.5739698Z triton_mm_7016 0.0174 ms 80.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:59.5740704Z triton_mm_7025 0.0178 ms 78.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:12:59.5741793Z triton_mm_7017 0.0178 ms 78.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:12:59.5742885Z triton_mm_7021 0.0180 ms 77.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:12:59.5743788Z triton_mm_7020 0.0182 ms 76.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:12:59.5744691Z triton_mm_7014 0.0196 ms 71.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:12:59.5745606Z SingleProcess AUTOTUNE benchmarking takes 0.2231 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:13:00.0993871Z Autotune Choices Stats: 2025-09-07T11:13:00.0995296Z {"num_choices": 20, "num_triton_choices": 19, 
"best_kernel": "mm", "best_time": 0.013311999849975109, "best_triton_pos": 1, "best_triton_time": 0.013376000337302685, "best_triton_kernel": "triton_mm_2704", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T11:13:00.1030984Z AUTOTUNE mm(1568x512, 512x2048) 2025-09-07T11:13:00.1031308Z strides: [512, 1], [2048, 1] 2025-09-07T11:13:00.1031584Z dtypes: torch.float16, torch.float16 2025-09-07T11:13:00.1031858Z mm 0.0133 ms 100.0% 2025-09-07T11:13:00.1032588Z triton_mm_2704 0.0134 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:00.1033765Z triton_mm_2703 0.0144 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:00.1034840Z triton_mm_2696 0.0145 ms 91.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:00.1036491Z triton_mm_2698 0.0152 ms 87.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:00.1037793Z triton_mm_2700 0.0155 ms 86.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:00.1038868Z triton_mm_2705 0.0157 ms 84.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:13:00.1039925Z triton_mm_2697 0.0162 ms 82.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:13:00.1040969Z triton_mm_2701 0.0171 ms 78.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:13:00.1042021Z triton_mm_2694 0.0177 ms 75.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:13:00.1042944Z SingleProcess AUTOTUNE benchmarking takes 0.2184 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:13:01.0022206Z Autotune Choices Stats: 2025-09-07T11:13:01.0023540Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.012095999903976917, "best_triton_pos": 1, "best_triton_time": 0.014240000396966934, "best_triton_kernel": "triton_mm_2718", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4"} 2025-09-07T11:13:01.0058902Z AUTOTUNE mm(512x1568, 1568x2048) 2025-09-07T11:13:01.0059192Z strides: [1, 512], [2048, 1] 2025-09-07T11:13:01.0059579Z dtypes: torch.float16, torch.float16 2025-09-07T11:13:01.0059862Z mm 0.0121 ms 100.0% 2025-09-07T11:13:01.0060497Z triton_mm_2718 0.0142 ms 84.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:13:01.0061614Z triton_mm_2724 0.0175 ms 69.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:13:01.0062636Z triton_mm_2714 0.0176 ms 68.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:13:01.0063770Z triton_mm_2713 0.0180 ms 67.0% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:13:01.0064741Z triton_mm_2717 0.0180 ms 67.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:01.0065914Z triton_mm_2720 0.0183 ms 66.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:13:01.0066887Z triton_mm_2716 0.0185 ms 65.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:13:01.0067866Z triton_mm_2723 0.0197 ms 61.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:01.0068848Z triton_mm_2715 0.0236 ms 51.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:01.0070065Z SingleProcess AUTOTUNE benchmarking takes 0.2288 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:13:01.4316867Z Autotune Choices Stats: 2025-09-07T11:13:01.4318676Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.012319999746978283, "best_triton_pos": 1, "best_triton_time": 0.013919999822974205, "best_triton_kernel": "triton_mm_2756", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4"} 2025-09-07T11:13:01.4346295Z AUTOTUNE mm(2048x1568, 1568x512) 2025-09-07T11:13:01.4346570Z strides: [1, 2048], [512, 1] 2025-09-07T11:13:01.4346846Z dtypes: torch.float16, torch.float16 
2025-09-07T11:13:01.4347156Z mm 0.0123 ms 100.0% 2025-09-07T11:13:01.4347808Z triton_mm_2756 0.0139 ms 88.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:13:01.4348858Z triton_mm_2762 0.0175 ms 70.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:13:01.4349864Z triton_mm_2752 0.0176 ms 70.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:13:01.4350855Z triton_mm_2755 0.0178 ms 69.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:01.4351827Z triton_mm_2751 0.0180 ms 68.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:13:01.4352877Z triton_mm_2754 0.0180 ms 68.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:13:01.4353887Z triton_mm_2758 0.0185 ms 66.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:13:01.4354862Z triton_mm_2761 0.0196 ms 62.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:01.4356001Z triton_mm_2753 0.0233 ms 53.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:01.4356848Z SingleProcess AUTOTUNE 
benchmarking takes 0.2279 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:13:02.0749297Z Autotune Choices Stats: 2025-09-07T11:13:02.0750671Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.012191999703645706, "best_triton_pos": 1, "best_triton_time": 0.013919999822974205, "best_triton_kernel": "triton_mm_2896", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4"} 2025-09-07T11:13:02.0786497Z AUTOTUNE mm(1536x1568, 1568x512) 2025-09-07T11:13:02.0786768Z strides: [1, 1536], [512, 1] 2025-09-07T11:13:02.0787053Z dtypes: torch.float16, torch.float16 2025-09-07T11:13:02.0787327Z mm 0.0122 ms 100.0% 2025-09-07T11:13:02.0787962Z triton_mm_2896 0.0139 ms 87.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:13:02.0788987Z triton_mm_2892 0.0171 ms 71.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:13:02.0789992Z triton_mm_2891 0.0175 ms 69.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:13:02.0791608Z triton_mm_2902 0.0176 ms 69.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:13:02.0792598Z triton_mm_2895 0.0177 ms 69.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:02.0793733Z triton_mm_2894 0.0180 ms 67.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:13:02.0794701Z triton_mm_2898 0.0181 ms 67.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:13:02.0796055Z triton_mm_2901 0.0195 ms 62.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:02.0797045Z triton_mm_2888 0.0215 ms 56.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:13:02.0797889Z SingleProcess AUTOTUNE benchmarking takes 0.2255 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:13:02.6161509Z Autotune Choices Stats: 2025-09-07T11:13:02.6162535Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_2677", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.006047999951988459, "best_triton_pos": 0} 2025-09-07T11:13:02.6201422Z AUTOTUNE mm(1000x8, 8x512) 2025-09-07T11:13:02.6201694Z strides: [1, 1000], [512, 1] 2025-09-07T11:13:02.6201988Z dtypes: torch.float16, torch.float16 2025-09-07T11:13:02.6202654Z triton_mm_2677 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:13:02.6203788Z triton_mm_2675 0.0061 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:13:02.6204907Z triton_mm_2676 0.0061 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:13:02.6206391Z triton_mm_2678 0.0061 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:13:02.6207430Z triton_mm_2673 0.0061 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:13:02.6208434Z triton_mm_2674 0.0061 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:13:02.6209417Z triton_mm_2672 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:13:02.6210431Z triton_mm_2681 0.0062 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:13:02.6211447Z triton_mm_2679 0.0063 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:02.6212435Z triton_mm_2671 0.0063 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T11:13:02.6213868Z SingleProcess AUTOTUNE benchmarking takes 0.1599 seconds and 0.0002 seconds precompiling for 17 choices 2025-09-07T11:13:03.0267783Z Autotune Choices Stats: 2025-09-07T11:13:03.0269100Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.01027199998497963, "best_triton_pos": 1, "best_triton_time": 0.010751999914646149, "best_triton_kernel": "triton_mm_2790", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T11:13:03.0304818Z AUTOTUNE mm(512x1568, 1568x512) 2025-09-07T11:13:03.0307559Z strides: [1, 512], [512, 1] 2025-09-07T11:13:03.0307856Z dtypes: torch.float16, torch.float16 2025-09-07T11:13:03.0308133Z mm 0.0103 ms 100.0% 2025-09-07T11:13:03.0308749Z triton_mm_2790 0.0108 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:13:03.0309797Z triton_mm_2786 0.0109 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:13:03.0310770Z triton_mm_2794 0.0128 ms 80.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:13:03.0311733Z triton_mm_2785 0.0142 ms 72.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:13:03.0312679Z triton_mm_2784 0.0151 ms 68.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:13:03.0313795Z triton_mm_2789 0.0157 ms 65.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:13:03.0314785Z triton_mm_2800 0.0160 ms 64.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:13:03.0315922Z triton_mm_2793 0.0163 ms 62.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:03.0316886Z triton_mm_2792 0.0167 ms 61.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:13:03.0317724Z SingleProcess AUTOTUNE benchmarking takes 0.2174 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:13:03.7280334Z Autotune Choices Stats: 2025-09-07T11:13:03.7281613Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.019680000841617584, "best_triton_pos": 1, "best_triton_time": 0.02070399932563305, "best_triton_kernel": "triton_mm_7030", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T11:13:03.7317685Z AUTOTUNE mm(256x6272, 6272x1024) 2025-09-07T11:13:03.7317941Z strides: [1, 256], [1024, 1] 2025-09-07T11:13:03.7318200Z dtypes: torch.float16, torch.float16 2025-09-07T11:13:03.7318459Z mm 0.0197 ms 100.0% 2025-09-07T11:13:03.7319041Z triton_mm_7030 0.0207 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:13:03.7320014Z triton_mm_7034 0.0216 ms 91.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:13:03.7321334Z triton_mm_7038 0.0256 ms 76.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:13:03.7322498Z triton_mm_7044 0.0398 ms 49.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:13:03.7323469Z triton_mm_7029 0.0425 ms 46.3% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:13:03.7324433Z triton_mm_7028 0.0445 ms 44.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:13:03.7325659Z triton_mm_7033 0.0453 ms 43.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:13:03.7326561Z triton_mm_7037 0.0457 ms 43.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:03.7327471Z triton_mm_7043 0.0513 ms 38.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:03.7328261Z SingleProcess AUTOTUNE benchmarking takes 0.3194 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:13:04.0852815Z Autotune Choices Stats: 2025-09-07T11:13:04.0854068Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.01961600035429001, "best_triton_pos": 1, "best_triton_time": 0.021183999255299568, "best_triton_kernel": "triton_mm_7068", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T11:13:04.0889150Z AUTOTUNE mm(1024x6272, 6272x256) 2025-09-07T11:13:04.0889417Z strides: [1, 1024], [256, 1] 2025-09-07T11:13:04.0889689Z dtypes: torch.float16, torch.float16 2025-09-07T11:13:04.0889949Z mm 0.0196 ms 100.0% 2025-09-07T11:13:04.0890577Z triton_mm_7068 0.0212 ms 92.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:13:04.0891582Z triton_mm_7072 0.0216 ms 90.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:13:04.0892554Z triton_mm_7076 0.0262 ms 74.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:13:04.0893561Z triton_mm_7082 0.0398 ms 49.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:13:04.0894702Z triton_mm_7067 0.0424 ms 46.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:13:04.0895727Z triton_mm_7066 0.0436 ms 44.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:13:04.0896608Z triton_mm_7071 0.0446 ms 43.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:13:04.0897497Z triton_mm_7075 0.0451 ms 43.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:04.0898675Z triton_mm_7081 0.0507 ms 38.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:04.0899619Z SingleProcess AUTOTUNE benchmarking takes 0.3126 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:13:04.4608752Z Autotune Choices Stats: 2025-09-07T11:13:04.4610004Z {"num_choices": 20, "num_triton_choices": 19, 
"best_kernel": "mm", "best_time": 0.01603199914097786, "best_triton_pos": 1, "best_triton_time": 0.01974399946630001, "best_triton_kernel": "triton_mm_7208", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T11:13:04.4647038Z AUTOTUNE mm(768x6272, 6272x256) 2025-09-07T11:13:04.4647381Z strides: [1, 768], [256, 1] 2025-09-07T11:13:04.4647639Z dtypes: torch.float16, torch.float16 2025-09-07T11:13:04.4647914Z mm 0.0160 ms 100.0% 2025-09-07T11:13:04.4648543Z triton_mm_7208 0.0197 ms 81.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:13:04.4649581Z triton_mm_7212 0.0209 ms 76.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:13:04.4650568Z triton_mm_7216 0.0258 ms 62.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:13:04.4651543Z triton_mm_7222 0.0399 ms 40.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:13:04.4652516Z triton_mm_7207 0.0411 ms 39.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:13:04.4653478Z triton_mm_7206 0.0423 ms 37.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:13:04.4654530Z triton_mm_7211 0.0434 ms 36.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:13:04.4655788Z triton_mm_7215 0.0442 ms 36.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:04.4656685Z triton_mm_7221 0.0492 ms 32.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:04.4657470Z SingleProcess AUTOTUNE benchmarking takes 0.3112 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:13:04.9851542Z Autotune Choices Stats: 2025-09-07T11:13:04.9852920Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.01571200042963028, "best_triton_pos": 1, "best_triton_time": 0.018880000337958336, "best_triton_kernel": "triton_mm_7106", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T11:13:04.9890157Z AUTOTUNE mm(256x6272, 6272x256) 2025-09-07T11:13:04.9890521Z strides: [1, 256], [256, 1] 2025-09-07T11:13:04.9890805Z dtypes: torch.float16, torch.float16 2025-09-07T11:13:04.9891092Z mm 0.0157 ms 100.0% 2025-09-07T11:13:04.9891719Z triton_mm_7106 0.0189 ms 83.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:13:04.9892764Z triton_mm_7110 0.0205 ms 76.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:13:04.9894503Z triton_mm_7114 0.0254 ms 61.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:13:04.9895934Z triton_mm_7105 0.0386 ms 40.7% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:13:04.9896865Z triton_mm_7120 0.0387 ms 40.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:13:04.9897759Z triton_mm_7104 0.0403 ms 39.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:13:04.9898648Z triton_mm_7109 0.0407 ms 38.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:13:04.9899582Z triton_mm_7113 0.0421 ms 37.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:04.9900480Z triton_mm_7119 0.0479 ms 32.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:04.9901279Z SingleProcess AUTOTUNE benchmarking takes 0.3105 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:13:07.1044135Z Autotune Choices Stats: 2025-09-07T11:13:07.1046160Z {"num_choices": 29, "num_triton_choices": 19, "best_kernel": "decompose_k_mm_7_split_3", "best_kernel_desc": "k_split=7", "best_time": 0.02454400062561035, "best_triton_pos": 4, "best_triton_time": 0.05657599866390228, "best_triton_kernel": "triton_mm_7462", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T11:13:07.1098694Z AUTOTUNE mm(128x25088, 25088x512) 2025-09-07T11:13:07.1099000Z strides: [1, 128], [512, 1] 
2025-09-07T11:13:07.1099274Z dtypes: torch.float16, torch.float16 2025-09-07T11:13:07.1099626Z decompose_k_mm_7_split_3 0.0245 ms 100.0% k_split=7 2025-09-07T11:13:07.1099945Z mm 0.0261 ms 94.0% 2025-09-07T11:13:07.1100212Z decompose_k_mm_4_split_2 0.0275 ms 89.2% k_split=4 2025-09-07T11:13:07.1100552Z decompose_k_mm_2_split_1 0.0299 ms 82.1% k_split=2 2025-09-07T11:13:07.1101238Z triton_mm_7462 0.0566 ms 43.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:13:07.1102327Z triton_mm_7466 0.0626 ms 39.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:13:07.1103009Z decompose_k_mm_16_split_6 0.0641 ms 38.3% k_split=16 2025-09-07T11:13:07.1103367Z decompose_k_mm_14_split_5 0.0643 ms 38.2% k_split=14 2025-09-07T11:13:07.1103719Z decompose_k_mm_8_split_4 0.0647 ms 38.0% k_split=8 2025-09-07T11:13:07.1104061Z decompose_k_mm_32_split_8 0.0654 ms 37.5% k_split=32 2025-09-07T11:13:07.1104618Z SingleProcess AUTOTUNE benchmarking takes 1.9191 seconds and 0.0002 seconds precompiling for 29 choices 2025-09-07T11:13:08.1263676Z Autotune Choices Stats: 2025-09-07T11:13:08.1265429Z {"num_choices": 29, "num_triton_choices": 19, "best_kernel": "decompose_k_mm_7_split_12", "best_kernel_desc": "k_split=7", "best_time": 0.024191999807953835, "best_triton_pos": 4, "best_triton_time": 0.05728000029921532, "best_triton_kernel": "triton_mm_7500", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T11:13:08.1319850Z AUTOTUNE mm(512x25088, 25088x128) 2025-09-07T11:13:08.1320129Z strides: [1, 512], [128, 1] 2025-09-07T11:13:08.1320406Z dtypes: torch.float16, torch.float16 2025-09-07T11:13:08.1321063Z decompose_k_mm_7_split_12 
0.0242 ms 100.0% k_split=7 2025-09-07T11:13:08.1321414Z mm 0.0260 ms 92.9% 2025-09-07T11:13:08.1321676Z decompose_k_mm_4_split_11 0.0280 ms 86.3% k_split=4 2025-09-07T11:13:08.1322010Z decompose_k_mm_2_split_10 0.0300 ms 80.7% k_split=2 2025-09-07T11:13:08.1322702Z triton_mm_7500 0.0573 ms 42.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:13:08.1323688Z triton_mm_7504 0.0630 ms 38.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:13:08.1324378Z decompose_k_mm_16_split_15 0.0641 ms 37.7% k_split=16 2025-09-07T11:13:08.1324781Z decompose_k_mm_14_split_14 0.0644 ms 37.6% k_split=14 2025-09-07T11:13:08.1325282Z decompose_k_mm_8_split_13 0.0644 ms 37.6% k_split=8 2025-09-07T11:13:08.1325594Z decompose_k_mm_28_split_16 0.0651 ms 37.2% k_split=28 2025-09-07T11:13:08.1326085Z SingleProcess AUTOTUNE benchmarking takes 0.9884 seconds and 0.0002 seconds precompiling for 29 choices 2025-09-07T11:13:10.0416230Z Autotune Choices Stats: 2025-09-07T11:13:10.0418345Z {"num_choices": 30, "num_triton_choices": 19, "best_kernel": "decompose_k_mm_7_split_31", "best_kernel_desc": "k_split=7", "best_time": 0.02284800074994564, "best_triton_pos": 11, "best_triton_time": 0.05603199824690819, "best_triton_kernel": "triton_mm_7640", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T11:13:10.0473284Z AUTOTUNE mm(384x25088, 25088x128) 2025-09-07T11:13:10.0473739Z strides: [1, 384], [128, 1] 2025-09-07T11:13:10.0474165Z dtypes: torch.float16, torch.float16 2025-09-07T11:13:10.0474663Z decompose_k_mm_7_split_31 0.0228 ms 100.0% k_split=7 2025-09-07T11:13:10.0475400Z decompose_k_mm_4_split_30 0.0233 ms 97.9% k_split=4 
2025-09-07T11:13:10.0475862Z mm 0.0241 ms 94.7% 2025-09-07T11:13:10.0476249Z decompose_k_mm_2_split_29 0.0280 ms 81.6% k_split=2 2025-09-07T11:13:10.0476772Z decompose_k_mm_14_split_33 0.0516 ms 44.2% k_split=14 2025-09-07T11:13:10.0477302Z decompose_k_mm_16_split_34 0.0520 ms 44.0% k_split=16 2025-09-07T11:13:10.0477821Z decompose_k_mm_8_split_32 0.0523 ms 43.7% k_split=8 2025-09-07T11:13:10.0478340Z decompose_k_mm_28_split_35 0.0524 ms 43.6% k_split=28 2025-09-07T11:13:10.0478869Z decompose_k_mm_32_split_37 0.0527 ms 43.3% k_split=32 2025-09-07T11:13:10.0479382Z decompose_k_mm_49_split_28 0.0533 ms 42.9% k_split=49 2025-09-07T11:13:10.0480173Z SingleProcess AUTOTUNE benchmarking takes 1.8698 seconds and 0.0002 seconds precompiling for 30 choices 2025-09-07T11:13:12.1082138Z Autotune Choices Stats: 2025-09-07T11:13:12.1084228Z {"num_choices": 30, "num_triton_choices": 19, "best_kernel": "decompose_k_mm_7_split_23", "best_kernel_desc": "k_split=7", "best_time": 0.016575999557971954, "best_triton_pos": 11, "best_triton_time": 0.05488000065088272, "best_triton_kernel": "triton_mm_7538", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T11:13:12.1140164Z AUTOTUNE mm(128x25088, 25088x128) 2025-09-07T11:13:12.1140593Z strides: [1, 128], [128, 1] 2025-09-07T11:13:12.1141006Z dtypes: torch.float16, torch.float16 2025-09-07T11:13:12.1141605Z decompose_k_mm_7_split_23 0.0166 ms 100.0% k_split=7 2025-09-07T11:13:12.1142057Z mm 0.0172 ms 96.3% 2025-09-07T11:13:12.1142446Z decompose_k_mm_98_split_19 0.0191 ms 86.6% k_split=98 2025-09-07T11:13:12.1142963Z decompose_k_mm_4_split_22 0.0194 ms 85.6% k_split=4 2025-09-07T11:13:12.1144066Z decompose_k_mm_196_split_20 0.0226 ms 73.5% k_split=196 2025-09-07T11:13:12.1144598Z decompose_k_mm_14_split_25 0.0256 ms 64.7% k_split=14 2025-09-07T11:13:12.1145359Z decompose_k_mm_49_split_18 0.0257 ms 
64.4% k_split=49 2025-09-07T11:13:12.1146262Z decompose_k_mm_16_split_26 0.0260 ms 63.7% k_split=16 2025-09-07T11:13:12.1146822Z decompose_k_mm_28_split_27 0.0260 ms 63.7% k_split=28 2025-09-07T11:13:12.1147340Z decompose_k_mm_8_split_24 0.0262 ms 63.2% k_split=8 2025-09-07T11:13:12.1148107Z SingleProcess AUTOTUNE benchmarking takes 1.8824 seconds and 0.0002 seconds precompiling for 30 choices 2025-09-07T11:13:13.2711089Z Autotune Choices Stats: 2025-09-07T11:13:13.2712377Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "mm", "best_time": 0.008448000065982342, "best_triton_pos": 1, "best_triton_time": 0.008895999751985073, "best_triton_kernel": "triton_mm_2658", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2"} 2025-09-07T11:13:13.2753486Z AUTOTUNE mm(8x1000, 1000x512) 2025-09-07T11:13:13.2753734Z strides: [1000, 1], [512, 1] 2025-09-07T11:13:13.2753984Z dtypes: torch.float16, torch.float16 2025-09-07T11:13:13.2754290Z mm 0.0084 ms 100.0% 2025-09-07T11:13:13.2754893Z triton_mm_2658 0.0089 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:13:13.2756303Z triton_mm_2662 0.0093 ms 90.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:13:13.2757524Z triton_mm_2666 0.0096 ms 87.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:13:13.2758564Z triton_mm_2656 0.0108 ms 78.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:13:13.2759579Z triton_mm_2657 0.0108 ms 
78.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:13:13.2760595Z triton_mm_2661 0.0113 ms 74.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:13.2761580Z triton_mm_2670 0.0113 ms 74.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:13:13.2762555Z triton_mm_2668 0.0122 ms 69.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:13:13.2763540Z triton_mm_2665 0.0124 ms 68.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:13.2764411Z SingleProcess AUTOTUNE benchmarking takes 0.1894 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T11:13:13.5107716Z Autotune Choices Stats: 2025-09-07T11:13:13.5108965Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.013824000023305416, "best_triton_pos": 1, "best_triton_time": 0.014560000039637089, "best_triton_kernel": "triton_mm_2737", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4"} 2025-09-07T11:13:13.5147724Z AUTOTUNE mm(1568x2048, 2048x512) 2025-09-07T11:13:13.5147975Z strides: [2048, 1], [512, 1] 2025-09-07T11:13:13.5148238Z dtypes: torch.float16, torch.float16 2025-09-07T11:13:13.5148498Z mm 0.0138 ms 100.0% 2025-09-07T11:13:13.5149741Z triton_mm_2737 0.0146 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:13:13.5150942Z triton_mm_2733 0.0180 ms 76.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:13:13.5151930Z triton_mm_2743 0.0187 ms 74.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:13:13.5152903Z triton_mm_2736 0.0199 ms 69.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:13.5153867Z triton_mm_2732 0.0205 ms 67.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:13:13.5154834Z triton_mm_2742 0.0218 ms 63.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:13.5156150Z triton_mm_2735 0.0231 ms 59.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:13:13.5157131Z triton_mm_2739 0.0234 ms 59.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:13:13.5158056Z triton_mm_2729 0.0274 ms 50.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:13:13.5158866Z SingleProcess AUTOTUNE benchmarking takes 0.2374 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:13:13.7102863Z Autotune Choices Stats: 2025-09-07T11:13:13.7104098Z {"num_choices": 20, "num_triton_choices": 19, 
"best_kernel": "mm", "best_time": 0.00886400043964386, "best_triton_pos": 1, "best_triton_time": 0.009216000325977802, "best_triton_kernel": "triton_mm_2775", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4"} 2025-09-07T11:13:13.7143956Z AUTOTUNE mm(1568x512, 512x512) 2025-09-07T11:13:13.7144252Z strides: [512, 1], [512, 1] 2025-09-07T11:13:13.7144509Z dtypes: torch.float16, torch.float16 2025-09-07T11:13:13.7144782Z mm 0.0089 ms 100.0% 2025-09-07T11:13:13.7145545Z triton_mm_2775 0.0092 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:13:13.7146678Z triton_mm_2774 0.0096 ms 92.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:13.7147688Z triton_mm_2770 0.0099 ms 89.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:13:13.7148643Z triton_mm_2773 0.0102 ms 86.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:13:13.7149611Z triton_mm_2781 0.0102 ms 86.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:13:13.7150594Z triton_mm_2777 0.0103 ms 85.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:13:13.7151562Z triton_mm_2780 0.0107 ms 83.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:13.7152945Z triton_mm_2771 0.0109 ms 81.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:13:13.7153933Z triton_mm_2772 0.0116 ms 76.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:13.7154784Z SingleProcess AUTOTUNE benchmarking takes 0.1985 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:13:13.9023761Z Autotune Choices Stats: 2025-09-07T11:13:13.9024777Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_bmm_2812", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.021407999098300934, "best_triton_pos": 0} 2025-09-07T11:13:13.9066417Z AUTOTUNE bmm(128x196x196, 128x196x32) 2025-09-07T11:13:13.9066762Z strides: [38432, 1, 196], [6272, 32, 1] 2025-09-07T11:13:13.9067045Z dtypes: torch.float32, torch.float32 2025-09-07T11:13:13.9067709Z triton_bmm_2812 0.0214 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:13.9068700Z triton_bmm_2808 0.0218 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:13.9069319Z bmm 0.0231 ms 92.8% 2025-09-07T11:13:13.9069897Z triton_bmm_2813 0.0232 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:13:13.9070878Z triton_bmm_2806 0.0243 ms 88.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, 
BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:13:13.9071862Z triton_bmm_2815 0.0253 ms 84.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:13.9072840Z triton_bmm_2805 0.0258 ms 83.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:13:13.9073811Z triton_bmm_2814 0.0261 ms 82.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T11:13:13.9074788Z triton_bmm_2809 0.0266 ms 80.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:13:13.9075909Z triton_bmm_2810 0.0270 ms 79.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:13.9076765Z SingleProcess AUTOTUNE benchmarking takes 0.1912 seconds and 0.0002 seconds precompiling for 17 choices 2025-09-07T11:13:14.1628359Z Autotune Choices Stats: 2025-09-07T11:13:14.1629553Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "bmm", "best_time": 0.028896000236272812, "best_triton_pos": 1, "best_triton_time": 0.03699199855327606, "best_triton_kernel": "triton_bmm_2832", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T11:13:14.1676369Z AUTOTUNE bmm(128x196x32, 128x32x196) 2025-09-07T11:13:14.1676698Z strides: [6272, 32, 1], [6272, 1, 32] 2025-09-07T11:13:14.1677024Z dtypes: torch.float32, torch.float32 2025-09-07T11:13:14.1677353Z bmm 0.0289 ms 
100.0% 2025-09-07T11:13:14.1678416Z triton_bmm_2832 0.0370 ms 78.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:14.1679587Z triton_bmm_2828 0.0452 ms 64.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:13:14.1680566Z triton_bmm_2826 0.0452 ms 63.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:14.1681542Z triton_bmm_2831 0.0454 ms 63.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T11:13:14.1682526Z triton_bmm_2833 0.0454 ms 63.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:13:14.1683512Z triton_bmm_2829 0.0454 ms 63.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:14.1684490Z triton_bmm_2830 0.0506 ms 57.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:13:14.1685810Z triton_bmm_2823 0.0508 ms 56.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:13:14.1686883Z triton_bmm_2825 0.0509 ms 56.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:13:14.1687660Z SingleProcess AUTOTUNE benchmarking takes 0.2604 seconds and 
0.0002 seconds precompiling for 18 choices 2025-09-07T11:13:14.3471407Z Autotune Choices Stats: 2025-09-07T11:13:14.3472388Z {"num_choices": 16, "num_triton_choices": 15, "best_kernel": "triton_bmm_2842", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.02131200022995472, "best_triton_pos": 0} 2025-09-07T11:13:14.3515262Z AUTOTUNE bmm(128x32x196, 128x196x196) 2025-09-07T11:13:14.3515564Z strides: [6272, 1, 32], [38416, 196, 1] 2025-09-07T11:13:14.3515841Z dtypes: torch.float32, torch.float32 2025-09-07T11:13:14.3516506Z triton_bmm_2842 0.0213 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:14.3517741Z triton_bmm_2845 0.0220 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:14.3518371Z bmm 0.0232 ms 92.0% 2025-09-07T11:13:14.3518972Z triton_bmm_2843 0.0236 ms 90.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:13:14.3519954Z triton_bmm_2840 0.0240 ms 88.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:13:14.3520949Z triton_bmm_2844 0.0253 ms 84.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:14.3521939Z triton_bmm_2839 0.0259 ms 82.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:13:14.3522913Z 
triton_bmm_2847 0.0260 ms 81.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T11:13:14.3524298Z triton_bmm_2846 0.0268 ms 79.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:13:14.3525470Z triton_bmm_2836 0.0284 ms 74.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:13:14.3526366Z SingleProcess AUTOTUNE benchmarking takes 0.1820 seconds and 0.0002 seconds precompiling for 16 choices 2025-09-07T11:13:14.6226005Z Autotune Choices Stats: 2025-09-07T11:13:14.6227429Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "bmm", "best_time": 0.025151999667286873, "best_triton_pos": 1, "best_triton_time": 0.04931199923157692, "best_triton_kernel": "triton_bmm_2860", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T11:13:14.6268501Z AUTOTUNE bmm(128x196x196, 128x196x32) 2025-09-07T11:13:14.6268776Z strides: [38416, 196, 1], [6272, 1, 196] 2025-09-07T11:13:14.6269056Z dtypes: torch.float32, torch.float32 2025-09-07T11:13:14.6269325Z bmm 0.0252 ms 100.0% 2025-09-07T11:13:14.6269914Z triton_bmm_2860 0.0493 ms 51.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:14.6270897Z triton_bmm_2863 0.0564 ms 44.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:14.6271873Z triton_bmm_2849 0.0679 ms 37.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, 
BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T11:13:14.6272843Z triton_bmm_2851 0.0695 ms 36.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:13:14.6273807Z triton_bmm_2856 0.0749 ms 33.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:14.6274766Z triton_bmm_2854 0.0755 ms 33.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:13:14.6275893Z triton_bmm_2861 0.0760 ms 33.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:13:14.6276868Z triton_bmm_2853 0.0760 ms 33.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:13:14.6277827Z triton_bmm_2857 0.0763 ms 33.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:13:14.6278601Z SingleProcess AUTOTUNE benchmarking takes 0.2747 seconds and 0.0002 seconds precompiling for 17 choices 2025-09-07T11:13:14.8487165Z Autotune Choices Stats: 2025-09-07T11:13:14.8488381Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.012000000104308128, "best_triton_pos": 1, "best_triton_time": 0.012736000120639801, "best_triton_kernel": "triton_mm_2877", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4"} 2025-09-07T11:13:14.8529633Z 
AUTOTUNE mm(1568x1536, 1536x512) 2025-09-07T11:13:14.8529923Z strides: [1536, 1], [512, 1] 2025-09-07T11:13:14.8530184Z dtypes: torch.float16, torch.float16 2025-09-07T11:13:14.8530848Z mm 0.0120 ms 100.0% 2025-09-07T11:13:14.8531619Z triton_mm_2877 0.0127 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:13:14.8532621Z triton_mm_2883 0.0150 ms 79.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:13:14.8533624Z triton_mm_2873 0.0159 ms 75.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:13:14.8534587Z triton_mm_2872 0.0161 ms 74.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:13:14.8535892Z triton_mm_2876 0.0166 ms 72.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:14.8536912Z triton_mm_2875 0.0180 ms 66.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:13:14.8537757Z triton_mm_2882 0.0183 ms 65.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:14.8538598Z triton_mm_2879 0.0184 ms 65.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:13:14.8539428Z triton_mm_2869 0.0226 ms 53.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, 
BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:13:14.8540162Z SingleProcess AUTOTUNE benchmarking takes 0.2248 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:13:15.3577981Z Autotune Choices Stats: 2025-09-07T11:13:15.3579216Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.01375999953597784, "best_triton_pos": 1, "best_triton_time": 0.01532800029963255, "best_triton_kernel": "triton_mm_7063", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T11:13:15.3622441Z AUTOTUNE mm(6272x1024, 1024x256) 2025-09-07T11:13:15.3622737Z strides: [1024, 1], [256, 1] 2025-09-07T11:13:15.3623003Z dtypes: torch.float16, torch.float16 2025-09-07T11:13:15.3623300Z mm 0.0138 ms 100.0% 2025-09-07T11:13:15.3623913Z triton_mm_7063 0.0153 ms 89.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:13:15.3624921Z triton_mm_7056 0.0164 ms 83.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:15.3626269Z triton_mm_7052 0.0167 ms 82.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:13:15.3627361Z triton_mm_7062 0.0174 ms 78.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:15.3628330Z triton_mm_7057 0.0180 ms 76.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 
2025-09-07T11:13:15.3629302Z triton_mm_7055 0.0185 ms 74.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:13:15.3630593Z triton_mm_7059 0.0192 ms 71.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:13:15.3631694Z triton_mm_7053 0.0197 ms 69.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:13:15.3632662Z triton_mm_7054 0.0213 ms 64.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:15.3633503Z SingleProcess AUTOTUNE benchmarking takes 0.2239 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:13:15.5594272Z Autotune Choices Stats: 2025-09-07T11:13:15.5595626Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_7094", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.009600000455975533, "best_triton_pos": 0} 2025-09-07T11:13:15.5637098Z AUTOTUNE mm(6272x256, 256x256) 2025-09-07T11:13:15.5637418Z strides: [256, 1], [256, 1] 2025-09-07T11:13:15.5637734Z dtypes: torch.float16, torch.float16 2025-09-07T11:13:15.5638456Z triton_mm_7094 0.0096 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:15.5639452Z triton_mm_7090 0.0097 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:13:15.5640421Z 
triton_mm_7093 0.0099 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:13:15.5641031Z mm 0.0101 ms 95.2% 2025-09-07T11:13:15.5641619Z triton_mm_7100 0.0103 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:15.5642598Z triton_mm_7101 0.0103 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:13:15.5643577Z triton_mm_7092 0.0103 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:15.5644543Z triton_mm_7097 0.0103 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:13:15.5645681Z triton_mm_7096 0.0108 ms 89.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:15.5646679Z triton_mm_7099 0.0110 ms 87.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:15.5647593Z SingleProcess AUTOTUNE benchmarking takes 0.1997 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:13:15.7828069Z Autotune Choices Stats: 2025-09-07T11:13:15.7829291Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "bmm", "best_time": 0.03574400022625923, "best_triton_pos": 1, "best_triton_time": 0.036639999598264694, "best_triton_kernel": "triton_bmm_7125", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, 
BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4"} 2025-09-07T11:13:15.7873549Z AUTOTUNE bmm(256x196x196, 256x196x32) 2025-09-07T11:13:15.7873823Z strides: [38416, 1, 196], [6272, 32, 1] 2025-09-07T11:13:15.7874110Z dtypes: torch.float32, torch.float32 2025-09-07T11:13:15.7874663Z bmm 0.0357 ms 100.0% 2025-09-07T11:13:15.7875438Z triton_bmm_7125 0.0366 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:13:15.7876707Z triton_bmm_7132 0.0368 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:15.7877691Z triton_bmm_7121 0.0393 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T11:13:15.7878627Z triton_bmm_7133 0.0402 ms 88.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:13:15.7879520Z triton_bmm_7126 0.0411 ms 87.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:13:15.7880425Z triton_bmm_7128 0.0425 ms 84.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:15.7881320Z triton_bmm_7129 0.0444 ms 80.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:13:15.7882216Z triton_bmm_7124 0.0452 ms 79.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:13:15.7883108Z triton_bmm_7130 0.0456 ms 78.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:15.7883893Z SingleProcess AUTOTUNE benchmarking takes 0.2230 seconds and 0.0002 seconds precompiling for 17 choices 2025-09-07T11:13:16.0921728Z Autotune Choices Stats: 2025-09-07T11:13:16.0922955Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "bmm", "best_time": 0.05337600037455559, "best_triton_pos": 1, "best_triton_time": 0.0674239993095398, "best_triton_kernel": "triton_bmm_7152", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T11:13:16.0964610Z AUTOTUNE bmm(256x196x32, 256x32x196) 2025-09-07T11:13:16.0964895Z strides: [6272, 32, 1], [6272, 1, 32] 2025-09-07T11:13:16.0965333Z dtypes: torch.float32, torch.float32 2025-09-07T11:13:16.0965640Z bmm 0.0534 ms 100.0% 2025-09-07T11:13:16.0966270Z triton_bmm_7152 0.0674 ms 79.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:16.0967282Z triton_bmm_7149 0.0842 ms 63.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:16.0968264Z triton_bmm_7148 0.0842 ms 63.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:13:16.0969248Z triton_bmm_7151 0.0844 ms 63.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T11:13:16.0970220Z triton_bmm_7146 
0.0844 ms 63.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:16.0971195Z triton_bmm_7153 0.0847 ms 63.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:13:16.0972522Z triton_bmm_7150 0.0944 ms 56.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:13:16.0973682Z triton_bmm_7143 0.0945 ms 56.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:13:16.0974656Z triton_bmm_7145 0.0945 ms 56.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:13:16.0975672Z SingleProcess AUTOTUNE benchmarking takes 0.3073 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T11:13:16.2681184Z Autotune Choices Stats: 2025-09-07T11:13:16.2682196Z {"num_choices": 16, "num_triton_choices": 15, "best_kernel": "triton_bmm_7162", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.03651199862360954, "best_triton_pos": 0} 2025-09-07T11:13:16.2724172Z AUTOTUNE bmm(256x32x196, 256x196x196) 2025-09-07T11:13:16.2724469Z strides: [6272, 1, 32], [38416, 196, 1] 2025-09-07T11:13:16.2724769Z dtypes: torch.float32, torch.float32 2025-09-07T11:13:16.2725610Z triton_bmm_7162 0.0365 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:16.2726646Z triton_bmm_7159 
0.0368 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:13:16.2727624Z triton_bmm_7154 0.0383 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T11:13:16.2728623Z triton_bmm_7163 0.0406 ms 90.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:13:16.2729251Z bmm 0.0408 ms 89.5% 2025-09-07T11:13:16.2729849Z triton_bmm_7160 0.0412 ms 88.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:13:16.2730834Z triton_bmm_7165 0.0431 ms 84.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:16.2731827Z triton_bmm_7164 0.0450 ms 81.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:16.2732812Z triton_bmm_7166 0.0453 ms 80.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:13:16.2733806Z triton_bmm_7167 0.0461 ms 79.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T11:13:16.2734697Z SingleProcess AUTOTUNE benchmarking takes 0.1743 seconds and 0.0002 seconds precompiling for 16 choices 2025-09-07T11:13:16.6213029Z Autotune Choices Stats: 2025-09-07T11:13:16.6214280Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "bmm", "best_time": 
0.0432640016078949, "best_triton_pos": 1, "best_triton_time": 0.09487999975681305, "best_triton_kernel": "triton_bmm_7180", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T11:13:16.6260479Z AUTOTUNE bmm(256x196x196, 256x196x32) 2025-09-07T11:13:16.6260769Z strides: [38416, 196, 1], [6272, 1, 196] 2025-09-07T11:13:16.6261565Z dtypes: torch.float32, torch.float32 2025-09-07T11:13:16.6261846Z bmm 0.0433 ms 100.0% 2025-09-07T11:13:16.6262661Z triton_bmm_7180 0.0949 ms 45.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:16.6263703Z triton_bmm_7183 0.1079 ms 40.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:16.6264685Z triton_bmm_7169 0.1280 ms 33.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T11:13:16.6265805Z triton_bmm_7171 0.1317 ms 32.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:13:16.6266771Z triton_bmm_7173 0.1390 ms 31.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:13:16.6267910Z triton_bmm_7176 0.1430 ms 30.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:16.6268875Z triton_bmm_7174 0.1438 ms 30.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:13:16.6269873Z triton_bmm_7181 0.1443 ms 30.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:13:16.6270869Z triton_bmm_7177 0.1459 ms 29.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:13:16.6271718Z SingleProcess AUTOTUNE benchmarking takes 0.3528 seconds and 0.0002 seconds precompiling for 17 choices 2025-09-07T11:13:16.8400107Z Autotune Choices Stats: 2025-09-07T11:13:16.8401363Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.012512000277638435, "best_triton_pos": 1, "best_triton_time": 0.013279999606311321, "best_triton_kernel": "triton_mm_7203", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T11:13:16.8444051Z AUTOTUNE mm(6272x768, 768x256) 2025-09-07T11:13:16.8444312Z strides: [768, 1], [256, 1] 2025-09-07T11:13:16.8444585Z dtypes: torch.float16, torch.float16 2025-09-07T11:13:16.8444860Z mm 0.0125 ms 100.0% 2025-09-07T11:13:16.8445823Z triton_mm_7203 0.0133 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:13:16.8446820Z triton_mm_7196 0.0140 ms 89.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:16.8447822Z triton_mm_7192 0.0143 ms 87.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:13:16.8448713Z triton_mm_7202 0.0146 ms 85.9% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:16.8449608Z triton_mm_7195 0.0154 ms 81.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:13:16.8450496Z triton_mm_7199 0.0159 ms 78.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:13:16.8451639Z triton_mm_7197 0.0160 ms 78.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:13:16.8452690Z triton_mm_7194 0.0173 ms 72.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:16.8453583Z triton_mm_7193 0.0174 ms 71.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:13:16.8454362Z SingleProcess AUTOTUNE benchmarking takes 0.2160 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:13:17.1085425Z Autotune Choices Stats: 2025-09-07T11:13:17.1086481Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_7488", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.019487999379634857, "best_triton_pos": 0} 2025-09-07T11:13:17.1134866Z AUTOTUNE mm(25088x512, 512x128) 2025-09-07T11:13:17.1135296Z strides: [512, 1], [128, 1] 2025-09-07T11:13:17.1135536Z dtypes: torch.float16, torch.float16 2025-09-07T11:13:17.1136103Z triton_mm_7488 0.0195 ms 100.0% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:17.1136643Z mm 0.0202 ms 96.7% 2025-09-07T11:13:17.1137154Z triton_mm_7494 0.0207 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:17.1137965Z triton_mm_7486 0.0225 ms 86.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:17.1138777Z triton_mm_7495 0.0225 ms 86.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:13:17.1139580Z triton_mm_7487 0.0228 ms 85.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:13:17.1140381Z triton_mm_7489 0.0230 ms 84.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:13:17.1141188Z triton_mm_7493 0.0233 ms 83.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:17.1142087Z triton_mm_7484 0.0235 ms 82.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:13:17.1142894Z triton_mm_7491 0.0244 ms 79.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:13:17.1143597Z SingleProcess AUTOTUNE benchmarking takes 0.2374 seconds and 0.0003 seconds precompiling for 20 choices 
2025-09-07T11:13:17.3234375Z Autotune Choices Stats: 2025-09-07T11:13:17.3235585Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_7526", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.010432000271975994, "best_triton_pos": 0} 2025-09-07T11:13:17.3282481Z AUTOTUNE mm(25088x128, 128x128) 2025-09-07T11:13:17.3282762Z strides: [128, 1], [128, 1] 2025-09-07T11:13:17.3283000Z dtypes: torch.float16, torch.float16 2025-09-07T11:13:17.3283614Z triton_mm_7526 0.0104 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:17.3285393Z triton_mm_7524 0.0107 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:17.3286330Z triton_mm_7528 0.0111 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:17.3287233Z triton_mm_7525 0.0111 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:13:17.3288127Z triton_mm_7521 0.0112 ms 92.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:13:17.3288701Z mm 0.0113 ms 92.6% 2025-09-07T11:13:17.3289237Z triton_mm_7531 0.0113 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:17.3290165Z triton_mm_7529 0.0114 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, 
BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:13:17.3291075Z triton_mm_7532 0.0115 ms 90.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:17.3291981Z triton_mm_7527 0.0116 ms 89.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:13:17.3292771Z SingleProcess AUTOTUNE benchmarking takes 0.2030 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:13:17.5711573Z Autotune Choices Stats: 2025-09-07T11:13:17.5713057Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "bmm", "best_time": 0.058880001306533813, "best_triton_pos": 1, "best_triton_time": 0.06220800057053566, "best_triton_kernel": "triton_bmm_7564", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T11:13:17.5761319Z AUTOTUNE bmm(512x196x196, 512x196x32) 2025-09-07T11:13:17.5761654Z strides: [38416, 1, 196], [6272, 32, 1] 2025-09-07T11:13:17.5761948Z dtypes: torch.float32, torch.float32 2025-09-07T11:13:17.5762222Z bmm 0.0589 ms 100.0% 2025-09-07T11:13:17.5762881Z triton_bmm_7564 0.0622 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:17.5763928Z triton_bmm_7557 0.0632 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:13:17.5765372Z triton_bmm_7560 0.0644 ms 91.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, 
num_warps=4 2025-09-07T11:13:17.5766394Z triton_bmm_7558 0.0664 ms 88.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:13:17.5767440Z triton_bmm_7565 0.0677 ms 86.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:13:17.5768491Z triton_bmm_7553 0.0697 ms 84.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T11:13:17.5769820Z triton_bmm_7566 0.0728 ms 80.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T11:13:17.5770982Z triton_bmm_7555 0.0759 ms 77.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:13:17.5771969Z triton_bmm_7567 0.0764 ms 77.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:17.5772823Z SingleProcess AUTOTUNE benchmarking takes 0.2386 seconds and 0.0002 seconds precompiling for 17 choices 2025-09-07T11:13:18.0463510Z Autotune Choices Stats: 2025-09-07T11:13:18.0464792Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "bmm", "best_time": 0.09824000298976898, "best_triton_pos": 1, "best_triton_time": 0.12396799772977829, "best_triton_kernel": "triton_bmm_7584", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T11:13:18.0512750Z AUTOTUNE bmm(512x196x32, 512x32x196) 2025-09-07T11:13:18.0513031Z strides: [6272, 32, 
1], [6272, 1, 32] 2025-09-07T11:13:18.0513293Z dtypes: torch.float32, torch.float32 2025-09-07T11:13:18.0513560Z bmm 0.0982 ms 100.0% 2025-09-07T11:13:18.0514148Z triton_bmm_7584 0.1240 ms 79.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:18.0515270Z triton_bmm_7578 0.1582 ms 62.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:18.0516237Z triton_bmm_7580 0.1582 ms 62.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:13:18.0517234Z triton_bmm_7581 0.1587 ms 61.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:18.0518210Z triton_bmm_7585 0.1608 ms 61.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:13:18.0519180Z triton_bmm_7583 0.1618 ms 60.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T11:13:18.0520153Z triton_bmm_7575 0.1786 ms 55.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:13:18.0521122Z triton_bmm_7582 0.1786 ms 55.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:13:18.0522100Z triton_bmm_7577 0.1786 ms 55.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:13:18.0522943Z SingleProcess AUTOTUNE benchmarking takes 0.4729 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T11:13:18.2756477Z Autotune Choices Stats: 2025-09-07T11:13:18.2757491Z {"num_choices": 16, "num_triton_choices": 15, "best_kernel": "triton_bmm_7594", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.060416001826524734, "best_triton_pos": 0} 2025-09-07T11:13:18.2801053Z AUTOTUNE bmm(512x32x196, 512x196x196) 2025-09-07T11:13:18.2801371Z strides: [6272, 1, 32], [38416, 196, 1] 2025-09-07T11:13:18.2801683Z dtypes: torch.float32, torch.float32 2025-09-07T11:13:18.2802914Z triton_bmm_7594 0.0604 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:18.2804179Z triton_bmm_7591 0.0631 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:13:18.2804825Z bmm 0.0636 ms 95.0% 2025-09-07T11:13:18.2805625Z triton_bmm_7597 0.0656 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:18.2806601Z triton_bmm_7592 0.0668 ms 90.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:13:18.2807618Z triton_bmm_7595 0.0678 ms 89.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:13:18.2808639Z triton_bmm_7586 0.0688 ms 87.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, 
BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T11:13:18.2809551Z triton_bmm_7599 0.0705 ms 85.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T11:13:18.2810464Z triton_bmm_7596 0.0743 ms 81.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:18.2811369Z triton_bmm_7589 0.0777 ms 77.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:13:18.2812156Z SingleProcess AUTOTUNE benchmarking takes 0.2266 seconds and 0.0002 seconds precompiling for 16 choices 2025-09-07T11:13:18.7881671Z Autotune Choices Stats: 2025-09-07T11:13:18.7882955Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "bmm", "best_time": 0.06905599683523178, "best_triton_pos": 1, "best_triton_time": 0.17846399545669556, "best_triton_kernel": "triton_bmm_7612", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T11:13:18.7928323Z AUTOTUNE bmm(512x196x196, 512x196x32) 2025-09-07T11:13:18.7928622Z strides: [38416, 196, 1], [6272, 1, 196] 2025-09-07T11:13:18.7928979Z dtypes: torch.float32, torch.float32 2025-09-07T11:13:18.7929251Z bmm 0.0691 ms 100.0% 2025-09-07T11:13:18.7929878Z triton_bmm_7612 0.1785 ms 38.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:18.7930915Z triton_bmm_7615 0.2000 ms 34.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, 
num_warps=4 2025-09-07T11:13:18.7931954Z triton_bmm_7601 0.2449 ms 28.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T11:13:18.7932963Z triton_bmm_7603 0.2516 ms 27.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:13:18.7933965Z triton_bmm_7605 0.2643 ms 26.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:13:18.7935337Z triton_bmm_7608 0.2766 ms 25.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:18.7936631Z triton_bmm_7606 0.2767 ms 25.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:13:18.7937886Z triton_bmm_7613 0.2780 ms 24.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:13:18.7938938Z triton_bmm_7614 0.2791 ms 24.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T11:13:18.7939805Z SingleProcess AUTOTUNE benchmarking takes 0.5121 seconds and 0.0002 seconds precompiling for 17 choices 2025-09-07T11:13:19.0099491Z Autotune Choices Stats: 2025-09-07T11:13:19.0100532Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_7628", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.016127999871969223, 
"best_triton_pos": 0} 2025-09-07T11:13:19.0149261Z AUTOTUNE mm(25088x384, 384x128) 2025-09-07T11:13:19.0149582Z strides: [384, 1], [128, 1] 2025-09-07T11:13:19.0149849Z dtypes: torch.float16, torch.float16 2025-09-07T11:13:19.0150550Z triton_mm_7628 0.0161 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:19.0151222Z mm 0.0169 ms 95.5% 2025-09-07T11:13:19.0151835Z triton_mm_7634 0.0172 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:19.0152895Z triton_mm_7626 0.0178 ms 90.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:19.0153958Z triton_mm_7627 0.0180 ms 89.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:13:19.0155348Z triton_mm_7633 0.0187 ms 86.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:19.0156423Z triton_mm_7631 0.0191 ms 84.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:13:19.0157480Z triton_mm_7635 0.0193 ms 83.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:13:19.0158548Z triton_mm_7630 0.0194 ms 83.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:13:19.0159549Z triton_mm_7624 0.0197 ms 81.8% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:13:19.0160390Z SingleProcess AUTOTUNE benchmarking takes 0.2201 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:13:34.2141317Z W0907 11:13:34.213000 54890 site-packages/torch/_logging/_internal.py:1199] [6/0] Profiler function will be ignored 2025-09-07T11:14:21.3218258Z pass 2025-09-07T11:14:29.2596476Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T11:14:29.2597757Z import pynvml # type: ignore[import] 2025-09-07T11:14:32.2729544Z 2025-09-07T11:14:33.2606360Z loading model: 0it [00:00, ?it/s] 2025-09-07T11:14:33.2606730Z loading model: 0it [00:00, ?it/s] 2025-09-07T11:14:33.2607035Z cuda train lcnet_050 2025-09-07T11:14:48.4811588Z Autotune Choices Stats: 2025-09-07T11:14:48.4812737Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_293", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007391999941319227, "best_triton_pos": 0} 2025-09-07T11:14:48.4864278Z AUTOTUNE addmm(8x1280, 8x256, 256x1280) 2025-09-07T11:14:48.4864686Z strides: [0, 1], [256, 1], [1, 256] 2025-09-07T11:14:48.4865432Z dtypes: torch.float16, torch.float16, torch.float16 2025-09-07T11:14:48.4866137Z triton_mm_293 0.0074 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:14:48.4867160Z triton_mm_288 0.0077 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, 
BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:14:48.4868126Z triton_mm_289 0.0077 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:14:48.4869080Z triton_mm_287 0.0079 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:14:48.4870019Z triton_mm_292 0.0079 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:14:48.4871001Z triton_mm_297 0.0079 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:14:48.4871970Z triton_mm_301 0.0080 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:14:48.4872930Z triton_mm_286 0.0081 ms 91.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T11:14:48.4873692Z bias_addmm 0.0082 ms 90.6% 2025-09-07T11:14:48.4874275Z triton_mm_296 0.0082 ms 90.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:14:48.4875259Z SingleProcess AUTOTUNE benchmarking takes 0.2433 seconds and 0.0003 seconds precompiling for 19 choices 2025-09-07T11:14:48.9401486Z Autotune Choices Stats: 2025-09-07T11:14:48.9402523Z {"num_choices": 15, "num_triton_choices": 13, "best_kernel": "triton_mm_247", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4", "best_time": 0.006912000011652708, "best_triton_pos": 0} 2025-09-07T11:14:48.9451555Z AUTOTUNE addmm(8x64, 8x256, 256x64) 2025-09-07T11:14:48.9451898Z strides: [0, 1], [256, 1], [1, 256] 2025-09-07T11:14:48.9452198Z dtypes: torch.float16, torch.float16, torch.float16 2025-09-07T11:14:48.9452860Z triton_mm_247 0.0069 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:14:48.9453956Z triton_mm_240 0.0070 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:14:48.9455327Z triton_mm_244 0.0070 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:14:48.9456820Z triton_mm_239 0.0071 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:14:48.9458026Z triton_mm_248 0.0071 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:14:48.9458985Z triton_mm_243 0.0073 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:14:48.9459963Z triton_mm_237 0.0075 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T11:14:48.9460913Z triton_mm_238 0.0075 ms 91.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, 
num_warps=4 2025-09-07T11:14:48.9461988Z triton_mm_246 0.0079 ms 87.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:14:48.9462942Z triton_mm_245 0.0084 ms 82.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:14:48.9463807Z SingleProcess AUTOTUNE benchmarking takes 0.1995 seconds and 0.0002 seconds precompiling for 15 choices 2025-09-07T11:14:49.3741787Z Autotune Choices Stats: 2025-09-07T11:14:49.3742846Z {"num_choices": 13, "num_triton_choices": 11, "best_kernel": "triton_mm_195", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.006432000081986189, "best_triton_pos": 0} 2025-09-07T11:14:49.3791627Z AUTOTUNE addmm(8x32, 8x128, 128x32) 2025-09-07T11:14:49.3791915Z strides: [0, 1], [128, 1], [1, 128] 2025-09-07T11:14:49.3792326Z dtypes: torch.float16, torch.float16, torch.float16 2025-09-07T11:14:49.3793073Z triton_mm_195 0.0064 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:14:49.3794194Z triton_mm_203 0.0067 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:14:49.3795446Z triton_mm_199 0.0068 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=2 2025-09-07T11:14:49.3796389Z triton_mm_201 0.0068 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=2 
2025-09-07T11:14:49.3797338Z triton_mm_200 0.0070 ms 91.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=2 2025-09-07T11:14:49.3798281Z triton_mm_194 0.0073 ms 88.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T11:14:49.3799234Z triton_mm_196 0.0074 ms 86.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:14:49.3800196Z triton_mm_202 0.0074 ms 86.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=2 2025-09-07T11:14:49.3800805Z bias_addmm 0.0077 ms 83.8% 2025-09-07T11:14:49.3801391Z triton_mm_198 0.0081 ms 79.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T11:14:49.3802520Z SingleProcess AUTOTUNE benchmarking takes 0.1784 seconds and 0.0002 seconds precompiling for 13 choices 2025-09-07T11:14:50.2093087Z Autotune Choices Stats: 2025-09-07T11:14:50.2094795Z {"num_choices": 6, "num_triton_choices": 5, "best_kernel": "triton_convolution2d_4", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8", "best_time": 0.011296000331640244, "best_triton_pos": 0} 2025-09-07T11:14:50.2149547Z AUTOTUNE convolution(8x3x224x224, 8x3x3x3) 2025-09-07T11:14:50.2150060Z strides: [150528, 1, 672, 3], [27, 1, 9, 3] 2025-09-07T11:14:50.2150511Z dtypes: torch.float16, torch.float16 2025-09-07T11:14:50.2151680Z triton_convolution2d_4 0.0113 ms 100.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=16, 
GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:14:50.2153578Z triton_convolution2d_3 0.0167 ms 67.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:14:50.2155629Z triton_convolution2d_1 0.0303 ms 37.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:14:50.2157475Z triton_convolution2d_0 0.0313 ms 36.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:14:50.2159310Z triton_convolution2d_2 0.0328 ms 34.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:14:50.2160461Z convolution 0.0980 ms 11.5% 2025-09-07T11:14:50.2161173Z SingleProcess AUTOTUNE benchmarking takes 0.1317 seconds and 0.0002 seconds precompiling for 6 choices 2025-09-07T11:14:50.3688450Z Autotune Choices Stats: 2025-09-07T11:14:50.3689977Z {"num_choices": 12, "num_triton_choices": 11, "best_kernel": "triton_mm_15", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8", "best_time": 0.007615999784320593, "best_triton_pos": 0} 2025-09-07T11:14:50.3744307Z AUTOTUNE mm(100352x8, 8x16) 2025-09-07T11:14:50.3744738Z strides: [8, 1], [1, 8] 2025-09-07T11:14:50.3745363Z dtypes: torch.float16, torch.float16 2025-09-07T11:14:50.3746351Z triton_mm_15 0.0076 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, 
BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:14:50.3747907Z triton_mm_10 0.0077 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:14:50.3749397Z triton_mm_14 0.0077 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T11:14:50.3750875Z triton_mm_9 0.0077 ms 98.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:14:50.3752361Z triton_mm_11 0.0077 ms 98.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:14:50.3753802Z triton_mm_12 0.0077 ms 98.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:14:50.3756322Z triton_mm_13 0.0077 ms 98.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:14:50.3757782Z triton_mm_8 0.0078 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:14:50.3759220Z triton_mm_7 0.0078 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:14:50.3760637Z triton_mm_5 0.0080 ms 95.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T11:14:50.3761901Z SingleProcess 
AUTOTUNE benchmarking takes 0.1587 seconds and 0.0003 seconds precompiling for 12 choices 2025-09-07T11:14:50.5652356Z Autotune Choices Stats: 2025-09-07T11:14:50.5653951Z {"num_choices": 15, "num_triton_choices": 14, "best_kernel": "triton_mm_16", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2", "best_time": 0.007007999811321497, "best_triton_pos": 0} 2025-09-07T11:14:50.5703827Z AUTOTUNE mm(25088x16, 16x32) 2025-09-07T11:14:50.5715984Z strides: [16, 1], [1, 16] 2025-09-07T11:14:50.5716486Z dtypes: torch.float16, torch.float16 2025-09-07T11:14:50.5717392Z triton_mm_16 0.0070 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T11:14:50.5718794Z triton_mm_26 0.0070 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:14:50.5720212Z triton_mm_17 0.0072 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:14:50.5721583Z triton_mm_20 0.0073 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:14:50.5722875Z triton_mm_23 0.0073 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:14:50.5724211Z triton_mm_25 0.0073 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:14:50.5725782Z triton_mm_27 0.0073 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, 
BLOCK_K=16, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:14:50.5727175Z triton_mm_18 0.0074 ms 95.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:14:50.5728523Z triton_mm_22 0.0075 ms 93.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:14:50.5729849Z triton_mm_19 0.0075 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:14:50.5731031Z SingleProcess AUTOTUNE benchmarking takes 0.1952 seconds and 0.0003 seconds precompiling for 15 choices 2025-09-07T11:14:50.7613831Z Autotune Choices Stats: 2025-09-07T11:14:50.7615658Z {"num_choices": 16, "num_triton_choices": 15, "best_kernel": "triton_mm_31", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.0071680000983178616, "best_triton_pos": 0} 2025-09-07T11:14:50.7667683Z AUTOTUNE mm(25088x32, 32x32) 2025-09-07T11:14:50.7668103Z strides: [32, 1], [1, 32] 2025-09-07T11:14:50.7668960Z dtypes: torch.float16, torch.float16 2025-09-07T11:14:50.7669954Z triton_mm_31 0.0072 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:14:50.7671417Z triton_mm_32 0.0073 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:14:50.7672874Z triton_mm_34 0.0073 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:14:50.7674336Z triton_mm_44 0.0073 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:14:50.7676074Z triton_mm_36 0.0073 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:14:50.7677515Z triton_mm_38 0.0073 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:14:50.7678964Z triton_mm_42 0.0073 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:14:50.7680391Z triton_mm_37 0.0074 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:14:50.7681811Z triton_mm_33 0.0074 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:14:50.7683277Z triton_mm_39 0.0074 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:14:50.7684571Z SingleProcess AUTOTUNE benchmarking takes 0.1957 seconds and 0.0003 seconds precompiling for 16 choices 2025-09-07T11:14:50.9755865Z Autotune Choices Stats: 2025-09-07T11:14:50.9757243Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_47", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8", "best_time": 
0.006463999859988689, "best_triton_pos": 0} 2025-09-07T11:14:50.9809691Z AUTOTUNE mm(6272x32, 32x64) 2025-09-07T11:14:50.9809953Z strides: [32, 1], [1, 32] 2025-09-07T11:14:50.9810201Z dtypes: torch.float16, torch.float16 2025-09-07T11:14:50.9810803Z triton_mm_47 0.0065 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:14:50.9812095Z triton_mm_49 0.0065 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:14:50.9812934Z triton_mm_48 0.0066 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:14:50.9813735Z triton_mm_54 0.0066 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:14:50.9814662Z triton_mm_56 0.0066 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:14:50.9816212Z triton_mm_53 0.0066 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:14:50.9817189Z triton_mm_52 0.0066 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:14:50.9818295Z triton_mm_55 0.0066 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:14:50.9819661Z triton_mm_58 0.0066 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, 
BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:14:50.9821076Z triton_mm_45 0.0067 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T11:14:50.9822410Z SingleProcess AUTOTUNE benchmarking takes 0.2135 seconds and 0.0003 seconds precompiling for 17 choices 2025-09-07T11:14:51.1972526Z Autotune Choices Stats: 2025-09-07T11:14:51.1973545Z {"num_choices": 19, "num_triton_choices": 18, "best_kernel": "triton_mm_62", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.00684799998998642, "best_triton_pos": 0} 2025-09-07T11:14:51.2025265Z AUTOTUNE mm(6272x64, 64x64) 2025-09-07T11:14:51.2025551Z strides: [64, 1], [1, 64] 2025-09-07T11:14:51.2025828Z dtypes: torch.float16, torch.float16 2025-09-07T11:14:51.2026508Z triton_mm_62 0.0068 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:14:51.2027523Z triton_mm_68 0.0068 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:14:51.2028526Z triton_mm_70 0.0069 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:14:51.2029511Z triton_mm_71 0.0070 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:14:51.2030476Z triton_mm_65 0.0071 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:14:51.2031441Z triton_mm_78 0.0071 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:14:51.2032408Z triton_mm_73 0.0071 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:14:51.2033357Z triton_mm_69 0.0072 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:14:51.2034548Z triton_mm_67 0.0073 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:14:51.2035701Z triton_mm_64 0.0073 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:14:51.2036606Z SingleProcess AUTOTUNE benchmarking takes 0.2208 seconds and 0.0003 seconds precompiling for 19 choices 2025-09-07T11:14:51.4318192Z Autotune Choices Stats: 2025-09-07T11:14:51.4319892Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_80", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.006496000103652477, "best_triton_pos": 0} 2025-09-07T11:14:51.4372739Z AUTOTUNE mm(1568x64, 64x128) 2025-09-07T11:14:51.4373014Z strides: [64, 1], [1, 64] 2025-09-07T11:14:51.4373288Z dtypes: torch.float16, torch.float16 2025-09-07T11:14:51.4373963Z triton_mm_80 0.0065 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 
2025-09-07T11:14:51.4375352Z triton_mm_81 0.0065 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:14:51.4376337Z triton_mm_86 0.0066 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:14:51.4377318Z triton_mm_87 0.0067 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:14:51.4378275Z triton_mm_82 0.0067 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:14:51.4379254Z triton_mm_83 0.0068 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:14:51.4380203Z triton_mm_93 0.0069 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:14:51.4381161Z triton_mm_88 0.0069 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:14:51.4382212Z triton_mm_91 0.0070 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:14:51.4383164Z triton_mm_92 0.0070 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:14:51.4384036Z SingleProcess AUTOTUNE benchmarking takes 0.2340 seconds and 0.0002 seconds precompiling for 20 
choices 2025-09-07T11:14:51.6724373Z Autotune Choices Stats: 2025-09-07T11:14:51.6725675Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_101", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8", "best_time": 0.007104000076651573, "best_triton_pos": 0} 2025-09-07T11:14:51.6780621Z AUTOTUNE mm(1568x128, 128x128) 2025-09-07T11:14:51.6781081Z strides: [128, 1], [1, 128] 2025-09-07T11:14:51.6781583Z dtypes: torch.float16, torch.float16 2025-09-07T11:14:51.6782567Z triton_mm_101 0.0071 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:14:51.6784040Z triton_mm_100 0.0072 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:14:51.6785729Z triton_mm_105 0.0073 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:14:51.6787167Z triton_mm_99 0.0074 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:14:51.6789162Z triton_mm_109 0.0076 ms 94.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:14:51.6790923Z triton_mm_110 0.0076 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:14:51.6792430Z triton_mm_108 0.0076 ms 92.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:14:51.6793871Z triton_mm_112 0.0077 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:14:51.6795587Z triton_mm_106 0.0077 ms 91.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:14:51.6797048Z triton_mm_107 0.0078 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:14:51.6798298Z SingleProcess AUTOTUNE benchmarking takes 0.2401 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:14:51.8874078Z Autotune Choices Stats: 2025-09-07T11:14:51.8875709Z {"num_choices": 15, "num_triton_choices": 13, "best_kernel": "triton_mm_210", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.0058559998869895935, "best_triton_pos": 0} 2025-09-07T11:14:51.8929929Z AUTOTUNE addmm(8x128, 8x32, 32x128) 2025-09-07T11:14:51.8930201Z strides: [0, 1], [32, 1], [1, 32] 2025-09-07T11:14:51.8930516Z dtypes: torch.float16, torch.float16, torch.float16 2025-09-07T11:14:51.8931237Z triton_mm_210 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:14:51.8932222Z triton_mm_205 0.0060 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T11:14:51.8933193Z triton_mm_207 0.0060 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:14:51.8934152Z triton_mm_206 0.0060 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:14:51.8935270Z triton_mm_214 0.0061 ms 95.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:14:51.8936227Z triton_mm_212 0.0061 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:14:51.8937191Z triton_mm_215 0.0061 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T11:14:51.8938144Z triton_mm_216 0.0061 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:14:51.8939096Z triton_mm_209 0.0062 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:14:51.8940049Z triton_mm_204 0.0064 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T11:14:51.8941221Z SingleProcess AUTOTUNE benchmarking takes 0.2074 seconds and 0.0002 seconds precompiling for 15 choices 2025-09-07T11:14:52.1232845Z Autotune Choices Stats: 2025-09-07T11:14:52.1234341Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_219", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8", "best_time": 
0.006719999946653843, "best_triton_pos": 0} 2025-09-07T11:14:52.1286212Z AUTOTUNE mm(392x128, 128x256) 2025-09-07T11:14:52.1286487Z strides: [128, 1], [1, 128] 2025-09-07T11:14:52.1286756Z dtypes: torch.float16, torch.float16 2025-09-07T11:14:52.1287432Z triton_mm_219 0.0067 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:14:52.1288474Z triton_mm_228 0.0068 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:14:52.1289466Z triton_mm_220 0.0068 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:14:52.1290451Z triton_mm_224 0.0069 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:14:52.1291432Z triton_mm_218 0.0070 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:14:52.1292049Z mm 0.0073 ms 92.5% 2025-09-07T11:14:52.1292646Z triton_mm_226 0.0074 ms 91.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:14:52.1293641Z triton_mm_227 0.0074 ms 91.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:14:52.1294694Z triton_mm_231 0.0074 ms 91.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:14:52.1295841Z triton_mm_225 0.0074 ms 
90.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:14:52.1296681Z SingleProcess AUTOTUNE benchmarking takes 0.2349 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:14:52.3710985Z Autotune Choices Stats: 2025-09-07T11:14:52.3712863Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_256", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.006271999794989824, "best_triton_pos": 0} 2025-09-07T11:14:52.3770708Z AUTOTUNE addmm(8x256, 8x64, 64x256) 2025-09-07T11:14:52.3770986Z strides: [0, 1], [64, 1], [1, 64] 2025-09-07T11:14:52.3771275Z dtypes: torch.float16, torch.float16, torch.float16 2025-09-07T11:14:52.3771880Z triton_mm_256 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:14:52.3772751Z triton_mm_252 0.0065 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:14:52.3773588Z triton_mm_265 0.0065 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:14:52.3774416Z triton_mm_263 0.0065 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:14:52.3776041Z triton_mm_257 0.0066 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:14:52.3776871Z triton_mm_251 0.0066 ms 95.1% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:14:52.3777690Z triton_mm_250 0.0066 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T11:14:52.3778506Z triton_mm_253 0.0066 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:14:52.3779329Z triton_mm_262 0.0066 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:14:52.3780149Z triton_mm_255 0.0068 ms 92.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:14:52.3780886Z SingleProcess AUTOTUNE benchmarking takes 0.2477 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T11:14:52.6143121Z Autotune Choices Stats: 2025-09-07T11:14:52.6144208Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_270", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.00684799998998642, "best_triton_pos": 0} 2025-09-07T11:14:52.6197883Z AUTOTUNE mm(392x256, 256x256) 2025-09-07T11:14:52.6198190Z strides: [256, 1], [1, 256] 2025-09-07T11:14:52.6198531Z dtypes: torch.float16, torch.float16 2025-09-07T11:14:52.6199254Z triton_mm_270 0.0068 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:14:52.6200320Z triton_mm_267 0.0073 ms 93.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, 
BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:14:52.6201354Z triton_mm_274 0.0074 ms 93.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:14:52.6202381Z triton_mm_273 0.0074 ms 92.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:14:52.6203029Z mm 0.0075 ms 91.1% 2025-09-07T11:14:52.6203639Z triton_mm_269 0.0076 ms 90.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:14:52.6204676Z triton_mm_268 0.0076 ms 90.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:14:52.6205870Z triton_mm_277 0.0077 ms 88.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:14:52.6206831Z triton_mm_278 0.0078 ms 87.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:14:52.6207785Z triton_mm_276 0.0080 ms 85.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:14:52.6209003Z SingleProcess AUTOTUNE benchmarking takes 0.2421 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:14:52.8653341Z Autotune Choices Stats: 2025-09-07T11:14:52.8654699Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_306", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, 
BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.009503999724984169, "best_triton_pos": 0} 2025-09-07T11:14:52.8713448Z AUTOTUNE addmm(8x1000, 8x1280, 1280x1000) 2025-09-07T11:14:52.8713916Z strides: [0, 1], [1280, 1], [1, 1280] 2025-09-07T11:14:52.8714379Z dtypes: torch.float16, torch.float16, torch.float16 2025-09-07T11:14:52.8715655Z triton_mm_306 0.0095 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:14:52.8716633Z bias_addmm 0.0099 ms 96.1% 2025-09-07T11:14:52.8717563Z triton_mm_310 0.0101 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:14:52.8719031Z triton_mm_314 0.0114 ms 83.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:14:52.8720308Z triton_mm_318 0.0124 ms 76.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:14:52.8721652Z triton_mm_305 0.0132 ms 72.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:14:52.8722515Z addmm 0.0135 ms 70.5% 2025-09-07T11:14:52.8723384Z triton_mm_304 0.0140 ms 68.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:14:52.8724850Z triton_mm_309 0.0140 ms 67.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:14:52.8726511Z triton_mm_303 0.0145 ms 65.4% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T11:14:52.8727753Z SingleProcess AUTOTUNE benchmarking takes 0.2508 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T11:15:02.4304134Z Autotune Choices Stats: 2025-09-07T11:15:02.4306280Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_343", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.006783999968320131, "best_triton_pos": 0} 2025-09-07T11:15:02.4362544Z AUTOTUNE mm(1000x8, 8x1280) 2025-09-07T11:15:02.4362959Z strides: [1, 1000], [1280, 1] 2025-09-07T11:15:02.4363391Z dtypes: torch.float16, torch.float16 2025-09-07T11:15:02.4364528Z triton_mm_343 0.0068 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:15:02.4366440Z triton_mm_341 0.0068 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:15:02.4367997Z triton_mm_346 0.0068 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:15:02.4368898Z triton_mm_348 0.0069 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:15:02.4370224Z triton_mm_344 0.0069 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:15:02.4371339Z triton_mm_340 0.0070 ms 96.8% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:15:02.4372231Z triton_mm_342 0.0070 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:15:02.4373114Z triton_mm_345 0.0070 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:15:02.4373992Z triton_mm_339 0.0071 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:15:02.4374881Z triton_mm_347 0.0072 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:15:02.4375803Z SingleProcess AUTOTUNE benchmarking takes 0.1655 seconds and 0.0003 seconds precompiling for 17 choices 2025-09-07T11:15:02.8342274Z Autotune Choices Stats: 2025-09-07T11:15:02.8343600Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "mm", "best_time": 0.009216000325977802, "best_triton_pos": 1, "best_triton_time": 0.0098879998549819, "best_triton_kernel": "triton_mm_327", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T11:15:02.8399103Z AUTOTUNE mm(8x1000, 1000x1280) 2025-09-07T11:15:02.8399375Z strides: [1000, 1], [1280, 1] 2025-09-07T11:15:02.8399664Z dtypes: torch.float16, torch.float16 2025-09-07T11:15:02.8399980Z mm 0.0092 ms 100.0% 2025-09-07T11:15:02.8400600Z triton_mm_327 0.0099 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, 
num_warps=4 2025-09-07T11:15:02.8401624Z triton_mm_323 0.0101 ms 91.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:15:02.8402603Z triton_mm_331 0.0103 ms 89.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:15:02.8403594Z triton_mm_321 0.0116 ms 79.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:15:02.8404558Z triton_mm_335 0.0116 ms 79.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:15:02.8405723Z triton_mm_322 0.0117 ms 78.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:15:02.8406679Z triton_mm_326 0.0123 ms 75.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:15:02.8407725Z triton_mm_333 0.0127 ms 72.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:15:02.8408608Z triton_mm_330 0.0130 ms 71.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:15:02.8409387Z SingleProcess AUTOTUNE benchmarking takes 0.1919 seconds and 0.0003 seconds precompiling for 18 choices 2025-09-07T11:15:08.0771025Z pass 2025-09-07T11:15:12.4687576Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package 
is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T11:15:12.4689546Z import pynvml # type: ignore[import] 2025-09-07T11:15:15.4689584Z 2025-09-07T11:15:17.3214290Z loading model: 0it [00:00, ?it/s] 2025-09-07T11:15:17.3214665Z loading model: 0it [00:01, ?it/s] 2025-09-07T11:15:17.3215521Z cuda train levit_128 2025-09-07T11:15:50.8418254Z Autotune Choices Stats: 2025-09-07T11:15:50.8419370Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_1477", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.0071680000983178616, "best_triton_pos": 0} 2025-09-07T11:15:50.8541920Z AUTOTUNE mm(8x384, 384x1000) 2025-09-07T11:15:50.8542211Z strides: [384, 1], [1, 384] 2025-09-07T11:15:50.8542470Z dtypes: torch.float16, torch.float16 2025-09-07T11:15:50.8543180Z triton_mm_1477 0.0072 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:15:50.8544197Z triton_mm_1481 0.0074 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:15:50.8544821Z mm 0.0075 ms 95.3% 2025-09-07T11:15:50.8547725Z triton_mm_1476 0.0078 ms 91.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:15:50.8548704Z triton_mm_1485 0.0081 ms 88.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:15:50.8549714Z triton_mm_1489 0.0081 ms 88.5% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:15:50.8550715Z triton_mm_1475 0.0081 ms 88.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:15:50.8551806Z triton_mm_1480 0.0082 ms 87.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:15:50.8552764Z triton_mm_1474 0.0083 ms 86.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T11:15:50.8553722Z triton_mm_1487 0.0086 ms 83.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:15:50.8554571Z SingleProcess AUTOTUNE benchmarking takes 0.2276 seconds and 0.0004 seconds precompiling for 18 choices 2025-09-07T11:15:51.6650749Z Autotune Choices Stats: 2025-09-07T11:15:51.6651919Z {"num_choices": 6, "num_triton_choices": 5, "best_kernel": "triton_convolution2d_1", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4", "best_time": 0.014336000196635723, "best_triton_pos": 0} 2025-09-07T11:15:51.6706844Z AUTOTUNE convolution(8x3x224x224, 16x3x3x3) 2025-09-07T11:15:51.6707174Z strides: [150528, 50176, 224, 1], [27, 9, 3, 1] 2025-09-07T11:15:51.6707480Z dtypes: torch.float16, torch.float16 2025-09-07T11:15:51.6708268Z triton_convolution2d_1 0.0143 ms 100.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 
2025-09-07T11:15:51.6710057Z triton_convolution2d_4 0.0148 ms 97.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:15:51.6711410Z triton_convolution2d_3 0.0162 ms 88.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:15:51.6712665Z triton_convolution2d_2 0.0172 ms 83.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:15:51.6713887Z triton_convolution2d_0 0.0173 ms 82.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:15:51.6714635Z convolution 0.0201 ms 71.2% 2025-09-07T11:15:51.6715279Z SingleProcess AUTOTUNE benchmarking takes 0.0821 seconds and 0.0002 seconds precompiling for 6 choices 2025-09-07T11:15:51.7591779Z Autotune Choices Stats: 2025-09-07T11:15:51.7592839Z {"num_choices": 7, "num_triton_choices": 6, "best_kernel": "triton_convolution2d_5", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4", "best_time": 0.010623999871313572, "best_triton_pos": 0} 2025-09-07T11:15:51.7643622Z AUTOTUNE convolution(8x16x112x112, 32x16x3x3) 2025-09-07T11:15:51.7643963Z strides: [200704, 12544, 112, 1], [144, 9, 3, 1] 2025-09-07T11:15:51.7644277Z dtypes: torch.float16, torch.float16 2025-09-07T11:15:51.7645344Z triton_convolution2d_5 0.0106 ms 100.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, 
PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:15:51.7646594Z triton_convolution2d_9 0.0108 ms 98.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:15:51.7647808Z triton_convolution2d_8 0.0111 ms 95.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:15:51.7649026Z triton_convolution2d_6 0.0128 ms 83.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:15:51.7650244Z triton_convolution2d_10 0.0130 ms 81.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:15:51.7651029Z convolution 0.0251 ms 42.4% 2025-09-07T11:15:51.7651795Z triton_convolution2d_7 0.0371 ms 28.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:15:51.7652686Z SingleProcess AUTOTUNE benchmarking takes 0.0932 seconds and 0.0002 seconds precompiling for 7 choices 2025-09-07T11:15:51.8688249Z Autotune Choices Stats: 2025-09-07T11:15:51.8689364Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_16", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8", "best_time": 0.011296000331640244, "best_triton_pos": 0} 2025-09-07T11:15:51.8742944Z AUTOTUNE convolution(8x32x56x56, 
64x32x3x3) 2025-09-07T11:15:51.8743269Z strides: [100352, 3136, 56, 1], [288, 9, 3, 1] 2025-09-07T11:15:51.8743579Z dtypes: torch.float16, torch.float16 2025-09-07T11:15:51.8744573Z triton_convolution2d_16 0.0113 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:15:51.8746337Z triton_convolution2d_15 0.0122 ms 92.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:15:51.8747570Z triton_convolution2d_14 0.0132 ms 85.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:15:51.8748801Z triton_convolution2d_11 0.0141 ms 80.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:15:51.8750037Z triton_convolution2d_17 0.0166 ms 68.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:15:51.8751360Z triton_convolution2d_12 0.0198 ms 57.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:15:51.8752170Z convolution 0.0206 ms 54.7% 2025-09-07T11:15:51.8752899Z triton_convolution2d_13 0.0524 ms 21.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:15:51.8753873Z SingleProcess AUTOTUNE 
benchmarking takes 0.1094 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:15:51.9783581Z Autotune Choices Stats: 2025-09-07T11:15:51.9784519Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_22", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4", "best_time": 0.01744000054895878, "best_triton_pos": 0} 2025-09-07T11:15:51.9835267Z AUTOTUNE convolution(8x64x28x28, 128x64x3x3) 2025-09-07T11:15:51.9836232Z strides: [50176, 784, 28, 1], [576, 9, 3, 1] 2025-09-07T11:15:51.9836634Z dtypes: torch.float16, torch.float16 2025-09-07T11:15:51.9837436Z triton_convolution2d_22 0.0174 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:15:51.9838205Z convolution 0.0204 ms 85.4% 2025-09-07T11:15:51.9838976Z triton_convolution2d_23 0.0226 ms 77.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:15:51.9840224Z triton_convolution2d_21 0.0261 ms 66.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:15:51.9841449Z triton_convolution2d_24 0.0272 ms 64.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:15:51.9842635Z triton_convolution2d_18 0.0286 ms 61.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, 
num_stages=2, num_warps=4 2025-09-07T11:15:51.9844150Z triton_convolution2d_19 0.0348 ms 50.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:15:51.9845666Z triton_convolution2d_20 0.0748 ms 23.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:15:51.9846584Z SingleProcess AUTOTUNE benchmarking takes 0.1087 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:15:52.2136625Z Autotune Choices Stats: 2025-09-07T11:15:52.2137633Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_32", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.006912000011652708, "best_triton_pos": 0} 2025-09-07T11:15:52.2189625Z AUTOTUNE mm(1568x128, 128x256) 2025-09-07T11:15:52.2189890Z strides: [128, 1], [1, 128] 2025-09-07T11:15:52.2190168Z dtypes: torch.float16, torch.float16 2025-09-07T11:15:52.2190861Z triton_mm_32 0.0069 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:15:52.2191955Z triton_mm_36 0.0071 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:15:52.2192917Z triton_mm_26 0.0072 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:15:52.2193869Z triton_mm_33 0.0073 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:15:52.2194833Z triton_mm_34 0.0073 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:15:52.2196098Z triton_mm_39 0.0073 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:15:52.2197053Z triton_mm_35 0.0074 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:15:52.2198023Z triton_mm_37 0.0074 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:15:52.2198981Z triton_mm_28 0.0075 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:15:52.2199943Z triton_mm_38 0.0075 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:15:52.2200791Z SingleProcess AUTOTUNE benchmarking takes 0.2349 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:15:52.4144178Z Autotune Choices Stats: 2025-09-07T11:15:52.4146215Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_bmm_50", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.007327999919652939, "best_triton_pos": 0} 2025-09-07T11:15:52.4197122Z AUTOTUNE bmm(32x196x16, 32x16x196) 2025-09-07T11:15:52.4197388Z strides: [3136, 16, 1], [3136, 196, 1] 2025-09-07T11:15:52.4197665Z dtypes: 
torch.float16, torch.float16 2025-09-07T11:15:52.4198286Z triton_bmm_50 0.0073 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:15:52.4199623Z triton_bmm_51 0.0073 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:15:52.4200593Z triton_bmm_54 0.0073 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:15:52.4201552Z triton_bmm_52 0.0074 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:15:52.4202497Z triton_bmm_53 0.0074 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:15:52.4203385Z triton_bmm_59 0.0074 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:15:52.4204278Z triton_bmm_49 0.0074 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:15:52.4205325Z triton_bmm_57 0.0074 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T11:15:52.4206242Z triton_bmm_58 0.0074 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:15:52.4207132Z triton_bmm_44 0.0075 ms 98.3% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T11:15:52.4207917Z SingleProcess AUTOTUNE benchmarking takes 0.2002 seconds and 0.0002 seconds precompiling for 17 choices 2025-09-07T11:15:52.6280553Z Autotune Choices Stats: 2025-09-07T11:15:52.6281629Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_bmm_70", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.008383999578654766, "best_triton_pos": 0} 2025-09-07T11:15:52.6335737Z AUTOTUNE bmm(32x196x196, 32x196x32) 2025-09-07T11:15:52.6336034Z strides: [38464, 196, 1], [6272, 32, 1] 2025-09-07T11:15:52.6336327Z dtypes: torch.float16, torch.float16 2025-09-07T11:15:52.6336997Z triton_bmm_70 0.0084 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:15:52.6338000Z triton_bmm_71 0.0084 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:15:52.6338979Z triton_bmm_62 0.0084 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:15:52.6339988Z triton_bmm_63 0.0085 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:15:52.6340973Z triton_bmm_64 0.0086 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:15:52.6342041Z triton_bmm_67 0.0086 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, 
BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:15:52.6342873Z triton_bmm_69 0.0086 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:15:52.6344119Z triton_bmm_61 0.0090 ms 93.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:15:52.6345218Z triton_bmm_68 0.0090 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:15:52.6346076Z triton_bmm_72 0.0094 ms 89.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:15:52.6346817Z SingleProcess AUTOTUNE benchmarking takes 0.2132 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T11:15:52.8658132Z Autotune Choices Stats: 2025-09-07T11:15:52.8659146Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_119", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007296000141650438, "best_triton_pos": 0} 2025-09-07T11:15:52.8712208Z AUTOTUNE mm(1568x256, 256x128) 2025-09-07T11:15:52.8712449Z strides: [256, 1], [1, 256] 2025-09-07T11:15:52.8712698Z dtypes: torch.float16, torch.float16 2025-09-07T11:15:52.8713355Z triton_mm_119 0.0073 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:15:52.8714345Z triton_mm_123 0.0073 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:15:52.8715597Z triton_mm_122 0.0075 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:15:52.8716576Z triton_mm_118 0.0076 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:15:52.8717176Z mm 0.0076 ms 95.8% 2025-09-07T11:15:52.8717733Z triton_mm_117 0.0077 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:15:52.8718690Z triton_mm_116 0.0079 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:15:52.8719656Z triton_mm_126 0.0079 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:15:52.8720624Z triton_mm_127 0.0079 ms 91.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:15:52.8721607Z triton_mm_125 0.0082 ms 89.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:15:52.8722449Z SingleProcess AUTOTUNE benchmarking takes 0.2335 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:15:53.1355766Z Autotune Choices Stats: 2025-09-07T11:15:53.1356738Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_472", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.007360000163316727, "best_triton_pos": 0} 2025-09-07T11:15:53.1409354Z AUTOTUNE mm(1568x128, 128x640) 2025-09-07T11:15:53.1409620Z strides: [128, 1], [1, 128] 2025-09-07T11:15:53.1410172Z dtypes: torch.float16, torch.float16 2025-09-07T11:15:53.1410848Z triton_mm_472 0.0074 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:15:53.1412176Z triton_mm_475 0.0075 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:15:53.1413642Z triton_mm_471 0.0075 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:15:53.1414586Z mm 0.0076 ms 97.0% 2025-09-07T11:15:53.1415936Z triton_mm_468 0.0076 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:15:53.1417396Z triton_mm_474 0.0076 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:15:53.1418882Z triton_mm_470 0.0076 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:15:53.1420335Z triton_mm_473 0.0077 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:15:53.1421927Z triton_mm_479 0.0079 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, 
num_warps=8 2025-09-07T11:15:53.1422836Z triton_mm_469 0.0079 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:15:53.1423625Z SingleProcess AUTOTUNE benchmarking takes 0.2373 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:15:53.3729283Z Autotune Choices Stats: 2025-09-07T11:15:53.3730272Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_483", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8", "best_time": 0.006624000146985054, "best_triton_pos": 0} 2025-09-07T11:15:53.3784310Z AUTOTUNE mm(392x128, 128x128) 2025-09-07T11:15:53.3784686Z strides: [128, 1], [1, 128] 2025-09-07T11:15:53.3785500Z dtypes: torch.float16, torch.float16 2025-09-07T11:15:53.3786567Z triton_mm_483 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:15:53.3788359Z triton_mm_481 0.0067 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:15:53.3790139Z triton_mm_487 0.0067 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:15:53.3791960Z triton_mm_482 0.0068 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:15:53.3793254Z triton_mm_491 0.0068 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:15:53.3793890Z mm 
0.0070 ms 95.0% 2025-09-07T11:15:53.3794522Z triton_mm_494 0.0070 ms 94.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:15:53.3795773Z triton_mm_488 0.0070 ms 94.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:15:53.3797147Z triton_mm_490 0.0071 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:15:53.3798123Z triton_mm_492 0.0071 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:15:53.3798980Z SingleProcess AUTOTUNE benchmarking takes 0.2369 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:15:53.5645978Z Autotune Choices Stats: 2025-09-07T11:15:53.5647490Z {"num_choices": 16, "num_triton_choices": 15, "best_kernel": "triton_bmm_504", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.0066559999249875546, "best_triton_pos": 0} 2025-09-07T11:15:53.5703332Z AUTOTUNE bmm(64x49x16, 64x16x196) 2025-09-07T11:15:53.5703749Z strides: [784, 16, 1], [3136, 196, 1] 2025-09-07T11:15:53.5704182Z dtypes: torch.float16, torch.float16 2025-09-07T11:15:53.5705525Z triton_bmm_504 0.0067 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:15:53.5707026Z triton_bmm_506 0.0067 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 
2025-09-07T11:15:53.5708487Z triton_bmm_508 0.0067 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:15:53.5709950Z triton_bmm_509 0.0067 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:15:53.5711433Z triton_bmm_512 0.0067 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T11:15:53.5712864Z triton_bmm_500 0.0067 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:15:53.5713830Z triton_bmm_501 0.0067 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:15:53.5714782Z triton_bmm_505 0.0067 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:15:53.5715850Z triton_bmm_507 0.0067 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:15:53.5716823Z triton_bmm_510 0.0067 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:15:53.5717670Z SingleProcess AUTOTUNE benchmarking takes 0.1913 seconds and 0.0002 seconds precompiling for 16 choices 2025-09-07T11:15:53.7579507Z Autotune Choices Stats: 2025-09-07T11:15:53.7580991Z {"num_choices": 16, "num_triton_choices": 15, "best_kernel": "triton_bmm_517", "best_kernel_desc": 
"ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8", "best_time": 0.008383999578654766, "best_triton_pos": 0} 2025-09-07T11:15:53.7646039Z AUTOTUNE bmm(64x49x196, 64x196x64) 2025-09-07T11:15:53.7646453Z strides: [9664, 196, 1], [12544, 64, 1] 2025-09-07T11:15:53.7646886Z dtypes: torch.float16, torch.float16 2025-09-07T11:15:53.7648334Z triton_bmm_517 0.0084 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:15:53.7650164Z triton_bmm_526 0.0084 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:15:53.7651704Z triton_bmm_525 0.0084 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:15:53.7652969Z triton_bmm_516 0.0084 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:15:53.7653874Z triton_bmm_518 0.0084 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:15:53.7654783Z triton_bmm_521 0.0087 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:15:53.7655952Z triton_bmm_522 0.0087 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:15:53.7656856Z triton_bmm_524 0.0087 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, 
BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:15:53.7657756Z triton_bmm_528 0.0088 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:15:53.7658661Z triton_bmm_515 0.0089 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:15:53.7659458Z SingleProcess AUTOTUNE benchmarking takes 0.1936 seconds and 0.0002 seconds precompiling for 16 choices 2025-09-07T11:15:53.9992290Z Autotune Choices Stats: 2025-09-07T11:15:53.9993312Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_533", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007519999984651804, "best_triton_pos": 0} 2025-09-07T11:15:54.0047584Z AUTOTUNE mm(392x512, 512x256) 2025-09-07T11:15:54.0047988Z strides: [512, 1], [1, 512] 2025-09-07T11:15:54.0048423Z dtypes: torch.float16, torch.float16 2025-09-07T11:15:54.0049430Z triton_mm_533 0.0075 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:15:54.0050959Z triton_mm_537 0.0077 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:15:54.0051990Z mm 0.0080 ms 93.6% 2025-09-07T11:15:54.0052781Z triton_mm_541 0.0084 ms 90.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:15:54.0053679Z triton_mm_532 0.0087 ms 86.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, 
BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:15:54.0054559Z triton_mm_536 0.0088 ms 85.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:15:54.0055729Z triton_mm_531 0.0088 ms 85.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:15:54.0056998Z triton_mm_530 0.0090 ms 83.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:15:54.0057901Z triton_mm_540 0.0092 ms 81.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:15:54.0058797Z triton_mm_547 0.0098 ms 76.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:15:54.0059579Z SingleProcess AUTOTUNE benchmarking takes 0.2396 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:15:54.2372475Z Autotune Choices Stats: 2025-09-07T11:15:54.2374026Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_552", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.006976000033318996, "best_triton_pos": 0} 2025-09-07T11:15:54.2430273Z AUTOTUNE mm(392x256, 256x512) 2025-09-07T11:15:54.2430545Z strides: [256, 1], [1, 256] 2025-09-07T11:15:54.2430816Z dtypes: torch.float16, torch.float16 2025-09-07T11:15:54.2431484Z triton_mm_552 0.0070 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:15:54.2432649Z triton_mm_556 0.0071 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:15:54.2433605Z triton_mm_551 0.0074 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:15:54.2434560Z triton_mm_555 0.0074 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:15:54.2435296Z mm 0.0075 ms 93.2% 2025-09-07T11:15:54.2435855Z triton_mm_550 0.0075 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:15:54.2436810Z triton_mm_559 0.0077 ms 90.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:15:54.2437770Z triton_mm_549 0.0077 ms 90.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:15:54.2438737Z triton_mm_560 0.0078 ms 89.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:15:54.2439706Z triton_mm_558 0.0081 ms 86.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:15:54.2440558Z SingleProcess AUTOTUNE benchmarking takes 0.2376 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:15:54.4067685Z Autotune Choices Stats: 
2025-09-07T11:15:54.4068770Z {"num_choices": 14, "num_triton_choices": 13, "best_kernel": "triton_bmm_606", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.006335999816656113, "best_triton_pos": 0} 2025-09-07T11:15:54.4126474Z AUTOTUNE bmm(64x49x16, 64x16x49) 2025-09-07T11:15:54.4126854Z strides: [784, 16, 1], [784, 49, 1] 2025-09-07T11:15:54.4127137Z dtypes: torch.float16, torch.float16 2025-09-07T11:15:54.4128265Z triton_bmm_606 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:15:54.4129478Z triton_bmm_607 0.0064 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:15:54.4130438Z triton_bmm_608 0.0064 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:15:54.4131418Z triton_bmm_609 0.0065 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:15:54.4132475Z triton_bmm_605 0.0067 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T11:15:54.4133338Z triton_bmm_614 0.0067 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:15:54.4134192Z triton_bmm_617 0.0067 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 
2025-09-07T11:15:54.4135328Z triton_bmm_611 0.0067 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:15:54.4136167Z triton_bmm_610 0.0068 ms 93.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:15:54.4137001Z triton_bmm_612 0.0068 ms 93.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:15:54.4137738Z SingleProcess AUTOTUNE benchmarking takes 0.1654 seconds and 0.0002 seconds precompiling for 14 choices 2025-09-07T11:15:54.5869481Z Autotune Choices Stats: 2025-09-07T11:15:54.5870490Z {"num_choices": 15, "num_triton_choices": 14, "best_kernel": "triton_bmm_620", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.006943999789655209, "best_triton_pos": 0} 2025-09-07T11:15:54.5929306Z AUTOTUNE bmm(64x49x49, 64x49x32) 2025-09-07T11:15:54.5929587Z strides: [2432, 49, 1], [1600, 32, 1] 2025-09-07T11:15:54.5929910Z dtypes: torch.float16, torch.float16 2025-09-07T11:15:54.5930576Z triton_bmm_620 0.0069 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:15:54.5931572Z triton_bmm_625 0.0070 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:15:54.5932651Z triton_bmm_631 0.0071 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 
2025-09-07T11:15:54.5933536Z triton_bmm_621 0.0071 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:15:54.5934432Z triton_bmm_627 0.0071 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:15:54.5935625Z triton_bmm_630 0.0071 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T11:15:54.5936851Z triton_bmm_622 0.0072 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:15:54.5937937Z triton_bmm_629 0.0072 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:15:54.5938835Z triton_bmm_624 0.0072 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:15:54.5939727Z triton_bmm_628 0.0073 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:15:54.5940509Z SingleProcess AUTOTUNE benchmarking takes 0.1798 seconds and 0.0002 seconds precompiling for 15 choices 2025-09-07T11:15:54.8656743Z Autotune Choices Stats: 2025-09-07T11:15:54.8658031Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.0077760000713169575, "best_triton_pos": 1, "best_triton_time": 0.00800000037997961, "best_triton_kernel": "triton_mm_1010", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4"} 2025-09-07T11:15:54.8712860Z AUTOTUNE mm(392x256, 256x1280) 2025-09-07T11:15:54.8713133Z strides: [256, 1], [1, 256] 2025-09-07T11:15:54.8713401Z dtypes: torch.float16, torch.float16 2025-09-07T11:15:54.8713662Z mm 0.0078 ms 100.0% 2025-09-07T11:15:54.8714257Z triton_mm_1010 0.0080 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:15:54.8715525Z triton_mm_1005 0.0080 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:15:54.8716505Z triton_mm_1009 0.0081 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:15:54.8717476Z triton_mm_1008 0.0083 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:15:54.8718436Z triton_mm_1012 0.0085 ms 91.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:15:54.8719402Z triton_mm_999 0.0087 ms 89.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:15:54.8720360Z triton_mm_1000 0.0088 ms 88.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:15:54.8721340Z triton_mm_1011 0.0091 ms 85.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 
2025-09-07T11:15:54.8722314Z triton_mm_1007 0.0091 ms 85.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:15:54.8723164Z SingleProcess AUTOTUNE benchmarking takes 0.2392 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:15:55.1052477Z Autotune Choices Stats: 2025-09-07T11:15:55.1053484Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_1021", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.006752000190317631, "best_triton_pos": 0} 2025-09-07T11:15:55.1110840Z AUTOTUNE mm(128x256, 256x256) 2025-09-07T11:15:55.1111329Z strides: [256, 1], [1, 256] 2025-09-07T11:15:55.1111743Z dtypes: torch.float16, torch.float16 2025-09-07T11:15:55.1113288Z triton_mm_1021 0.0068 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:15:55.1114329Z triton_mm_1025 0.0071 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:15:55.1115639Z triton_mm_1020 0.0072 ms 93.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:15:55.1116619Z triton_mm_1024 0.0072 ms 93.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:15:55.1117587Z triton_mm_1019 0.0073 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:15:55.1118557Z 
triton_mm_1018 0.0075 ms 90.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:15:55.1119520Z triton_mm_1028 0.0075 ms 90.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:15:55.1120131Z mm 0.0076 ms 89.0% 2025-09-07T11:15:55.1120702Z triton_mm_1029 0.0076 ms 88.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:15:55.1121690Z triton_mm_1027 0.0078 ms 86.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:15:55.1122554Z SingleProcess AUTOTUNE benchmarking takes 0.2379 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:15:55.2107755Z Autotune Choices Stats: 2025-09-07T11:15:55.2108647Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_bmm_1040", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.006271999794989824, "best_triton_pos": 0} 2025-09-07T11:15:55.2165819Z AUTOTUNE bmm(128x16x16, 128x16x49) 2025-09-07T11:15:55.2166091Z strides: [256, 16, 1], [784, 49, 1] 2025-09-07T11:15:55.2166369Z dtypes: torch.float16, torch.float16 2025-09-07T11:15:55.2167018Z triton_bmm_1040 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:15:55.2168015Z triton_bmm_1041 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 
2025-09-07T11:15:55.2169012Z triton_bmm_1038 0.0063 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:15:55.2170028Z triton_bmm_1042 0.0063 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:15:55.2171026Z triton_bmm_1039 0.0067 ms 93.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:15:55.2172027Z triton_bmm_1036 0.0067 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T11:15:55.2173300Z triton_bmm_1037 0.0068 ms 92.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T11:15:55.2174011Z bmm 0.0074 ms 84.5% 2025-09-07T11:15:55.2174418Z SingleProcess AUTOTUNE benchmarking takes 0.1050 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:15:55.3703003Z Autotune Choices Stats: 2025-09-07T11:15:55.3704533Z {"num_choices": 13, "num_triton_choices": 12, "best_kernel": "triton_bmm_1046", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.006719999946653843, "best_triton_pos": 0} 2025-09-07T11:15:55.3766702Z AUTOTUNE bmm(128x16x49, 128x49x64) 2025-09-07T11:15:55.3767128Z strides: [784, 49, 1], [3136, 64, 1] 2025-09-07T11:15:55.3767550Z dtypes: torch.float16, torch.float16 2025-09-07T11:15:55.3768612Z triton_bmm_1046 0.0067 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:15:55.3770118Z triton_bmm_1053 0.0068 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:15:55.3771577Z triton_bmm_1045 0.0068 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:15:55.3773069Z triton_bmm_1052 0.0069 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:15:55.3774034Z triton_bmm_1050 0.0070 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:15:55.3775169Z triton_bmm_1054 0.0071 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:15:55.3776145Z triton_bmm_1044 0.0071 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T11:15:55.3777106Z triton_bmm_1051 0.0071 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:15:55.3778072Z triton_bmm_1049 0.0072 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:15:55.3779037Z triton_bmm_1047 0.0072 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:15:55.3779889Z SingleProcess AUTOTUNE 
benchmarking takes 0.1594 seconds and 0.0002 seconds precompiling for 13 choices 2025-09-07T11:15:55.6086975Z Autotune Choices Stats: 2025-09-07T11:15:55.6088498Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_1059", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.00848000030964613, "best_triton_pos": 0} 2025-09-07T11:15:55.6144469Z AUTOTUNE mm(128x1024, 1024x384) 2025-09-07T11:15:55.6144695Z strides: [1024, 1], [1, 1024] 2025-09-07T11:15:55.6145217Z dtypes: torch.float16, torch.float16 2025-09-07T11:15:55.6145813Z triton_mm_1059 0.0085 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:15:55.6146649Z mm 0.0087 ms 97.8% 2025-09-07T11:15:55.6147187Z triton_mm_1063 0.0088 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:15:55.6148238Z triton_mm_1067 0.0101 ms 83.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:15:55.6149144Z triton_mm_1058 0.0115 ms 73.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:15:55.6150029Z triton_mm_1062 0.0118 ms 72.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:15:55.6150914Z triton_mm_1057 0.0120 ms 70.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:15:55.6151816Z 
triton_mm_1073 0.0122 ms 69.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:15:55.6152791Z triton_mm_1066 0.0124 ms 68.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:15:55.6153754Z triton_mm_1056 0.0124 ms 68.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:15:55.6154588Z SingleProcess AUTOTUNE benchmarking takes 0.2372 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:15:55.8410668Z Autotune Choices Stats: 2025-09-07T11:15:55.8411672Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_1078", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007296000141650438, "best_triton_pos": 0} 2025-09-07T11:15:55.8476868Z AUTOTUNE mm(128x384, 384x768) 2025-09-07T11:15:55.8477117Z strides: [384, 1], [1, 384] 2025-09-07T11:15:55.8477362Z dtypes: torch.float16, torch.float16 2025-09-07T11:15:55.8478021Z triton_mm_1078 0.0073 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:15:55.8478660Z mm 0.0075 ms 97.0% 2025-09-07T11:15:55.8479241Z triton_mm_1082 0.0076 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:15:55.8480234Z triton_mm_1077 0.0082 ms 88.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 
2025-09-07T11:15:55.8481201Z triton_mm_1076 0.0083 ms 87.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:15:55.8482174Z triton_mm_1086 0.0083 ms 87.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:15:55.8483154Z triton_mm_1075 0.0084 ms 87.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:15:55.8484065Z triton_mm_1081 0.0084 ms 87.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:15:55.8485278Z triton_mm_1085 0.0087 ms 84.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:15:55.8486532Z triton_mm_1084 0.0091 ms 80.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:15:55.8487327Z SingleProcess AUTOTUNE benchmarking takes 0.2326 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:15:56.0794614Z Autotune Choices Stats: 2025-09-07T11:15:56.0795716Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_1097", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007840000092983246, "best_triton_pos": 0} 2025-09-07T11:15:56.0853281Z AUTOTUNE mm(128x768, 768x384) 2025-09-07T11:15:56.0853547Z strides: [768, 1], [1, 768] 2025-09-07T11:15:56.0853802Z dtypes: torch.float16, torch.float16 2025-09-07T11:15:56.0854496Z 
triton_mm_1097 0.0078 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:15:56.0855299Z mm 0.0081 ms 96.8% 2025-09-07T11:15:56.0855890Z triton_mm_1101 0.0085 ms 91.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:15:56.0856873Z triton_mm_1105 0.0094 ms 83.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:15:56.0857842Z triton_mm_1096 0.0101 ms 77.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:15:56.0858798Z triton_mm_1100 0.0105 ms 74.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:15:56.0859760Z triton_mm_1095 0.0106 ms 74.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:15:56.0860732Z triton_mm_1094 0.0108 ms 72.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:15:56.0861766Z triton_mm_1104 0.0110 ms 71.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:15:56.0862797Z triton_mm_1111 0.0110 ms 71.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:15:56.0863538Z SingleProcess AUTOTUNE benchmarking takes 0.2371 seconds and 
0.0002 seconds precompiling for 20 choices 2025-09-07T11:15:56.1634684Z Autotune Choices Stats: 2025-09-07T11:15:56.1635772Z {"num_choices": 6, "num_triton_choices": 5, "best_kernel": "triton_bmm_1131", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=1", "best_time": 0.005919999908655882, "best_triton_pos": 0} 2025-09-07T11:15:56.1694328Z AUTOTUNE bmm(96x16x16, 96x16x16) 2025-09-07T11:15:56.1694596Z strides: [256, 16, 1], [256, 16, 1] 2025-09-07T11:15:56.1694875Z dtypes: torch.float16, torch.float16 2025-09-07T11:15:56.1695705Z triton_bmm_1131 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=1 2025-09-07T11:15:56.1696714Z triton_bmm_1133 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=1 2025-09-07T11:15:56.1698005Z triton_bmm_1135 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=1 2025-09-07T11:15:56.1699162Z triton_bmm_1132 0.0060 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=1 2025-09-07T11:15:56.1700146Z triton_bmm_1134 0.0060 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=1 2025-09-07T11:15:56.1700758Z bmm 0.0068 ms 86.9% 2025-09-07T11:15:56.1701218Z SingleProcess AUTOTUNE benchmarking takes 0.0816 seconds and 0.0002 seconds precompiling for 6 choices 2025-09-07T11:15:56.2458197Z Autotune Choices Stats: 2025-09-07T11:15:56.2459166Z {"num_choices": 6, "num_triton_choices": 5, "best_kernel": 
"triton_bmm_1136", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2", "best_time": 0.00598399993032217, "best_triton_pos": 0} 2025-09-07T11:15:56.2516771Z AUTOTUNE bmm(96x16x16, 96x16x32) 2025-09-07T11:15:56.2517041Z strides: [256, 16, 1], [512, 32, 1] 2025-09-07T11:15:56.2517322Z dtypes: torch.float16, torch.float16 2025-09-07T11:15:56.2518007Z triton_bmm_1136 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T11:15:56.2519009Z triton_bmm_1137 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T11:15:56.2519978Z triton_bmm_1138 0.0060 ms 98.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:15:56.2520947Z triton_bmm_1139 0.0060 ms 98.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=2 2025-09-07T11:15:56.2521917Z triton_bmm_1140 0.0061 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=2 2025-09-07T11:15:56.2522527Z bmm 0.0070 ms 85.0% 2025-09-07T11:15:56.2522963Z SingleProcess AUTOTUNE benchmarking takes 0.0818 seconds and 0.0002 seconds precompiling for 6 choices 2025-09-07T11:15:56.4864806Z Autotune Choices Stats: 2025-09-07T11:15:56.4866016Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_1145", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, 
num_warps=4", "best_time": 0.0071680000983178616, "best_triton_pos": 0} 2025-09-07T11:15:56.4923374Z AUTOTUNE mm(128x384, 384x384) 2025-09-07T11:15:56.4923670Z strides: [384, 1], [1, 384] 2025-09-07T11:15:56.4923937Z dtypes: torch.float16, torch.float16 2025-09-07T11:15:56.4924617Z triton_mm_1145 0.0072 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:15:56.4925806Z triton_mm_1149 0.0073 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:15:56.4926452Z mm 0.0073 ms 98.2% 2025-09-07T11:15:56.4927033Z triton_mm_1144 0.0080 ms 89.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:15:56.4928002Z triton_mm_1148 0.0080 ms 89.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:15:56.4929415Z triton_mm_1142 0.0082 ms 87.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:15:56.4930584Z triton_mm_1153 0.0082 ms 87.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:15:56.4931560Z triton_mm_1143 0.0083 ms 86.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:15:56.4932578Z triton_mm_1152 0.0084 ms 84.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 
2025-09-07T11:15:56.4933585Z triton_mm_1155 0.0090 ms 79.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:15:56.4934371Z SingleProcess AUTOTUNE benchmarking takes 0.2401 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:16:29.3267145Z Autotune Choices Stats: 2025-09-07T11:16:29.3268850Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_2331", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.00687999976798892, "best_triton_pos": 0} 2025-09-07T11:16:29.3330501Z AUTOTUNE mm(384x128, 128x1024) 2025-09-07T11:16:29.3330770Z strides: [1, 384], [1024, 1] 2025-09-07T11:16:29.3331026Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:29.3331696Z triton_mm_2331 0.0069 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:16:29.3332711Z triton_mm_2335 0.0071 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:29.3333714Z triton_mm_2332 0.0071 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:29.3334687Z triton_mm_2334 0.0072 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:29.3335999Z triton_mm_2338 0.0073 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:29.3336967Z 
triton_mm_2333 0.0074 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:29.3337604Z mm 0.0075 ms 91.9% 2025-09-07T11:16:29.3338210Z triton_mm_2336 0.0075 ms 91.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:16:29.3339203Z triton_mm_2337 0.0075 ms 91.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:29.3340294Z triton_mm_2327 0.0076 ms 91.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:29.3348275Z SingleProcess AUTOTUNE benchmarking takes 0.1923 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:16:29.6966912Z Autotune Choices Stats: 2025-09-07T11:16:29.6968008Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_1547", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.006111999973654747, "best_triton_pos": 0} 2025-09-07T11:16:29.7029001Z AUTOTUNE mm(1000x8, 8x384) 2025-09-07T11:16:29.7029581Z strides: [1, 1000], [384, 1] 2025-09-07T11:16:29.7029863Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:29.7030542Z triton_mm_1547 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:29.7031541Z triton_mm_1541 0.0061 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 
2025-09-07T11:16:29.7032671Z triton_mm_1542 0.0061 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:29.7033646Z triton_mm_1545 0.0061 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:16:29.7034646Z triton_mm_1546 0.0061 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:16:29.7035967Z triton_mm_1540 0.0062 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T11:16:29.7036926Z triton_mm_1544 0.0062 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:29.7037890Z triton_mm_1543 0.0063 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:29.7038893Z triton_mm_1548 0.0063 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:29.7039876Z triton_mm_1551 0.0063 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:29.7040729Z SingleProcess AUTOTUNE benchmarking takes 0.1614 seconds and 0.0002 seconds precompiling for 17 choices 2025-09-07T11:16:30.0973829Z Autotune Choices Stats: 2025-09-07T11:16:30.0975310Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_2450", "best_kernel_desc": 
"ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.008031999692320824, "best_triton_pos": 0} 2025-09-07T11:16:30.1034100Z AUTOTUNE mm(1280x392, 392x256) 2025-09-07T11:16:30.1034405Z strides: [1, 1280], [256, 1] 2025-09-07T11:16:30.1034658Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:30.1035502Z triton_mm_2450 0.0080 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:30.1036153Z mm 0.0082 ms 98.4% 2025-09-07T11:16:30.1036741Z triton_mm_2454 0.0089 ms 90.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:16:30.1037726Z triton_mm_2449 0.0089 ms 90.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:16:30.1038698Z triton_mm_2452 0.0092 ms 86.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:30.1040076Z triton_mm_2456 0.0092 ms 86.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:30.1041257Z triton_mm_2453 0.0093 ms 86.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:30.1042242Z triton_mm_2445 0.0096 ms 83.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:30.1043235Z triton_mm_2460 0.0099 ms 81.2% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:30.1044149Z triton_mm_2443 0.0102 ms 78.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:16:30.1045081Z SingleProcess AUTOTUNE benchmarking takes 0.1962 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:16:30.4803499Z Autotune Choices Stats: 2025-09-07T11:16:30.4804597Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_1563", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.007071999832987785, "best_triton_pos": 0} 2025-09-07T11:16:30.4865421Z AUTOTUNE mm(384x128, 128x768) 2025-09-07T11:16:30.4865692Z strides: [1, 384], [768, 1] 2025-09-07T11:16:30.4865961Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:30.4866633Z triton_mm_1563 0.0071 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:16:30.4867663Z triton_mm_1564 0.0072 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:30.4868721Z triton_mm_1567 0.0074 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:30.4869699Z triton_mm_1566 0.0074 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:30.4870653Z triton_mm_1559 0.0076 ms 93.6% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:30.4871608Z triton_mm_1558 0.0076 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:30.4872599Z triton_mm_1570 0.0076 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:30.4873440Z triton_mm_1569 0.0077 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:30.4874287Z triton_mm_1565 0.0077 ms 91.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:30.4875262Z triton_mm_1557 0.0078 ms 90.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:16:30.4875999Z SingleProcess AUTOTUNE benchmarking takes 0.1884 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:16:30.8560220Z Autotune Choices Stats: 2025-09-07T11:16:30.8561908Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_1602", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007007999811321497, "best_triton_pos": 0} 2025-09-07T11:16:30.8622181Z AUTOTUNE mm(768x128, 128x384) 2025-09-07T11:16:30.8622469Z strides: [1, 768], [384, 1] 2025-09-07T11:16:30.8622765Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:30.8623448Z triton_mm_1602 0.0070 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, 
BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:30.8624478Z triton_mm_1601 0.0073 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:16:30.8625891Z triton_mm_1605 0.0073 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:30.8626555Z mm 0.0075 ms 94.0% 2025-09-07T11:16:30.8627138Z triton_mm_1597 0.0075 ms 93.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:30.8628100Z triton_mm_1596 0.0075 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:30.8629063Z triton_mm_1598 0.0077 ms 91.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:30.8630060Z triton_mm_1606 0.0077 ms 90.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:16:30.8631036Z triton_mm_1595 0.0078 ms 90.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:16:30.8632044Z triton_mm_1603 0.0078 ms 90.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:30.8633010Z SingleProcess AUTOTUNE benchmarking takes 0.1886 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:16:31.2142219Z Autotune 
Choices Stats: 2025-09-07T11:16:31.2143315Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_3826", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007327999919652939, "best_triton_pos": 0} 2025-09-07T11:16:31.2209698Z AUTOTUNE mm(1568x256, 256x128) 2025-09-07T11:16:31.2210007Z strides: [256, 1], [128, 1] 2025-09-07T11:16:31.2210288Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:31.2210961Z triton_mm_3826 0.0073 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:31.2211965Z triton_mm_3830 0.0073 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:31.2212934Z triton_mm_3825 0.0076 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:31.2213882Z triton_mm_3829 0.0076 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:16:31.2214834Z triton_mm_3824 0.0077 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:31.2216577Z triton_mm_3833 0.0078 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:31.2217569Z triton_mm_3834 0.0079 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, 
num_warps=4 2025-09-07T11:16:31.2218178Z mm 0.0079 ms 92.3% 2025-09-07T11:16:31.2218750Z triton_mm_3823 0.0082 ms 89.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:16:31.2219724Z triton_mm_3836 0.0083 ms 88.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:31.2220578Z SingleProcess AUTOTUNE benchmarking takes 0.1904 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:16:32.0003259Z Autotune Choices Stats: 2025-09-07T11:16:32.0004430Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_1634", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8", "best_time": 0.006783999968320131, "best_triton_pos": 0} 2025-09-07T11:16:32.0069786Z AUTOTUNE mm(384x128, 128x384) 2025-09-07T11:16:32.0070069Z strides: [1, 384], [384, 1] 2025-09-07T11:16:32.0070341Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:32.0071014Z triton_mm_1634 0.0068 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:32.0072018Z triton_mm_1635 0.0068 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:32.0073200Z triton_mm_1639 0.0068 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:16:32.0074161Z triton_mm_1633 0.0070 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=2, num_warps=4 2025-09-07T11:16:32.0075328Z triton_mm_1640 0.0071 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:32.0076298Z triton_mm_1643 0.0072 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:32.0077257Z triton_mm_1642 0.0072 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:32.0078232Z triton_mm_1646 0.0072 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:32.0078840Z mm 0.0073 ms 93.4% 2025-09-07T11:16:32.0079402Z triton_mm_1636 0.0073 ms 93.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:32.0080239Z SingleProcess AUTOTUNE benchmarking takes 0.1901 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:16:32.4262393Z Autotune Choices Stats: 2025-09-07T11:16:32.4263444Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_2484", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.00774399982765317, "best_triton_pos": 0} 2025-09-07T11:16:32.4326959Z AUTOTUNE mm(256x392, 392x512) 2025-09-07T11:16:32.4327244Z strides: [1, 256], [512, 1] 2025-09-07T11:16:32.4327790Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:32.4328456Z triton_mm_2484 0.0077 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:32.4329472Z triton_mm_2488 0.0077 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:32.4330094Z mm 0.0078 ms 99.6% 2025-09-07T11:16:32.4330672Z triton_mm_2483 0.0082 ms 94.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:32.4331644Z triton_mm_2482 0.0084 ms 91.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:32.4332663Z triton_mm_2487 0.0085 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:16:32.4333695Z triton_mm_2492 0.0085 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:16:32.4334596Z triton_mm_2490 0.0089 ms 86.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:32.4335660Z triton_mm_2491 0.0090 ms 86.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:32.4336564Z triton_mm_2494 0.0090 ms 86.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:32.4337354Z SingleProcess AUTOTUNE benchmarking takes 0.1968 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:16:32.8189274Z Autotune Choices Stats: 2025-09-07T11:16:32.8190140Z 
{"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_2526", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007679999805986881, "best_triton_pos": 0} 2025-09-07T11:16:32.8252895Z AUTOTUNE mm(512x392, 392x256) 2025-09-07T11:16:32.8253188Z strides: [1, 512], [256, 1] 2025-09-07T11:16:32.8253473Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:32.8254159Z triton_mm_2526 0.0077 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:32.8255613Z triton_mm_2522 0.0077 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:32.8256247Z mm 0.0078 ms 98.4% 2025-09-07T11:16:32.8256834Z triton_mm_2521 0.0082 ms 93.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:32.8257810Z triton_mm_2520 0.0084 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:32.8258800Z triton_mm_2530 0.0085 ms 89.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:16:32.8260093Z triton_mm_2525 0.0086 ms 88.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:16:32.8261293Z triton_mm_2528 0.0089 ms 86.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=4, num_warps=8 2025-09-07T11:16:32.8262371Z triton_mm_2532 0.0090 ms 85.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:32.8263402Z triton_mm_2529 0.0090 ms 85.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:32.8264193Z SingleProcess AUTOTUNE benchmarking takes 0.1958 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:16:33.1998093Z Autotune Choices Stats: 2025-09-07T11:16:33.1999185Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_2744", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007615999784320593, "best_triton_pos": 0} 2025-09-07T11:16:33.2063106Z AUTOTUNE mm(392x512, 512x256) 2025-09-07T11:16:33.2063384Z strides: [512, 1], [256, 1] 2025-09-07T11:16:33.2063653Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:33.2064331Z triton_mm_2744 0.0076 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:33.2065753Z triton_mm_2748 0.0078 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:33.2066382Z mm 0.0079 ms 96.7% 2025-09-07T11:16:33.2066969Z triton_mm_2752 0.0087 ms 87.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:16:33.2067957Z triton_mm_2743 0.0087 ms 87.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:33.2068921Z triton_mm_2742 0.0089 ms 85.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:33.2069883Z triton_mm_2747 0.0089 ms 85.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:16:33.2070852Z triton_mm_2751 0.0092 ms 82.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:33.2071857Z triton_mm_2741 0.0095 ms 80.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:16:33.2072909Z triton_mm_2754 0.0097 ms 78.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:33.2073852Z SingleProcess AUTOTUNE benchmarking takes 0.1957 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:16:34.0092969Z Autotune Choices Stats: 2025-09-07T11:16:34.0094308Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.009696000255644321, "best_triton_pos": 1, "best_triton_time": 0.010623999871313572, "best_triton_kernel": "triton_mm_3512", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T11:16:34.0159881Z AUTOTUNE mm(640x1568, 1568x128) 2025-09-07T11:16:34.0160150Z strides: [1, 640], [128, 1] 2025-09-07T11:16:34.0160403Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:34.0160664Z mm 0.0097 ms 100.0% 2025-09-07T11:16:34.0161534Z triton_mm_3512 0.0106 ms 91.3% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:34.0162564Z triton_mm_3516 0.0109 ms 88.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:34.0163558Z triton_mm_3520 0.0129 ms 75.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:16:34.0164512Z triton_mm_3511 0.0138 ms 70.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:34.0165589Z triton_mm_3510 0.0149 ms 64.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:34.0166494Z triton_mm_3515 0.0157 ms 61.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:16:34.0167394Z triton_mm_3519 0.0163 ms 59.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:34.0168287Z triton_mm_3518 0.0165 ms 58.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:34.0169176Z triton_mm_3522 0.0166 ms 58.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:34.0169965Z SingleProcess AUTOTUNE benchmarking takes 0.2188 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:16:34.3839535Z 
Autotune Choices Stats: 2025-09-07T11:16:34.3840563Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_2405", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.006624000146985054, "best_triton_pos": 0} 2025-09-07T11:16:34.3904134Z AUTOTUNE mm(256x128, 128x256) 2025-09-07T11:16:34.3904406Z strides: [1, 256], [256, 1] 2025-09-07T11:16:34.3904674Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:34.3905546Z triton_mm_2405 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:16:34.3906565Z triton_mm_2411 0.0067 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:16:34.3907552Z triton_mm_2407 0.0067 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:34.3908529Z triton_mm_2406 0.0068 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:34.3909512Z triton_mm_2412 0.0068 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:34.3910480Z triton_mm_2415 0.0070 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:34.3911484Z mm 0.0071 ms 93.2% 2025-09-07T11:16:34.3912264Z triton_mm_2414 0.0072 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:34.3913317Z triton_mm_2413 0.0072 ms 92.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:34.3914349Z triton_mm_2418 0.0072 ms 92.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:34.3915328Z SingleProcess AUTOTUNE benchmarking takes 0.1868 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:16:34.5982795Z Autotune Choices Stats: 2025-09-07T11:16:34.5983897Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_2560", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007584000006318092, "best_triton_pos": 0} 2025-09-07T11:16:34.6051079Z AUTOTUNE mm(256x392, 392x256) 2025-09-07T11:16:34.6051503Z strides: [1, 256], [256, 1] 2025-09-07T11:16:34.6051896Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:34.6052884Z triton_mm_2560 0.0076 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:34.6053824Z mm 0.0076 ms 99.6% 2025-09-07T11:16:34.6054688Z triton_mm_2564 0.0077 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:34.6056459Z triton_mm_2559 0.0081 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:34.6057992Z triton_mm_2558 0.0082 ms 92.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, 
BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:34.6059439Z triton_mm_2563 0.0084 ms 90.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:16:34.6060907Z triton_mm_2568 0.0085 ms 88.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:16:34.6062502Z triton_mm_2567 0.0089 ms 85.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:34.6064012Z triton_mm_2566 0.0089 ms 84.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:34.6064928Z triton_mm_2570 0.0090 ms 84.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:34.6065867Z SingleProcess AUTOTUNE benchmarking takes 0.1938 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:16:34.8520680Z Autotune Choices Stats: 2025-09-07T11:16:34.8521751Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_1887", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.008063999935984612, "best_triton_pos": 0} 2025-09-07T11:16:34.8586948Z AUTOTUNE mm(128x768, 768x384) 2025-09-07T11:16:34.8587355Z strides: [768, 1], [384, 1] 2025-09-07T11:16:34.8587647Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:34.8588887Z triton_mm_1887 0.0081 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:34.8589795Z mm 0.0082 ms 98.1% 2025-09-07T11:16:34.8590398Z triton_mm_1891 0.0084 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:34.8591409Z triton_mm_1895 0.0092 ms 87.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:16:34.8592395Z triton_mm_1886 0.0101 ms 79.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:34.8593472Z triton_mm_1890 0.0103 ms 78.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:16:34.8594538Z triton_mm_1885 0.0103 ms 78.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:34.8595694Z triton_mm_1894 0.0106 ms 76.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:34.8596669Z triton_mm_1901 0.0111 ms 72.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:34.8597641Z triton_mm_1884 0.0112 ms 72.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:16:34.8598491Z SingleProcess AUTOTUNE benchmarking takes 0.2045 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:16:35.5806412Z Autotune Choices Stats: 
2025-09-07T11:16:35.5807760Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.009312000125646591, "best_triton_pos": 1, "best_triton_time": 0.01027199998497963, "best_triton_kernel": "triton_mm_3550", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T11:16:35.5877154Z AUTOTUNE mm(128x1568, 1568x256) 2025-09-07T11:16:35.5877497Z strides: [1, 128], [256, 1] 2025-09-07T11:16:35.5877758Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:35.5878013Z mm 0.0093 ms 100.0% 2025-09-07T11:16:35.5878615Z triton_mm_3550 0.0103 ms 90.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:35.5879616Z triton_mm_3554 0.0106 ms 87.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:35.5880624Z triton_mm_3558 0.0132 ms 70.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:16:35.5881598Z triton_mm_3549 0.0137 ms 68.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:35.5882577Z triton_mm_3548 0.0147 ms 63.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:35.5883543Z triton_mm_3553 0.0154 ms 60.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:16:35.5885086Z triton_mm_3557 0.0158 ms 58.8% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:35.5886298Z triton_mm_3560 0.0161 ms 57.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:35.5887262Z triton_mm_3556 0.0163 ms 57.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:35.5888099Z SingleProcess AUTOTUNE benchmarking takes 0.2188 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:16:35.9809401Z Autotune Choices Stats: 2025-09-07T11:16:35.9810589Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.009312000125646591, "best_triton_pos": 1, "best_triton_time": 0.010239999741315842, "best_triton_kernel": "triton_mm_3588", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T11:16:35.9876199Z AUTOTUNE mm(256x1568, 1568x128) 2025-09-07T11:16:35.9876478Z strides: [1, 256], [128, 1] 2025-09-07T11:16:35.9876751Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:35.9877008Z mm 0.0093 ms 100.0% 2025-09-07T11:16:35.9877606Z triton_mm_3588 0.0102 ms 90.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:35.9878606Z triton_mm_3592 0.0107 ms 87.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:35.9879587Z triton_mm_3596 0.0128 ms 72.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=4, num_warps=4 2025-09-07T11:16:35.9880569Z triton_mm_3587 0.0135 ms 69.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:35.9881528Z triton_mm_3586 0.0145 ms 64.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:35.9882498Z triton_mm_3591 0.0156 ms 59.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:16:35.9883491Z triton_mm_3595 0.0160 ms 58.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:35.9884460Z triton_mm_3594 0.0162 ms 57.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:35.9885758Z triton_mm_3598 0.0165 ms 56.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:35.9886606Z SingleProcess AUTOTUNE benchmarking takes 0.2172 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:16:36.3331840Z Autotune Choices Stats: 2025-09-07T11:16:36.3333141Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.007455999962985516, "best_triton_pos": 1, "best_triton_time": 0.007519999984651804, "best_triton_kernel": "triton_mm_3474", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T11:16:36.3402356Z AUTOTUNE mm(128x392, 392x128) 2025-09-07T11:16:36.3402605Z strides: [1, 128], [128, 
1] 2025-09-07T11:16:36.3403166Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:36.3403442Z mm 0.0075 ms 100.0% 2025-09-07T11:16:36.3404394Z triton_mm_3474 0.0075 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:36.3405690Z triton_mm_3478 0.0076 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:36.3406693Z triton_mm_3473 0.0080 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:36.3407675Z triton_mm_3472 0.0083 ms 90.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:36.3408700Z triton_mm_3477 0.0083 ms 90.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:16:36.3409704Z triton_mm_3482 0.0083 ms 89.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:16:36.3410691Z triton_mm_3481 0.0088 ms 85.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:36.3411676Z triton_mm_3484 0.0088 ms 85.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:36.3412660Z triton_mm_3480 0.0088 ms 84.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=4, num_warps=8 2025-09-07T11:16:36.3413528Z SingleProcess AUTOTUNE benchmarking takes 0.1954 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:16:36.5716653Z Autotune Choices Stats: 2025-09-07T11:16:36.5717860Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.009216000325977802, "best_triton_pos": 1, "best_triton_time": 0.010239999741315842, "best_triton_kernel": "triton_mm_3630", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T11:16:36.5782567Z AUTOTUNE mm(128x1568, 1568x128) 2025-09-07T11:16:36.5782836Z strides: [1, 128], [128, 1] 2025-09-07T11:16:36.5783110Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:36.5783387Z mm 0.0092 ms 100.0% 2025-09-07T11:16:36.5783991Z triton_mm_3630 0.0102 ms 90.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:36.5785345Z triton_mm_3626 0.0106 ms 87.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:36.5786396Z triton_mm_3634 0.0124 ms 74.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:16:36.5787387Z triton_mm_3625 0.0135 ms 68.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:36.5788351Z triton_mm_3624 0.0143 ms 64.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:36.5789311Z triton_mm_3629 0.0148 ms 62.1% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:16:36.5790545Z triton_mm_3633 0.0152 ms 60.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:36.5791715Z triton_mm_3640 0.0156 ms 59.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:36.5792730Z triton_mm_3636 0.0156 ms 58.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:36.5793598Z SingleProcess AUTOTUNE benchmarking takes 0.2153 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:16:37.2653577Z Autotune Choices Stats: 2025-09-07T11:16:37.2654854Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "mm", "best_time": 0.00848000030964613, "best_triton_pos": 1, "best_triton_time": 0.00886400043964386, "best_triton_kernel": "triton_mm_1494", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2"} 2025-09-07T11:16:37.2728096Z AUTOTUNE mm(8x1000, 1000x384) 2025-09-07T11:16:37.2728398Z strides: [1000, 1], [384, 1] 2025-09-07T11:16:37.2728657Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:37.2728951Z mm 0.0085 ms 100.0% 2025-09-07T11:16:37.2729556Z triton_mm_1494 0.0089 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:16:37.2730577Z triton_mm_1498 0.0093 ms 91.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=5, num_warps=4 2025-09-07T11:16:37.2731569Z triton_mm_1502 0.0098 ms 86.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:16:37.2732554Z triton_mm_1493 0.0109 ms 77.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:16:37.2733521Z triton_mm_1492 0.0110 ms 77.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:37.2734569Z triton_mm_1506 0.0113 ms 75.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:37.2735639Z triton_mm_1497 0.0120 ms 70.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:37.2736544Z triton_mm_1504 0.0125 ms 67.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:16:37.2737453Z triton_mm_1501 0.0126 ms 67.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:37.2738236Z SingleProcess AUTOTUNE benchmarking takes 0.1883 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T11:16:37.4705521Z Autotune Choices Stats: 2025-09-07T11:16:37.4706613Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_1579", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 
0.007360000163316727, "best_triton_pos": 0} 2025-09-07T11:16:37.4774680Z AUTOTUNE mm(128x384, 384x768) 2025-09-07T11:16:37.4775143Z strides: [384, 1], [768, 1] 2025-09-07T11:16:37.4776065Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:37.4776762Z triton_mm_1579 0.0074 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:37.4778028Z triton_mm_1583 0.0076 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:37.4778660Z mm 0.0077 ms 95.0% 2025-09-07T11:16:37.4779241Z triton_mm_1578 0.0083 ms 89.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:37.4780234Z triton_mm_1577 0.0084 ms 87.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:37.4781186Z triton_mm_1582 0.0084 ms 87.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:16:37.4782269Z triton_mm_1587 0.0084 ms 87.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:16:37.4783248Z triton_mm_1576 0.0086 ms 85.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:16:37.4784286Z triton_mm_1586 0.0086 ms 85.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:37.4785327Z triton_mm_1585 
0.0090 ms 82.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:37.4786108Z SingleProcess AUTOTUNE benchmarking takes 0.2006 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:16:37.6669222Z Autotune Choices Stats: 2025-09-07T11:16:37.6670292Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_1655", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.0072639998979866505, "best_triton_pos": 0} 2025-09-07T11:16:37.6735910Z AUTOTUNE mm(128x384, 384x384) 2025-09-07T11:16:37.6736248Z strides: [384, 1], [384, 1] 2025-09-07T11:16:37.6736532Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:37.6737257Z triton_mm_1655 0.0073 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:37.6737909Z mm 0.0074 ms 97.8% 2025-09-07T11:16:37.6738500Z triton_mm_1659 0.0075 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:37.6739512Z triton_mm_1654 0.0081 ms 89.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:37.6740487Z triton_mm_1663 0.0082 ms 89.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:16:37.6741558Z triton_mm_1653 0.0082 ms 88.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:37.6742551Z 
triton_mm_1658 0.0083 ms 88.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:16:37.6743516Z triton_mm_1652 0.0084 ms 86.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:16:37.6745742Z triton_mm_1662 0.0085 ms 85.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:37.6746589Z triton_mm_1661 0.0089 ms 81.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:37.6747318Z SingleProcess AUTOTUNE benchmarking takes 0.1931 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:16:37.7317824Z Autotune Choices Stats: 2025-09-07T11:16:37.7318772Z {"num_choices": 6, "num_triton_choices": 5, "best_kernel": "triton_bmm_1670", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2", "best_time": 0.006144000217318535, "best_triton_pos": 0} 2025-09-07T11:16:37.7384269Z AUTOTUNE bmm(96x16x16, 96x16x32) 2025-09-07T11:16:37.7384630Z strides: [256, 1, 16], [512, 32, 1] 2025-09-07T11:16:37.7385076Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:37.7385735Z triton_bmm_1670 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T11:16:37.7386718Z triton_bmm_1673 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=2 2025-09-07T11:16:37.7387694Z triton_bmm_1674 0.0061 
ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=2 2025-09-07T11:16:37.7388678Z triton_bmm_1671 0.0062 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T11:16:37.7389675Z triton_bmm_1672 0.0062 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:16:37.7390284Z bmm 0.0072 ms 85.7% 2025-09-07T11:16:37.7390747Z SingleProcess AUTOTUNE benchmarking takes 0.0644 seconds and 0.0002 seconds precompiling for 6 choices 2025-09-07T11:16:37.8050762Z Autotune Choices Stats: 2025-09-07T11:16:37.8051780Z {"num_choices": 7, "num_triton_choices": 6, "best_kernel": "triton_bmm_1677", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=1", "best_time": 0.006144000217318535, "best_triton_pos": 0} 2025-09-07T11:16:37.8117582Z AUTOTUNE bmm(96x16x32, 96x32x16) 2025-09-07T11:16:37.8117841Z strides: [512, 32, 1], [512, 1, 32] 2025-09-07T11:16:37.8118127Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:37.8118767Z triton_bmm_1677 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=1 2025-09-07T11:16:37.8119755Z triton_bmm_1680 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=1 2025-09-07T11:16:37.8120729Z triton_bmm_1676 0.0062 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=1 
2025-09-07T11:16:37.8121689Z triton_bmm_1679 0.0062 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=1 2025-09-07T11:16:37.8122647Z triton_bmm_1678 0.0063 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=1 2025-09-07T11:16:37.8124101Z triton_bmm_1675 0.0064 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=1 2025-09-07T11:16:37.8124714Z bmm 0.0069 ms 88.5% 2025-09-07T11:16:37.8125308Z SingleProcess AUTOTUNE benchmarking takes 0.0728 seconds and 0.0002 seconds precompiling for 7 choices 2025-09-07T11:16:37.8698506Z Autotune Choices Stats: 2025-09-07T11:16:37.8699466Z {"num_choices": 6, "num_triton_choices": 5, "best_kernel": "triton_bmm_1682", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=1", "best_time": 0.00598399993032217, "best_triton_pos": 0} 2025-09-07T11:16:37.8766289Z AUTOTUNE bmm(96x16x16, 96x16x16) 2025-09-07T11:16:37.8766572Z strides: [256, 1, 16], [256, 16, 1] 2025-09-07T11:16:37.8766870Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:37.8767554Z triton_bmm_1682 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=1 2025-09-07T11:16:37.8768552Z triton_bmm_1683 0.0060 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=1 2025-09-07T11:16:37.8769534Z triton_bmm_1684 0.0060 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=3, num_warps=1 2025-09-07T11:16:37.8770514Z triton_bmm_1681 0.0060 ms 98.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=1 2025-09-07T11:16:37.8771482Z triton_bmm_1685 0.0061 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=1 2025-09-07T11:16:37.8772100Z bmm 0.0069 ms 86.2% 2025-09-07T11:16:37.8772563Z SingleProcess AUTOTUNE benchmarking takes 0.0643 seconds and 0.0002 seconds precompiling for 6 choices 2025-09-07T11:16:37.9346184Z Autotune Choices Stats: 2025-09-07T11:16:37.9347156Z {"num_choices": 6, "num_triton_choices": 5, "best_kernel": "triton_bmm_1690", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=1", "best_time": 0.006016000173985958, "best_triton_pos": 0} 2025-09-07T11:16:37.9413261Z AUTOTUNE bmm(96x16x16, 96x16x16) 2025-09-07T11:16:37.9413532Z strides: [256, 16, 1], [256, 1, 16] 2025-09-07T11:16:37.9413811Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:37.9414464Z triton_bmm_1690 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=1 2025-09-07T11:16:37.9415836Z triton_bmm_1686 0.0060 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=1 2025-09-07T11:16:37.9416820Z triton_bmm_1688 0.0060 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=1 2025-09-07T11:16:37.9417781Z triton_bmm_1689 0.0060 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=1 2025-09-07T11:16:37.9418737Z triton_bmm_1687 0.0061 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=1 2025-09-07T11:16:37.9419349Z bmm 0.0070 ms 85.8% 2025-09-07T11:16:37.9419821Z SingleProcess AUTOTUNE benchmarking takes 0.0642 seconds and 0.0002 seconds precompiling for 6 choices 2025-09-07T11:16:38.1818683Z Autotune Choices Stats: 2025-09-07T11:16:38.1820104Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_2347", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007519999984651804, "best_triton_pos": 0} 2025-09-07T11:16:38.1887030Z AUTOTUNE mm(128x384, 384x1024) 2025-09-07T11:16:38.1887348Z strides: [384, 1], [1024, 1] 2025-09-07T11:16:38.1887623Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:38.1888340Z triton_mm_2347 0.0075 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:38.1889364Z triton_mm_2351 0.0076 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:38.1890015Z mm 0.0077 ms 97.1% 2025-09-07T11:16:38.1890598Z triton_mm_2346 0.0083 ms 90.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:38.1891575Z triton_mm_2345 0.0084 ms 89.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:38.1892553Z triton_mm_2350 0.0084 ms 89.7% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:16:38.1893520Z triton_mm_2355 0.0084 ms 89.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:16:38.1894587Z triton_mm_2354 0.0087 ms 86.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:38.1895880Z triton_mm_2344 0.0087 ms 86.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:16:38.1896771Z triton_mm_2353 0.0092 ms 82.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:38.1897550Z SingleProcess AUTOTUNE benchmarking takes 0.1926 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:16:38.3142424Z Autotune Choices Stats: 2025-09-07T11:16:38.3143429Z {"num_choices": 14, "num_triton_choices": 13, "best_kernel": "triton_bmm_2364", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8", "best_time": 0.006463999859988689, "best_triton_pos": 0} 2025-09-07T11:16:38.3212294Z AUTOTUNE bmm(128x49x16, 128x16x64) 2025-09-07T11:16:38.3212577Z strides: [784, 1, 49], [1024, 64, 1] 2025-09-07T11:16:38.3212857Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:38.3213573Z triton_bmm_2364 0.0065 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:38.3214590Z triton_bmm_2367 0.0065 ms 100.0% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:16:38.3215740Z triton_bmm_2374 0.0065 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:38.3216721Z triton_bmm_2363 0.0065 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:16:38.3218074Z triton_bmm_2368 0.0065 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:16:38.3219319Z triton_bmm_2369 0.0065 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:38.3220316Z triton_bmm_2370 0.0065 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:38.3221276Z triton_bmm_2371 0.0065 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:38.3222355Z triton_bmm_2373 0.0065 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T11:16:38.3223354Z triton_bmm_2366 0.0065 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:38.3224213Z SingleProcess AUTOTUNE benchmarking takes 0.1320 seconds and 0.0002 seconds precompiling for 14 choices 2025-09-07T11:16:38.4393124Z Autotune Choices Stats: 
2025-09-07T11:16:38.4394141Z {"num_choices": 13, "num_triton_choices": 12, "best_kernel": "triton_bmm_2378", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.006816000211983919, "best_triton_pos": 0} 2025-09-07T11:16:38.4463475Z AUTOTUNE bmm(128x16x64, 128x64x49) 2025-09-07T11:16:38.4463769Z strides: [1024, 64, 1], [3136, 1, 64] 2025-09-07T11:16:38.4464048Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:38.4464733Z triton_bmm_2378 0.0068 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:16:38.4465951Z triton_bmm_2377 0.0069 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:38.4466940Z triton_bmm_2383 0.0069 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:38.4467955Z triton_bmm_2384 0.0069 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:38.4468968Z triton_bmm_2386 0.0069 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:16:38.4469947Z triton_bmm_2385 0.0069 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:16:38.4470926Z triton_bmm_2382 0.0069 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, 
num_warps=4 2025-09-07T11:16:38.4471894Z triton_bmm_2376 0.0072 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T11:16:38.4472856Z triton_bmm_2379 0.0072 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:16:38.4473825Z triton_bmm_2381 0.0074 ms 92.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:16:38.4475152Z SingleProcess AUTOTUNE benchmarking takes 0.1245 seconds and 0.0002 seconds precompiling for 13 choices 2025-09-07T11:16:38.5224327Z Autotune Choices Stats: 2025-09-07T11:16:38.5225834Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_bmm_2387", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2", "best_time": 0.006783999968320131, "best_triton_pos": 0} 2025-09-07T11:16:38.5292922Z AUTOTUNE bmm(128x16x16, 128x16x49) 2025-09-07T11:16:38.5293221Z strides: [256, 1, 16], [784, 49, 1] 2025-09-07T11:16:38.5293535Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:38.5294207Z triton_bmm_2387 0.0068 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T11:16:38.5295394Z triton_bmm_2388 0.0068 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T11:16:38.5296398Z triton_bmm_2390 0.0068 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 
2025-09-07T11:16:38.5297018Z bmm 0.0076 ms 89.1% 2025-09-07T11:16:38.5297598Z triton_bmm_2391 0.0079 ms 85.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:16:38.5298573Z triton_bmm_2392 0.0079 ms 85.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:38.5299543Z triton_bmm_2389 0.0080 ms 85.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:38.5300511Z triton_bmm_2393 0.0080 ms 84.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:16:38.5301363Z SingleProcess AUTOTUNE benchmarking takes 0.0824 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:16:38.6316572Z Autotune Choices Stats: 2025-09-07T11:16:38.6317582Z {"num_choices": 11, "num_triton_choices": 10, "best_kernel": "triton_bmm_2395", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=1", "best_time": 0.007040000054985285, "best_triton_pos": 0} 2025-09-07T11:16:38.6387117Z AUTOTUNE bmm(128x16x49, 128x49x16) 2025-09-07T11:16:38.6387406Z strides: [784, 49, 1], [784, 1, 49] 2025-09-07T11:16:38.6387692Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:38.6388360Z triton_bmm_2395 0.0070 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=1 2025-09-07T11:16:38.6389436Z triton_bmm_2403 0.0070 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=False, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=4, num_warps=1 2025-09-07T11:16:38.6390463Z triton_bmm_2400 0.0071 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=1 2025-09-07T11:16:38.6391476Z triton_bmm_2397 0.0072 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=1 2025-09-07T11:16:38.6392100Z bmm 0.0074 ms 94.8% 2025-09-07T11:16:38.6392692Z triton_bmm_2402 0.0075 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=1 2025-09-07T11:16:38.6394033Z triton_bmm_2396 0.0075 ms 93.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=1 2025-09-07T11:16:38.6395777Z triton_bmm_2401 0.0075 ms 93.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=1 2025-09-07T11:16:38.6396764Z triton_bmm_2399 0.0076 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=1 2025-09-07T11:16:38.6397737Z triton_bmm_2394 0.0085 ms 83.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=1 2025-09-07T11:16:38.6398592Z SingleProcess AUTOTUNE benchmarking takes 0.1089 seconds and 0.0002 seconds precompiling for 11 choices 2025-09-07T11:16:38.8220978Z Autotune Choices Stats: 2025-09-07T11:16:38.8222052Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_2427", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.00687999976798892, "best_triton_pos": 0} 2025-09-07T11:16:38.8288966Z AUTOTUNE mm(128x256, 256x256) 2025-09-07T11:16:38.8289230Z strides: [256, 1], [256, 1] 2025-09-07T11:16:38.8289500Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:38.8290166Z triton_mm_2427 0.0069 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:38.8291177Z triton_mm_2431 0.0071 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:38.8291818Z mm 0.0071 ms 96.8% 2025-09-07T11:16:38.8292386Z triton_mm_2430 0.0072 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:16:38.8293346Z triton_mm_2426 0.0073 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:38.8294309Z triton_mm_2425 0.0074 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:38.8295657Z triton_mm_2424 0.0075 ms 91.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:16:38.8296542Z triton_mm_2434 0.0075 ms 91.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:38.8297439Z triton_mm_2435 0.0077 ms 89.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=4, num_warps=4 2025-09-07T11:16:38.8298329Z triton_mm_2433 0.0079 ms 87.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:38.8299110Z SingleProcess AUTOTUNE benchmarking takes 0.1896 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:16:39.0329913Z Autotune Choices Stats: 2025-09-07T11:16:39.0331133Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.009247999638319016, "best_triton_pos": 1, "best_triton_time": 0.009344000369310379, "best_triton_kernel": "triton_mm_2465", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T11:16:39.0400802Z AUTOTUNE mm(392x1280, 1280x256) 2025-09-07T11:16:39.0401038Z strides: [1280, 1], [256, 1] 2025-09-07T11:16:39.0401284Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:39.0401740Z mm 0.0092 ms 100.0% 2025-09-07T11:16:39.0402344Z triton_mm_2465 0.0093 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:39.0403335Z triton_mm_2469 0.0098 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:39.0404308Z triton_mm_2473 0.0110 ms 84.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:16:39.0405590Z triton_mm_2464 0.0127 ms 73.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:39.0406497Z triton_mm_2468 0.0129 ms 71.7% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:16:39.0407383Z triton_mm_2463 0.0131 ms 70.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:39.0408278Z triton_mm_2479 0.0135 ms 68.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:39.0409181Z triton_mm_2472 0.0135 ms 68.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:39.0410075Z triton_mm_2462 0.0150 ms 61.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:16:39.0410858Z SingleProcess AUTOTUNE benchmarking takes 0.2106 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:16:39.2241703Z Autotune Choices Stats: 2025-09-07T11:16:39.2242665Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_2503", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007135999854654074, "best_triton_pos": 0} 2025-09-07T11:16:39.2311346Z AUTOTUNE mm(392x256, 256x512) 2025-09-07T11:16:39.2311625Z strides: [256, 1], [512, 1] 2025-09-07T11:16:39.2311882Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:39.2312534Z triton_mm_2503 0.0071 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:39.2313554Z triton_mm_2507 0.0072 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, 
BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:39.2314599Z triton_mm_2502 0.0074 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:39.2315945Z triton_mm_2506 0.0075 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:16:39.2316906Z triton_mm_2501 0.0076 ms 94.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:39.2317509Z mm 0.0076 ms 94.1% 2025-09-07T11:16:39.2318080Z triton_mm_2510 0.0077 ms 92.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:39.2319398Z triton_mm_2511 0.0078 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:16:39.2320371Z triton_mm_2500 0.0080 ms 89.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:16:39.2321354Z triton_mm_2509 0.0080 ms 88.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:39.2322193Z SingleProcess AUTOTUNE benchmarking takes 0.1894 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:16:39.4163873Z Autotune Choices Stats: 2025-09-07T11:16:39.4165243Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_2579", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, 
BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007007999811321497, "best_triton_pos": 0} 2025-09-07T11:16:39.4234798Z AUTOTUNE mm(392x256, 256x256) 2025-09-07T11:16:39.4235186Z strides: [256, 1], [256, 1] 2025-09-07T11:16:39.4235439Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:39.4236085Z triton_mm_2579 0.0070 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:39.4237087Z triton_mm_2583 0.0071 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:39.4238056Z triton_mm_2578 0.0073 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:39.4239026Z triton_mm_2582 0.0073 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:16:39.4239632Z mm 0.0075 ms 94.0% 2025-09-07T11:16:39.4240202Z triton_mm_2577 0.0075 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:39.4241174Z triton_mm_2586 0.0076 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:39.4242152Z triton_mm_2576 0.0077 ms 90.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:16:39.4243127Z triton_mm_2587 0.0078 ms 89.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:16:39.4244104Z triton_mm_2585 0.0080 ms 87.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:39.4245085Z SingleProcess AUTOTUNE benchmarking takes 0.1894 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:16:39.5599171Z Autotune Choices Stats: 2025-09-07T11:16:39.5600209Z {"num_choices": 15, "num_triton_choices": 14, "best_kernel": "triton_bmm_2596", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.0066559999249875546, "best_triton_pos": 0} 2025-09-07T11:16:39.5672895Z AUTOTUNE bmm(64x56x56, 64x56x32) 2025-09-07T11:16:39.5673290Z strides: [3136, 1, 56], [1792, 32, 1] 2025-09-07T11:16:39.5674263Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:39.5675674Z triton_bmm_2596 0.0067 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:39.5676686Z triton_bmm_2603 0.0068 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:39.5677668Z triton_bmm_2595 0.0069 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:16:39.5678655Z triton_bmm_2601 0.0070 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:16:39.5679650Z triton_bmm_2607 0.0070 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:39.5680645Z triton_bmm_2602 0.0070 ms 94.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:39.5681635Z triton_bmm_2597 0.0071 ms 94.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:39.5682613Z triton_bmm_2598 0.0074 ms 89.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:39.5683583Z triton_bmm_2604 0.0075 ms 88.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:39.5684563Z triton_bmm_2605 0.0075 ms 88.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:16:39.5685564Z SingleProcess AUTOTUNE benchmarking takes 0.1432 seconds and 0.0002 seconds precompiling for 15 choices 2025-09-07T11:16:39.7050600Z Autotune Choices Stats: 2025-09-07T11:16:39.7051618Z {"num_choices": 15, "num_triton_choices": 14, "best_kernel": "triton_bmm_2609", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.006560000125318766, "best_triton_pos": 0} 2025-09-07T11:16:39.7126457Z AUTOTUNE bmm(64x49x32, 64x32x49) 2025-09-07T11:16:39.7126789Z strides: [1568, 32, 1], [1600, 1, 32] 2025-09-07T11:16:39.7127087Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:39.7127779Z triton_bmm_2609 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:16:39.7128847Z triton_bmm_2612 0.0067 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:39.7129872Z triton_bmm_2608 0.0067 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T11:16:39.7130846Z triton_bmm_2611 0.0067 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:39.7131815Z triton_bmm_2610 0.0068 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:39.7132781Z triton_bmm_2614 0.0068 ms 95.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:16:39.7134568Z triton_bmm_2618 0.0069 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:39.7135955Z triton_bmm_2621 0.0069 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:39.7136854Z triton_bmm_2615 0.0069 ms 94.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:16:39.7137748Z triton_bmm_2620 0.0069 ms 94.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T11:16:39.7138533Z SingleProcess AUTOTUNE 
benchmarking takes 0.1447 seconds and 0.0002 seconds precompiling for 15 choices 2025-09-07T11:16:39.8318629Z Autotune Choices Stats: 2025-09-07T11:16:39.8319605Z {"num_choices": 13, "num_triton_choices": 12, "best_kernel": "triton_bmm_2629", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.007007999811321497, "best_triton_pos": 0} 2025-09-07T11:16:39.8392229Z AUTOTUNE bmm(64x16x49, 64x49x49) 2025-09-07T11:16:39.8392644Z strides: [784, 1, 16], [2401, 49, 1] 2025-09-07T11:16:39.8392936Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:39.8393624Z triton_bmm_2629 0.0070 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:39.8394764Z triton_bmm_2630 0.0070 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:39.8396111Z triton_bmm_2633 0.0070 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:16:39.8397105Z triton_bmm_2623 0.0071 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T11:16:39.8398082Z triton_bmm_2626 0.0072 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:16:39.8398709Z bmm 0.0076 ms 92.8% 2025-09-07T11:16:39.8399276Z triton_bmm_2622 0.0082 ms 85.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 
2025-09-07T11:16:39.8400257Z triton_bmm_2624 0.0082 ms 85.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:39.8401240Z triton_bmm_2632 0.0084 ms 83.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:16:39.8402211Z triton_bmm_2628 0.0087 ms 80.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:16:39.8403062Z SingleProcess AUTOTUNE benchmarking takes 0.1260 seconds and 0.0002 seconds precompiling for 13 choices 2025-09-07T11:16:39.9486792Z Autotune Choices Stats: 2025-09-07T11:16:39.9487776Z {"num_choices": 12, "num_triton_choices": 11, "best_kernel": "triton_bmm_2636", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.00684799998998642, "best_triton_pos": 0} 2025-09-07T11:16:39.9560996Z AUTOTUNE bmm(64x49x49, 64x49x16) 2025-09-07T11:16:39.9561235Z strides: [2401, 49, 1], [784, 1, 49] 2025-09-07T11:16:39.9561481Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:39.9562301Z triton_bmm_2636 0.0068 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:16:39.9563244Z triton_bmm_2643 0.0069 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:16:39.9564176Z triton_bmm_2637 0.0070 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 
2025-09-07T11:16:39.9565377Z triton_bmm_2640 0.0070 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:16:39.9566292Z triton_bmm_2642 0.0070 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:39.9567196Z triton_bmm_2638 0.0070 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:39.9568098Z triton_bmm_2641 0.0070 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:39.9568996Z triton_bmm_2644 0.0071 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:16:39.9569894Z triton_bmm_2635 0.0072 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T11:16:39.9570458Z bmm 0.0074 ms 92.2% 2025-09-07T11:16:39.9570874Z SingleProcess AUTOTUNE benchmarking takes 0.1164 seconds and 0.0002 seconds precompiling for 12 choices 2025-09-07T11:16:40.1867716Z Autotune Choices Stats: 2025-09-07T11:16:40.1868741Z {"num_choices": 19, "num_triton_choices": 18, "best_kernel": "triton_bmm_3422", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.007519999984651804, "best_triton_pos": 0} 2025-09-07T11:16:40.1942925Z AUTOTUNE bmm(64x196x49, 64x49x64) 2025-09-07T11:16:40.1943231Z strides: [9664, 1, 196], [3136, 64, 1] 
2025-09-07T11:16:40.1943518Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:40.1944213Z triton_bmm_3422 0.0075 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:40.1945564Z triton_bmm_3415 0.0076 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:40.1946571Z triton_bmm_3418 0.0076 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:16:40.1947582Z triton_bmm_3419 0.0076 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:40.1948589Z triton_bmm_3416 0.0076 ms 98.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:40.1949562Z triton_bmm_3413 0.0077 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:16:40.1950973Z triton_bmm_3414 0.0077 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:40.1951961Z triton_bmm_3423 0.0077 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:40.1952943Z triton_bmm_3417 0.0077 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:40.1953919Z 
triton_bmm_3420 0.0077 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:40.1954817Z SingleProcess AUTOTUNE benchmarking takes 0.1785 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T11:16:40.3620396Z Autotune Choices Stats: 2025-09-07T11:16:40.3621466Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_bmm_3431", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.00774399982765317, "best_triton_pos": 0} 2025-09-07T11:16:40.3696421Z AUTOTUNE bmm(64x49x64, 64x64x196) 2025-09-07T11:16:40.3696743Z strides: [3136, 64, 1], [12544, 1, 64] 2025-09-07T11:16:40.3697043Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:40.3697720Z triton_bmm_3431 0.0077 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:16:40.3698732Z triton_bmm_3432 0.0078 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:40.3699757Z triton_bmm_3440 0.0078 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:40.3700741Z triton_bmm_3436 0.0079 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:16:40.3701837Z triton_bmm_3435 0.0079 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:40.3702832Z 
triton_bmm_3437 0.0079 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:40.3703805Z triton_bmm_3438 0.0079 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:40.3704829Z triton_bmm_3425 0.0079 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:16:40.3706197Z triton_bmm_3433 0.0080 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:40.3707170Z triton_bmm_3434 0.0081 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:40.3708021Z SingleProcess AUTOTUNE benchmarking takes 0.1747 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T11:16:40.5307386Z Autotune Choices Stats: 2025-09-07T11:16:40.5308320Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_bmm_3457", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8", "best_time": 0.007199999876320362, "best_triton_pos": 0} 2025-09-07T11:16:40.5382941Z AUTOTUNE bmm(64x16x49, 64x49x196) 2025-09-07T11:16:40.5383296Z strides: [784, 1, 16], [9604, 196, 1] 2025-09-07T11:16:40.5383590Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:40.5384283Z triton_bmm_3457 0.0072 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:40.5385709Z 
triton_bmm_3443 0.0072 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:40.5386693Z triton_bmm_3454 0.0072 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:40.5387685Z triton_bmm_3455 0.0072 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:16:40.5388654Z triton_bmm_3448 0.0073 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:40.5389620Z triton_bmm_3449 0.0073 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:40.5390583Z triton_bmm_3444 0.0074 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:16:40.5391554Z triton_bmm_3451 0.0074 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:40.5392539Z triton_bmm_3450 0.0075 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:40.5393535Z triton_bmm_3453 0.0075 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:16:40.5394447Z SingleProcess AUTOTUNE benchmarking takes 0.1680 seconds and 0.0002 seconds precompiling for 18 
choices 2025-09-07T11:16:40.6581830Z Autotune Choices Stats: 2025-09-07T11:16:40.6582809Z {"num_choices": 13, "num_triton_choices": 12, "best_kernel": "triton_bmm_3469", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007648000027984381, "best_triton_pos": 0} 2025-09-07T11:16:40.6657096Z AUTOTUNE bmm(64x49x196, 64x196x16) 2025-09-07T11:16:40.6657409Z strides: [9604, 196, 1], [3136, 1, 196] 2025-09-07T11:16:40.6657716Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:40.6658421Z triton_bmm_3469 0.0076 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:40.6659438Z triton_bmm_3460 0.0078 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:16:40.6660454Z triton_bmm_3459 0.0078 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T11:16:40.6661535Z triton_bmm_3461 0.0079 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:40.6663030Z triton_bmm_3465 0.0079 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:40.6664006Z triton_bmm_3467 0.0079 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:16:40.6665378Z triton_bmm_3468 0.0079 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, 
BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:16:40.6666355Z triton_bmm_3462 0.0081 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:40.6667329Z triton_bmm_3466 0.0084 ms 91.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:40.6667957Z bmm 0.0090 ms 85.4% 2025-09-07T11:16:40.6668412Z SingleProcess AUTOTUNE benchmarking takes 0.1269 seconds and 0.0002 seconds precompiling for 13 choices 2025-09-07T11:16:40.8453682Z Autotune Choices Stats: 2025-09-07T11:16:40.8454669Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_3492", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8", "best_time": 0.0066559999249875546, "best_triton_pos": 0} 2025-09-07T11:16:40.8526284Z AUTOTUNE mm(392x128, 128x128) 2025-09-07T11:16:40.8526567Z strides: [128, 1], [128, 1] 2025-09-07T11:16:40.8526840Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:40.8527502Z triton_mm_3492 0.0067 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:40.8528538Z triton_mm_3496 0.0067 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:16:40.8529520Z triton_mm_3491 0.0068 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:40.8530481Z triton_mm_3490 0.0068 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, 
BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:16:40.8531447Z triton_mm_3497 0.0069 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:40.8532416Z triton_mm_3500 0.0069 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:40.8533048Z mm 0.0071 ms 94.1% 2025-09-07T11:16:40.8533624Z triton_mm_3499 0.0071 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:40.8534596Z triton_mm_3503 0.0071 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:40.8535732Z triton_mm_3493 0.0072 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:40.8536466Z SingleProcess AUTOTUNE benchmarking takes 0.1863 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:16:41.0436450Z Autotune Choices Stats: 2025-09-07T11:16:41.0437710Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_3531", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.008576000109314919, "best_triton_pos": 0} 2025-09-07T11:16:41.0509653Z AUTOTUNE mm(1568x640, 640x128) 2025-09-07T11:16:41.0509911Z strides: [640, 1], [128, 1] 2025-09-07T11:16:41.0510173Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:41.0510820Z triton_mm_3531 0.0086 ms 100.0% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:41.0519403Z triton_mm_3535 0.0089 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:41.0520027Z mm 0.0089 ms 96.1% 2025-09-07T11:16:41.0520600Z triton_mm_3539 0.0094 ms 91.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:16:41.0521545Z triton_mm_3530 0.0099 ms 86.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:41.0522475Z triton_mm_3534 0.0101 ms 84.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:16:41.0523386Z triton_mm_3529 0.0103 ms 83.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:41.0524279Z triton_mm_3538 0.0106 ms 81.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:41.0525348Z triton_mm_3545 0.0109 ms 78.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:41.0526242Z triton_mm_3528 0.0111 ms 77.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:16:41.0526935Z SingleProcess AUTOTUNE benchmarking takes 0.1978 seconds and 0.0002 seconds precompiling for 20 choices 
2025-09-07T11:16:41.2310785Z Autotune Choices Stats: 2025-09-07T11:16:41.2311732Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_3572", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.006976000033318996, "best_triton_pos": 0} 2025-09-07T11:16:41.2385327Z AUTOTUNE mm(1568x128, 128x256) 2025-09-07T11:16:41.2385830Z strides: [128, 1], [256, 1] 2025-09-07T11:16:41.2386128Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:41.2386786Z triton_mm_3572 0.0070 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:16:41.2387790Z triton_mm_3573 0.0072 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:41.2388755Z triton_mm_3576 0.0072 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:41.2389704Z triton_mm_3575 0.0074 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:41.2390699Z triton_mm_3577 0.0074 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:16:41.2392132Z triton_mm_3579 0.0074 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:41.2393109Z triton_mm_3574 0.0074 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:41.2394070Z triton_mm_3566 0.0075 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:16:41.2395181Z triton_mm_3578 0.0075 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:41.2396155Z triton_mm_3567 0.0076 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:41.2396873Z SingleProcess AUTOTUNE benchmarking takes 0.1869 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:16:41.4205989Z Autotune Choices Stats: 2025-09-07T11:16:41.4207563Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_3644", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8", "best_time": 0.006912000011652708, "best_triton_pos": 0} 2025-09-07T11:16:41.4279030Z AUTOTUNE mm(1568x128, 128x128) 2025-09-07T11:16:41.4279531Z strides: [128, 1], [128, 1] 2025-09-07T11:16:41.4279967Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:41.4281076Z triton_mm_3644 0.0069 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:41.4282717Z triton_mm_3648 0.0069 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:16:41.4284276Z triton_mm_3643 0.0070 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, 
num_warps=8 2025-09-07T11:16:41.4286116Z triton_mm_3642 0.0072 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:16:41.4287017Z triton_mm_3649 0.0072 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:41.4287924Z triton_mm_3652 0.0072 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:41.4288842Z triton_mm_3650 0.0074 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:41.4289744Z triton_mm_3651 0.0074 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:41.4290644Z triton_mm_3653 0.0074 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:16:41.4291542Z triton_mm_3655 0.0074 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:41.4292324Z SingleProcess AUTOTUNE benchmarking takes 0.1865 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:16:41.5926865Z Autotune Choices Stats: 2025-09-07T11:16:41.5928793Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_bmm_3662", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.008383999578654766, 
"best_triton_pos": 0} 2025-09-07T11:16:41.6005975Z AUTOTUNE bmm(32x196x196, 32x196x32) 2025-09-07T11:16:41.6006495Z strides: [38464, 1, 196], [6272, 32, 1] 2025-09-07T11:16:41.6006993Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:41.6008119Z triton_bmm_3662 0.0084 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:41.6009763Z triton_bmm_3663 0.0084 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:41.6011335Z triton_bmm_3670 0.0085 ms 98.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:41.6012913Z triton_bmm_3671 0.0085 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:16:41.6014478Z triton_bmm_3667 0.0086 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:16:41.6016223Z triton_bmm_3664 0.0086 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:41.6017072Z triton_bmm_3669 0.0086 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:41.6017929Z triton_bmm_3675 0.0089 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:41.6018790Z triton_bmm_3673 0.0089 ms 94.2% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:41.6019653Z triton_bmm_3661 0.0091 ms 92.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:16:41.6020392Z SingleProcess AUTOTUNE benchmarking takes 0.1721 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T11:16:41.7621718Z Autotune Choices Stats: 2025-09-07T11:16:41.7623324Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_bmm_3686", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.007615999784320593, "best_triton_pos": 0} 2025-09-07T11:16:41.7697382Z AUTOTUNE bmm(32x196x32, 32x32x196) 2025-09-07T11:16:41.7697865Z strides: [6272, 32, 1], [6272, 1, 32] 2025-09-07T11:16:41.7698335Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:41.7699420Z triton_bmm_3686 0.0076 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:41.7701067Z triton_bmm_3688 0.0077 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:16:41.7702793Z triton_bmm_3684 0.0077 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:16:41.7704820Z triton_bmm_3687 0.0077 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:41.7706682Z triton_bmm_3678 0.0077 ms 98.3% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:16:41.7707659Z triton_bmm_3683 0.0077 ms 98.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:16:41.7708619Z triton_bmm_3685 0.0077 ms 98.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:41.7709588Z triton_bmm_3691 0.0078 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T11:16:41.7710570Z triton_bmm_3690 0.0079 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:41.7711554Z triton_bmm_3692 0.0079 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:41.7712408Z SingleProcess AUTOTUNE benchmarking takes 0.1687 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T11:16:41.9347275Z Autotune Choices Stats: 2025-09-07T11:16:41.9348239Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_bmm_3697", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.008320000022649765, "best_triton_pos": 0} 2025-09-07T11:16:41.9426988Z AUTOTUNE bmm(32x16x196, 32x196x196) 2025-09-07T11:16:41.9427277Z strides: [3136, 1, 16], [38416, 196, 1] 2025-09-07T11:16:41.9427557Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:41.9428222Z triton_bmm_3697 0.0083 ms 100.0% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:16:41.9429249Z triton_bmm_3696 0.0084 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:41.9430239Z triton_bmm_3698 0.0085 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:16:41.9431219Z triton_bmm_3701 0.0085 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:41.9432214Z triton_bmm_3702 0.0086 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:41.9433231Z triton_bmm_3708 0.0086 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:16:41.9434219Z triton_bmm_3695 0.0088 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T11:16:41.9436236Z triton_bmm_3705 0.0088 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:41.9437183Z triton_bmm_3704 0.0089 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:16:41.9438492Z triton_bmm_3707 0.0091 ms 91.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:41.9439296Z SingleProcess AUTOTUNE benchmarking takes 0.1724 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T11:16:42.1007813Z Autotune Choices Stats: 2025-09-07T11:16:42.1009408Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_bmm_3715", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.008191999979317188, "best_triton_pos": 0} 2025-09-07T11:16:42.1087663Z AUTOTUNE bmm(32x200x200, 32x200x16) 2025-09-07T11:16:42.1088130Z strides: [40000, 200, 1], [3200, 1, 200] 2025-09-07T11:16:42.1088586Z dtypes: torch.float16, torch.float16 2025-09-07T11:16:42.1089661Z triton_bmm_3715 0.0082 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:42.1091277Z triton_bmm_3721 0.0082 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:16:42.1092847Z triton_bmm_3713 0.0083 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:16:42.1094392Z triton_bmm_3714 0.0084 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:16:42.1096389Z triton_bmm_3718 0.0085 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:42.1097297Z triton_bmm_3720 0.0085 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:16:42.1097883Z bmm 0.0086 ms 95.2% 2025-09-07T11:16:42.1098419Z triton_bmm_3726 0.0086 ms 95.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:16:42.1099334Z triton_bmm_3725 0.0087 ms 93.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:16:42.1100249Z triton_bmm_3712 0.0088 ms 93.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T11:16:42.1101036Z SingleProcess AUTOTUNE benchmarking takes 0.1654 seconds and 0.0002 seconds precompiling for 17 choices 2025-09-07T11:16:50.2268765Z skipping cudagraphs due to disabling cudagraphs due to incompatible op aten.index_put.default Found from File "/var/lib/jenkins/workspace/benchmarks/dynamo/timm_models.py", line 442, in torch_dynamo_resume_in_forward_and_backward_pass_at_440 2025-09-07T11:16:50.2269838Z pred = mod(*cloned_inputs) 2025-09-07T11:16:50.2270314Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 718, in forward 2025-09-07T11:16:50.2270797Z x = self.forward_features(x) 2025-09-07T11:16:50.2271284Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 709, in forward_features 2025-09-07T11:16:50.2271792Z x = self.stages(x) 2025-09-07T11:16:50.2272214Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 520, in forward 2025-09-07T11:16:50.2272671Z x = self.blocks(x) 2025-09-07T11:16:50.2273081Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 458, in forward 2025-09-07T11:16:50.2274009Z x = x + self.drop_path1(self.attn(x)) 2025-09-07T11:16:50.2274489Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 237, in forward 2025-09-07T11:16:50.2275550Z attn = q @ k * self.scale + self.get_attention_biases(x.device) 2025-09-07T11:16:50.2276170Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 212, in get_attention_biases 2025-09-07T11:16:50.2276771Z return self.attention_biases[:, self.attention_bias_idxs] 2025-09-07T11:16:50.2277014Z 2025-09-07T11:16:50.2277018Z 2025-09-07T11:16:54.4461245Z W0907 11:16:54.445000 73538 site-packages/torch/_logging/_internal.py:1199] [6/0] Profiler function will be ignored 2025-09-07T11:17:23.8872945Z pass 2025-09-07T11:17:31.0842350Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T11:17:31.0844246Z import pynvml # type: ignore[import] 2025-09-07T11:17:34.0951159Z 2025-09-07T11:17:35.4380443Z loading model: 0it [00:00, ?it/s] 2025-09-07T11:17:35.4380815Z loading model: 0it [00:01, ?it/s] 2025-09-07T11:17:35.4381113Z cuda train mixer_b16_224 2025-09-07T11:17:49.6269285Z Autotune Choices Stats: 2025-09-07T11:17:49.6271137Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.018751999363303185, "best_triton_pos": 1, "best_triton_time": 0.024671999737620354, "best_triton_kernel": "triton_mm_62", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T11:17:49.6354802Z AUTOTUNE addmm(1568x3072, 1568x768, 768x3072) 2025-09-07T11:17:49.6355356Z strides: [0, 1], [768, 1], [1, 768] 2025-09-07T11:17:49.6355630Z dtypes: torch.float16, torch.float16, torch.float16 2025-09-07T11:17:49.6355897Z bias_addmm 0.0188 ms 
100.0% 2025-09-07T11:17:49.6356427Z triton_mm_62 0.0247 ms 76.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:17:49.6357268Z triton_mm_56 0.0248 ms 75.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:17:49.6358122Z triton_mm_63 0.0278 ms 67.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:17:49.6358898Z triton_mm_55 0.0284 ms 65.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:17:49.6359704Z triton_mm_61 0.0295 ms 63.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:17:49.6360207Z addmm 0.0296 ms 63.4% 2025-09-07T11:17:49.6360696Z triton_mm_58 0.0309 ms 60.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:17:49.6361476Z triton_mm_57 0.0314 ms 59.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:17:49.6362247Z triton_mm_59 0.0316 ms 59.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:17:49.6362928Z SingleProcess AUTOTUNE benchmarking takes 0.3148 seconds and 0.0004 seconds precompiling for 21 choices 2025-09-07T11:17:50.4054172Z Autotune Choices Stats: 2025-09-07T11:17:50.4056388Z {"num_choices": 20, "num_triton_choices": 19, 
"best_kernel": "triton_mm_16", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.013055999763309956, "best_triton_pos": 0} 2025-09-07T11:17:50.4137009Z AUTOTUNE mm(6144x196, 196x384) 2025-09-07T11:17:50.4137293Z strides: [196, 1], [1, 196] 2025-09-07T11:17:50.4137583Z dtypes: torch.float16, torch.float16 2025-09-07T11:17:50.4138290Z triton_mm_16 0.0131 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:17:50.4139349Z triton_mm_22 0.0137 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T11:17:50.4140407Z triton_mm_13 0.0137 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:17:50.4141540Z triton_mm_20 0.0138 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:17:50.4142602Z triton_mm_23 0.0138 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:17:50.4143640Z triton_mm_17 0.0144 ms 90.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:17:50.4144705Z triton_mm_21 0.0145 ms 89.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:17:50.4145915Z triton_mm_24 0.0150 ms 87.0% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:17:50.4146879Z triton_mm_18 0.0150 ms 86.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:17:50.4147825Z triton_mm_12 0.0158 ms 82.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:17:50.4148671Z SingleProcess AUTOTUNE benchmarking takes 0.2500 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:17:51.1586196Z Autotune Choices Stats: 2025-09-07T11:17:51.1589243Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.13014400005340576, "best_triton_pos": 1, "best_triton_time": 0.13264000415802002, "best_triton_kernel": "triton_convolution2d_6", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=8"} 2025-09-07T11:17:51.1672878Z AUTOTUNE convolution(8x3x224x224, 768x3x16x16) 2025-09-07T11:17:51.1673263Z strides: [150528, 50176, 224, 1], [768, 256, 16, 1] 2025-09-07T11:17:51.1673585Z dtypes: torch.float16, torch.float16 2025-09-07T11:17:51.1673853Z convolution 0.1301 ms 100.0% 2025-09-07T11:17:51.1675268Z triton_convolution2d_6 0.1326 ms 98.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:17:51.1676723Z triton_convolution2d_3 0.1455 ms 89.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=8 
2025-09-07T11:17:51.1678612Z triton_convolution2d_1 0.1466 ms 88.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:17:51.1679909Z triton_convolution2d_4 0.1751 ms 74.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:17:51.1681191Z triton_convolution2d_5 0.1967 ms 66.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:17:51.1682535Z triton_convolution2d_0 0.2205 ms 59.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:17:51.1683785Z triton_convolution2d_2 0.4037 ms 32.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:17:51.1684761Z SingleProcess AUTOTUNE benchmarking takes 0.2331 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T11:17:51.6942492Z Autotune Choices Stats: 2025-09-07T11:17:51.6943920Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_33", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.011296000331640244, "best_triton_pos": 0} 2025-09-07T11:17:51.7027256Z AUTOTUNE mm(6144x384, 384x196) 2025-09-07T11:17:51.7027520Z strides: [384, 1], [1, 384] 2025-09-07T11:17:51.7027739Z dtypes: torch.float16, torch.float16 2025-09-07T11:17:51.7028306Z triton_mm_33 
0.0113 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:17:51.7029168Z triton_mm_37 0.0116 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:17:51.7030314Z triton_mm_44 0.0119 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:17:51.7031549Z triton_mm_40 0.0120 ms 94.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:17:51.7032778Z triton_mm_43 0.0122 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:17:51.7034167Z triton_mm_36 0.0123 ms 91.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:17:51.7035794Z triton_mm_39 0.0128 ms 88.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:17:51.7037192Z triton_mm_35 0.0130 ms 86.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:17:51.7038596Z triton_mm_42 0.0134 ms 84.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:17:51.7039999Z triton_mm_38 0.0136 ms 83.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:17:51.7041689Z SingleProcess AUTOTUNE benchmarking takes 0.2522 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:17:53.2466450Z Autotune Choices Stats: 2025-09-07T11:17:53.2468372Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.01929599978029728, "best_triton_pos": 1, "best_triton_time": 0.024191999807953835, "best_triton_kernel": "triton_mm_82", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T11:17:53.2549798Z AUTOTUNE mm(1568x3072, 3072x768) 2025-09-07T11:17:53.2550086Z strides: [3072, 1], [1, 3072] 2025-09-07T11:17:53.2550363Z dtypes: torch.float16, torch.float16 2025-09-07T11:17:53.2550641Z mm 0.0193 ms 100.0% 2025-09-07T11:17:53.2551325Z triton_mm_82 0.0242 ms 79.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:17:53.2552440Z triton_mm_76 0.0289 ms 66.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:17:53.2553490Z triton_mm_75 0.0296 ms 65.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:17:53.2554525Z triton_mm_71 0.0305 ms 63.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:17:53.2555769Z triton_mm_81 0.0316 ms 61.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:17:53.2556784Z triton_mm_72 0.0329 ms 58.7% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:17:53.2557739Z triton_mm_74 0.0354 ms 54.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:17:53.2558688Z triton_mm_78 0.0357 ms 54.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:17:53.2559645Z triton_mm_68 0.0439 ms 44.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:17:53.2560485Z SingleProcess AUTOTUNE benchmarking takes 0.3322 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:17:53.5343898Z Autotune Choices Stats: 2025-09-07T11:17:53.5346174Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_923", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.008448000065982342, "best_triton_pos": 0} 2025-09-07T11:17:53.5426224Z AUTOTUNE addmm(8x1000, 8x768, 768x1000) 2025-09-07T11:17:53.5426513Z strides: [0, 1], [768, 1], [1, 768] 2025-09-07T11:17:53.5426811Z dtypes: torch.float16, torch.float16, torch.float16 2025-09-07T11:17:53.5427512Z triton_mm_923 0.0084 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:17:53.5428501Z triton_mm_927 0.0090 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:17:53.5429126Z bias_addmm 0.0092 ms 92.3% 
2025-09-07T11:17:53.5429724Z triton_mm_931 0.0098 ms 86.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:17:53.5431217Z triton_mm_935 0.0101 ms 83.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:17:53.5432183Z triton_mm_922 0.0105 ms 80.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:17:53.5433135Z triton_mm_921 0.0109 ms 77.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:17:53.5434076Z triton_mm_926 0.0110 ms 77.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:17:53.5435202Z triton_mm_920 0.0113 ms 74.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T11:17:53.5436178Z triton_mm_930 0.0117 ms 72.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:17:53.5436991Z SingleProcess AUTOTUNE benchmarking takes 0.2525 seconds and 0.0003 seconds precompiling for 19 choices 2025-09-07T11:18:05.9224720Z Autotune Choices Stats: 2025-09-07T11:18:05.9226202Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.01945599913597107, "best_triton_pos": 1, "best_triton_time": 0.02006400004029274, "best_triton_kernel": "triton_mm_985", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T11:18:05.9306914Z AUTOTUNE mm(1568x768, 768x3072) 2025-09-07T11:18:05.9307216Z strides: [768, 1], [3072, 1] 2025-09-07T11:18:05.9307483Z dtypes: torch.float16, torch.float16 2025-09-07T11:18:05.9307762Z mm 0.0195 ms 100.0% 2025-09-07T11:18:05.9308403Z triton_mm_985 0.0201 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:18:05.9309587Z triton_mm_978 0.0214 ms 90.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:18:05.9310572Z triton_mm_986 0.0226 ms 86.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:18:05.9311536Z triton_mm_980 0.0232 ms 83.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:18:05.9312499Z triton_mm_979 0.0240 ms 81.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:18:05.9313469Z triton_mm_987 0.0254 ms 76.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:18:05.9314454Z triton_mm_983 0.0278 ms 69.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:18:05.9315589Z triton_mm_982 0.0281 ms 69.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:18:05.9316543Z 
triton_mm_981 0.0288 ms 67.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:18:05.9317800Z SingleProcess AUTOTUNE benchmarking takes 0.2625 seconds and 0.0004 seconds precompiling for 20 choices 2025-09-07T11:18:06.7194735Z Autotune Choices Stats: 2025-09-07T11:18:06.7196943Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.018239999189972878, "best_triton_pos": 1, "best_triton_time": 0.021824000403285027, "best_triton_kernel": "triton_mm_998", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8"} 2025-09-07T11:18:06.7280202Z AUTOTUNE mm(768x1568, 1568x3072) 2025-09-07T11:18:06.7280537Z strides: [1, 768], [3072, 1] 2025-09-07T11:18:06.7280816Z dtypes: torch.float16, torch.float16 2025-09-07T11:18:06.7281095Z mm 0.0182 ms 100.0% 2025-09-07T11:18:06.7281723Z triton_mm_998 0.0218 ms 83.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:18:06.7282736Z triton_mm_1002 0.0228 ms 79.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:18:06.7283757Z triton_mm_999 0.0235 ms 77.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:18:06.7284743Z triton_mm_1005 0.0246 ms 74.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:18:06.7285942Z triton_mm_997 0.0265 ms 68.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:18:06.7286916Z triton_mm_1001 0.0273 ms 66.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:18:06.7287909Z triton_mm_1004 0.0275 ms 66.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:18:06.7288898Z triton_mm_1000 0.0297 ms 61.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:18:06.7290027Z triton_mm_995 0.0298 ms 61.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:18:06.7290871Z SingleProcess AUTOTUNE benchmarking takes 0.2636 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:18:07.1726802Z Autotune Choices Stats: 2025-09-07T11:18:07.1728131Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.017952000722289085, "best_triton_pos": 1, "best_triton_time": 0.021824000403285027, "best_triton_kernel": "triton_mm_1036", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8"} 2025-09-07T11:18:07.1813939Z AUTOTUNE mm(3072x1568, 1568x768) 2025-09-07T11:18:07.1814263Z strides: [1, 3072], [768, 1] 2025-09-07T11:18:07.1814551Z dtypes: torch.float16, torch.float16 2025-09-07T11:18:07.1814836Z mm 0.0180 ms 100.0% 2025-09-07T11:18:07.1815662Z triton_mm_1036 0.0218 ms 82.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:18:07.1816672Z 
triton_mm_1040 0.0226 ms 79.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:18:07.1817646Z triton_mm_1037 0.0240 ms 74.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:18:07.1819309Z triton_mm_1043 0.0247 ms 72.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:18:07.1820306Z triton_mm_1035 0.0266 ms 67.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:18:07.1821203Z triton_mm_1042 0.0270 ms 66.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:18:07.1822206Z triton_mm_1039 0.0271 ms 66.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:18:07.1823113Z triton_mm_1038 0.0294 ms 61.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:18:07.1824023Z triton_mm_1044 0.0300 ms 59.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:18:07.1824810Z SingleProcess AUTOTUNE benchmarking takes 0.2649 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:18:07.6698259Z Autotune Choices Stats: 2025-09-07T11:18:07.6699324Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_1054", "best_kernel_desc": "ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.011359999887645245, "best_triton_pos": 0} 2025-09-07T11:18:07.6783695Z AUTOTUNE mm(6144x196, 196x384) 2025-09-07T11:18:07.6784054Z strides: [196, 1], [384, 1] 2025-09-07T11:18:07.6784396Z dtypes: torch.float16, torch.float16 2025-09-07T11:18:07.6785323Z triton_mm_1054 0.0114 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:18:07.6786386Z triton_mm_1056 0.0122 ms 92.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:18:07.6787390Z triton_mm_1061 0.0124 ms 92.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:18:07.6788373Z triton_mm_1055 0.0124 ms 91.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:18:07.6789361Z triton_mm_1051 0.0124 ms 91.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:18:07.6790516Z triton_mm_1058 0.0128 ms 88.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:18:07.6791492Z triton_mm_1060 0.0130 ms 87.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T11:18:07.6792466Z triton_mm_1062 0.0132 ms 86.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, 
EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:18:07.6793446Z triton_mm_1059 0.0132 ms 85.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:18:07.6794413Z triton_mm_1050 0.0148 ms 76.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:18:07.6796022Z SingleProcess AUTOTUNE benchmarking takes 0.2091 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:18:08.4676774Z Autotune Choices Stats: 2025-09-07T11:18:08.4677792Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_957", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.006560000125318766, "best_triton_pos": 0} 2025-09-07T11:18:08.4762292Z AUTOTUNE mm(1000x8, 8x768) 2025-09-07T11:18:08.4762566Z strides: [1, 1000], [768, 1] 2025-09-07T11:18:08.4762842Z dtypes: torch.float16, torch.float16 2025-09-07T11:18:08.4763534Z triton_mm_957 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:18:08.4764562Z triton_mm_963 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:18:08.4765882Z triton_mm_965 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:18:08.4766864Z triton_mm_959 0.0066 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:18:08.4767822Z triton_mm_964 0.0066 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:18:08.4768781Z triton_mm_955 0.0067 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:18:08.4769775Z triton_mm_962 0.0067 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:18:08.4770766Z triton_mm_960 0.0067 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:18:08.4771644Z triton_mm_961 0.0067 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:18:08.4772529Z triton_mm_954 0.0067 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:18:08.4773310Z SingleProcess AUTOTUNE benchmarking takes 0.1685 seconds and 0.0002 seconds precompiling for 17 choices 2025-09-07T11:18:09.0092898Z Autotune Choices Stats: 2025-09-07T11:18:09.0094201Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.019551999866962433, "best_triton_pos": 1, "best_triton_time": 0.04089599847793579, "best_triton_kernel": "triton_mm_1067", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T11:18:09.0177778Z AUTOTUNE mm(196x6144, 6144x384) 2025-09-07T11:18:09.0178098Z 
strides: [1, 196], [384, 1] 2025-09-07T11:18:09.0178316Z dtypes: torch.float16, torch.float16 2025-09-07T11:18:09.0178543Z mm 0.0196 ms 100.0% 2025-09-07T11:18:09.0179060Z triton_mm_1067 0.0409 ms 47.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:18:09.0179908Z triton_mm_1068 0.0411 ms 47.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:18:09.0181530Z triton_mm_1066 0.0414 ms 47.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:18:09.0182529Z triton_mm_1072 0.0417 ms 46.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:18:09.0183506Z triton_mm_1076 0.0445 ms 44.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:18:09.0184473Z triton_mm_1071 0.0451 ms 43.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:18:09.0185613Z triton_mm_1075 0.0468 ms 41.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:18:09.0186626Z triton_mm_1074 0.0518 ms 37.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:18:09.0187608Z triton_mm_1065 0.0526 ms 37.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:18:09.0188477Z SingleProcess AUTOTUNE benchmarking takes 0.3414 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:18:09.5223126Z Autotune Choices Stats: 2025-09-07T11:18:09.5224455Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.019007999449968338, "best_triton_pos": 1, "best_triton_time": 0.029920000582933426, "best_triton_kernel": "triton_mm_1087", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T11:18:09.5302984Z AUTOTUNE mm(384x6144, 6144x196) 2025-09-07T11:18:09.5303406Z strides: [1, 384], [196, 1] 2025-09-07T11:18:09.5303677Z dtypes: torch.float16, torch.float16 2025-09-07T11:18:09.5303953Z mm 0.0190 ms 100.0% 2025-09-07T11:18:09.5304570Z triton_mm_1087 0.0299 ms 63.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:18:09.5305746Z triton_mm_1086 0.0387 ms 49.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:18:09.5306741Z triton_mm_1085 0.0428 ms 44.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:18:09.5307726Z triton_mm_1091 0.0431 ms 44.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:18:09.5308708Z triton_mm_1090 0.0442 ms 43.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:18:09.5309684Z triton_mm_1094 0.0512 ms 37.1% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:18:09.5310806Z triton_mm_1097 0.0517 ms 36.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:18:09.5311769Z triton_mm_1093 0.0533 ms 35.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:18:09.5313512Z triton_mm_1095 0.0535 ms 35.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:18:09.5314370Z SingleProcess AUTOTUNE benchmarking takes 0.3337 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:18:10.6164447Z Autotune Choices Stats: 2025-09-07T11:18:10.6166125Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "mm", "best_time": 0.00902399979531765, "best_triton_pos": 1, "best_triton_time": 0.00940799992531538, "best_triton_kernel": "triton_mm_940", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2"} 2025-09-07T11:18:10.6248484Z AUTOTUNE mm(8x1000, 1000x768) 2025-09-07T11:18:10.6248762Z strides: [1000, 1], [768, 1] 2025-09-07T11:18:10.6249061Z dtypes: torch.float16, torch.float16 2025-09-07T11:18:10.6249333Z mm 0.0090 ms 100.0% 2025-09-07T11:18:10.6249980Z triton_mm_940 0.0094 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:18:10.6250993Z triton_mm_944 0.0097 ms 93.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:18:10.6252002Z triton_mm_948 0.0099 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:18:10.6252976Z triton_mm_939 0.0114 ms 79.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:18:10.6253936Z triton_mm_938 0.0115 ms 78.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:18:10.6254910Z triton_mm_952 0.0115 ms 78.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:18:10.6256040Z triton_mm_943 0.0121 ms 74.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:18:10.6257033Z triton_mm_950 0.0128 ms 70.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:18:10.6257998Z triton_mm_947 0.0128 ms 70.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:18:10.6258870Z SingleProcess AUTOTUNE benchmarking takes 0.1939 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T11:18:10.8976101Z Autotune Choices Stats: 2025-09-07T11:18:10.8977385Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.01849599927663803, "best_triton_pos": 1, "best_triton_time": 0.02470399998128414, "best_triton_kernel": "triton_mm_1025", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T11:18:10.9054869Z AUTOTUNE mm(1568x3072, 3072x768) 2025-09-07T11:18:10.9055425Z strides: [3072, 1], [768, 1] 2025-09-07T11:18:10.9055695Z dtypes: torch.float16, torch.float16 2025-09-07T11:18:10.9055965Z mm 0.0185 ms 100.0% 2025-09-07T11:18:10.9056591Z triton_mm_1025 0.0247 ms 74.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:18:10.9058101Z triton_mm_1018 0.0288 ms 64.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:18:10.9059342Z triton_mm_1019 0.0291 ms 63.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:18:10.9060341Z triton_mm_1014 0.0299 ms 61.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:18:10.9061306Z triton_mm_1024 0.0308 ms 60.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:18:10.9062314Z triton_mm_1015 0.0317 ms 58.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:18:10.9063220Z triton_mm_1017 0.0339 ms 54.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:18:10.9064118Z triton_mm_1021 0.0346 ms 53.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:18:10.9065156Z triton_mm_1016 0.0429 ms 43.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:18:10.9065956Z SingleProcess AUTOTUNE benchmarking takes 0.2793 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:18:11.1089318Z Autotune Choices Stats: 2025-09-07T11:18:11.1090596Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.010816000401973724, "best_triton_pos": 1, "best_triton_time": 0.012256000190973282, "best_triton_kernel": "triton_mm_1120", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T11:18:11.1168258Z AUTOTUNE mm(6144x384, 384x200) 2025-09-07T11:18:11.1168529Z strides: [384, 1], [200, 1] 2025-09-07T11:18:11.1168793Z dtypes: torch.float16, torch.float16 2025-09-07T11:18:11.1169070Z mm 0.0108 ms 100.0% 2025-09-07T11:18:11.1169713Z triton_mm_1120 0.0123 ms 88.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:18:11.1170702Z triton_mm_1119 0.0125 ms 86.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:18:11.1171685Z triton_mm_1116 0.0130 ms 83.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:18:11.1172685Z triton_mm_1115 0.0134 ms 80.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:18:11.1173660Z triton_mm_1113 0.0138 ms 78.4% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:18:11.1174628Z triton_mm_1109 0.0139 ms 77.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:18:11.1175986Z triton_mm_1118 0.0139 ms 77.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:18:11.1176963Z triton_mm_1112 0.0145 ms 74.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:18:11.1178705Z triton_mm_1111 0.0146 ms 74.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:18:11.1179568Z SingleProcess AUTOTUNE benchmarking takes 0.2098 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:18:17.4046206Z W0907 11:18:17.403000 91579 site-packages/torch/_logging/_internal.py:1199] [6/0] Profiler function will be ignored 2025-09-07T11:18:37.6494720Z pass 2025-09-07T11:18:42.8679092Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T11:18:42.8680571Z import pynvml # type: ignore[import] 2025-09-07T11:18:45.8677239Z 2025-09-07T11:18:47.8207531Z loading model: 0it [00:00, ?it/s] 2025-09-07T11:18:47.8207911Z loading model: 0it [00:01, ?it/s] 2025-09-07T11:18:47.8208256Z cuda train mixnet_l 2025-09-07T11:19:25.7441449Z Autotune Choices Stats: 2025-09-07T11:19:25.7442726Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_1338", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.010495999827980995, "best_triton_pos": 0} 2025-09-07T11:19:25.7527202Z AUTOTUNE addmm(8x132, 8x1584, 1584x132) 2025-09-07T11:19:25.7527512Z strides: [0, 1], [1584, 1], [1, 1584] 2025-09-07T11:19:25.7527853Z dtypes: torch.float16, torch.float16, torch.float16 2025-09-07T11:19:25.7528575Z triton_mm_1338 0.0105 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:19:25.7529630Z triton_mm_1342 0.0115 ms 91.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:25.7530623Z triton_mm_1346 0.0131 ms 80.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:19:25.7531635Z triton_mm_1337 0.0140 ms 74.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:19:25.7532653Z triton_mm_1350 0.0142 ms 73.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:19:25.7533221Z addmm 0.0147 ms 71.5% 
2025-09-07T11:19:25.7533754Z triton_mm_1336 0.0149 ms 70.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:25.7534660Z triton_mm_1341 0.0155 ms 67.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:25.7535805Z triton_mm_1335 0.0167 ms 62.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T11:19:25.7536710Z triton_mm_1345 0.0173 ms 60.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:25.7537500Z SingleProcess AUTOTUNE benchmarking takes 0.2577 seconds and 0.0004 seconds precompiling for 19 choices 2025-09-07T11:19:26.2796968Z Autotune Choices Stats: 2025-09-07T11:19:26.2798695Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_955", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.0074880002066493034, "best_triton_pos": 0} 2025-09-07T11:19:26.2884105Z AUTOTUNE addmm(8x80, 8x480, 480x80) 2025-09-07T11:19:26.2884418Z strides: [0, 1], [480, 1], [1, 480] 2025-09-07T11:19:26.2884729Z dtypes: torch.float16, torch.float16, torch.float16 2025-09-07T11:19:26.2885842Z triton_mm_955 0.0075 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:19:26.2886896Z triton_mm_959 0.0076 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 
2025-09-07T11:19:26.2887540Z bias_addmm 0.0079 ms 94.7% 2025-09-07T11:19:26.2888172Z triton_mm_954 0.0083 ms 90.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:19:26.2889185Z triton_mm_963 0.0084 ms 89.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:19:26.2890171Z triton_mm_953 0.0085 ms 88.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:26.2891138Z triton_mm_958 0.0089 ms 84.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:26.2892185Z triton_mm_965 0.0090 ms 83.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:19:26.2893091Z triton_mm_967 0.0090 ms 83.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:19:26.2893993Z triton_mm_952 0.0091 ms 82.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T11:19:26.2894786Z SingleProcess AUTOTUNE benchmarking takes 0.2542 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T11:19:26.7732392Z Autotune Choices Stats: 2025-09-07T11:19:26.7733789Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "bias_addmm", "best_time": 0.008576000109314919, "best_triton_pos": 1, "best_triton_time": 0.008991999551653862, "best_triton_kernel": "triton_mm_1270", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T11:19:26.7816293Z AUTOTUNE addmm(8x80, 8x960, 960x80) 2025-09-07T11:19:26.7816622Z strides: [0, 1], [960, 1], [1, 960] 2025-09-07T11:19:26.7816942Z dtypes: torch.float16, torch.float16, torch.float16 2025-09-07T11:19:26.7817304Z bias_addmm 0.0086 ms 100.0% 2025-09-07T11:19:26.7817956Z triton_mm_1270 0.0090 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:26.7818998Z triton_mm_1266 0.0091 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:19:26.7819996Z triton_mm_1274 0.0098 ms 87.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:19:26.7820982Z triton_mm_1265 0.0105 ms 82.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:19:26.7822835Z triton_mm_1278 0.0105 ms 81.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:19:26.7823751Z triton_mm_1264 0.0109 ms 78.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:26.7824324Z addmm 0.0111 ms 77.2% 2025-09-07T11:19:26.7824853Z triton_mm_1269 0.0113 ms 75.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:26.7825963Z triton_mm_1273 0.0116 ms 73.8% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:26.7826758Z SingleProcess AUTOTUNE benchmarking takes 0.2439 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T11:19:26.9952670Z Autotune Choices Stats: 2025-09-07T11:19:26.9953725Z {"num_choices": 15, "num_triton_choices": 13, "best_kernel": "triton_mm_868", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.007615999784320593, "best_triton_pos": 0} 2025-09-07T11:19:27.0037085Z AUTOTUNE addmm(8x52, 8x624, 624x52) 2025-09-07T11:19:27.0037394Z strides: [0, 1], [624, 1], [1, 624] 2025-09-07T11:19:27.0037707Z dtypes: torch.float16, torch.float16, torch.float16 2025-09-07T11:19:27.0038432Z triton_mm_868 0.0076 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:19:27.0039427Z triton_mm_872 0.0079 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:27.0040429Z triton_mm_876 0.0082 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:27.0041446Z triton_mm_875 0.0082 ms 92.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:19:27.0042417Z triton_mm_867 0.0090 ms 85.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:19:27.0043369Z triton_mm_866 0.0093 ms 81.5% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:27.0044255Z triton_mm_871 0.0094 ms 81.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:27.0045366Z triton_mm_865 0.0101 ms 75.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T11:19:27.0046311Z triton_mm_874 0.0101 ms 75.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:19:27.0046916Z bias_addmm 0.0106 ms 71.9% 2025-09-07T11:19:27.0047378Z SingleProcess AUTOTUNE benchmarking takes 0.1998 seconds and 0.0002 seconds precompiling for 15 choices 2025-09-07T11:19:27.3882539Z Autotune Choices Stats: 2025-09-07T11:19:27.3883700Z {"num_choices": 13, "num_triton_choices": 11, "best_kernel": "triton_mm_251", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.007071999832987785, "best_triton_pos": 0} 2025-09-07T11:19:27.3967333Z AUTOTUNE addmm(8x28, 8x336, 336x28) 2025-09-07T11:19:27.3968075Z strides: [0, 1], [336, 1], [1, 336] 2025-09-07T11:19:27.3968430Z dtypes: torch.float16, torch.float16, torch.float16 2025-09-07T11:19:27.3969162Z triton_mm_251 0.0071 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:19:27.3970229Z triton_mm_257 0.0071 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=2 
2025-09-07T11:19:27.3971216Z triton_mm_258 0.0071 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:19:27.3972261Z triton_mm_250 0.0075 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:19:27.3973195Z triton_mm_254 0.0077 ms 91.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=2 2025-09-07T11:19:27.3974075Z triton_mm_256 0.0080 ms 88.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=2 2025-09-07T11:19:27.3975158Z triton_mm_249 0.0081 ms 87.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T11:19:27.3976048Z triton_mm_255 0.0087 ms 81.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=2 2025-09-07T11:19:27.3976622Z bias_addmm 0.0092 ms 77.0% 2025-09-07T11:19:27.3977166Z triton_mm_253 0.0113 ms 62.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T11:19:27.3977955Z SingleProcess AUTOTUNE benchmarking takes 0.1782 seconds and 0.0002 seconds precompiling for 13 choices 2025-09-07T11:19:27.8131582Z Autotune Choices Stats: 2025-09-07T11:19:27.8132807Z {"num_choices": 13, "num_triton_choices": 11, "best_kernel": "triton_mm_592", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 
0.007135999854654074, "best_triton_pos": 0} 2025-09-07T11:19:27.8219170Z AUTOTUNE addmm(8x26, 8x624, 624x26) 2025-09-07T11:19:27.8219466Z strides: [0, 1], [624, 1], [1, 624] 2025-09-07T11:19:27.8219779Z dtypes: torch.float16, torch.float16, torch.float16 2025-09-07T11:19:27.8220531Z triton_mm_592 0.0071 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:19:27.8221668Z triton_mm_598 0.0076 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=2 2025-09-07T11:19:27.8222701Z triton_mm_599 0.0078 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:19:27.8223585Z triton_mm_591 0.0089 ms 79.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:19:27.8224466Z triton_mm_595 0.0090 ms 79.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=2 2025-09-07T11:19:27.8226238Z triton_mm_597 0.0098 ms 73.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=2 2025-09-07T11:19:27.8227368Z triton_mm_590 0.0102 ms 69.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T11:19:27.8227950Z bias_addmm 0.0109 ms 65.6% 2025-09-07T11:19:27.8228493Z triton_mm_596 0.0114 ms 62.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=2 
2025-09-07T11:19:27.8229047Z addmm 0.0133 ms 53.6% 2025-09-07T11:19:27.8229475Z SingleProcess AUTOTUNE benchmarking takes 0.1820 seconds and 0.0002 seconds precompiling for 13 choices 2025-09-07T11:19:28.2349683Z Autotune Choices Stats: 2025-09-07T11:19:28.2350731Z {"num_choices": 13, "num_triton_choices": 11, "best_kernel": "triton_mm_182", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.006624000146985054, "best_triton_pos": 0} 2025-09-07T11:19:28.2433988Z AUTOTUNE addmm(8x20, 8x240, 240x20) 2025-09-07T11:19:28.2434290Z strides: [0, 1], [240, 1], [1, 240] 2025-09-07T11:19:28.2434581Z dtypes: torch.float16, torch.float16, torch.float16 2025-09-07T11:19:28.2435417Z triton_mm_182 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:19:28.2436416Z triton_mm_175 0.0069 ms 95.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:19:28.2437391Z triton_mm_181 0.0069 ms 95.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=2 2025-09-07T11:19:28.2438368Z triton_mm_178 0.0071 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=2 2025-09-07T11:19:28.2439332Z triton_mm_174 0.0071 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:19:28.2440289Z triton_mm_180 0.0073 ms 90.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=4, num_warps=2 2025-09-07T11:19:28.2441256Z triton_mm_173 0.0074 ms 89.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T11:19:28.2442219Z triton_mm_179 0.0079 ms 84.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=2 2025-09-07T11:19:28.2442844Z bias_addmm 0.0082 ms 80.9% 2025-09-07T11:19:28.2443446Z triton_mm_177 0.0097 ms 68.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T11:19:28.2444255Z SingleProcess AUTOTUNE benchmarking takes 0.1754 seconds and 0.0002 seconds precompiling for 13 choices 2025-09-07T11:19:28.6226064Z Autotune Choices Stats: 2025-09-07T11:19:28.6227160Z {"num_choices": 13, "num_triton_choices": 11, "best_kernel": "triton_mm_518", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=1", "best_time": 0.006943999789655209, "best_triton_pos": 0} 2025-09-07T11:19:28.6311813Z AUTOTUNE addmm(8x14, 8x336, 336x14) 2025-09-07T11:19:28.6312105Z strides: [0, 1], [336, 1], [1, 336] 2025-09-07T11:19:28.6312926Z dtypes: torch.float16, torch.float16, torch.float16 2025-09-07T11:19:28.6313855Z triton_mm_518 0.0069 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=1 2025-09-07T11:19:28.6314863Z triton_mm_512 0.0070 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=1 2025-09-07T11:19:28.6316006Z triton_mm_519 0.0071 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, 
BLOCK_M=16, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=1 2025-09-07T11:19:28.6316957Z triton_mm_511 0.0075 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=1 2025-09-07T11:19:28.6317909Z triton_mm_515 0.0077 ms 89.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=1 2025-09-07T11:19:28.6318874Z triton_mm_517 0.0079 ms 87.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=1 2025-09-07T11:19:28.6319830Z triton_mm_510 0.0081 ms 85.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=1 2025-09-07T11:19:28.6320777Z triton_mm_516 0.0085 ms 81.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=1 2025-09-07T11:19:28.6321385Z bias_addmm 0.0089 ms 78.3% 2025-09-07T11:19:28.6321969Z triton_mm_514 0.0108 ms 64.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=1 2025-09-07T11:19:28.6322819Z SingleProcess AUTOTUNE benchmarking takes 0.1795 seconds and 0.0002 seconds precompiling for 13 choices 2025-09-07T11:19:29.5449202Z Autotune Choices Stats: 2025-09-07T11:19:29.5451569Z {"num_choices": 7, "num_triton_choices": 6, "best_kernel": "convolution", "best_time": 0.015776000916957855, "best_triton_pos": 1, "best_triton_time": 0.023520000278949738, "best_triton_kernel": "triton_convolution2d_4", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, 
STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8"} 2025-09-07T11:19:29.5538110Z AUTOTUNE convolution(8x3x224x224, 32x3x3x3) 2025-09-07T11:19:29.5538645Z strides: [150528, 1, 672, 3], [27, 1, 9, 3] 2025-09-07T11:19:29.5539125Z dtypes: torch.float16, torch.float16 2025-09-07T11:19:29.5539582Z convolution 0.0158 ms 100.0% 2025-09-07T11:19:29.5540787Z triton_convolution2d_4 0.0235 ms 67.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:19:29.5542961Z triton_convolution2d_0 0.0269 ms 58.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:19:29.5544501Z triton_convolution2d_2 0.0274 ms 57.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T11:19:29.5546027Z triton_convolution2d_3 0.0289 ms 54.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:19:29.5547150Z triton_convolution2d_5 0.0344 ms 45.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T11:19:29.5549000Z triton_convolution2d_1 0.0416 ms 37.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T11:19:29.5549904Z SingleProcess AUTOTUNE benchmarking takes 0.1006 seconds and 0.0002 seconds precompiling for 7 choices 2025-09-07T11:19:29.7428391Z Autotune Choices Stats: 
2025-09-07T11:19:29.7430074Z {"num_choices": 16, "num_triton_choices": 15, "best_kernel": "triton_mm_17", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.010015999898314476, "best_triton_pos": 0} 2025-09-07T11:19:29.7514542Z AUTOTUNE mm(100352x32, 32x32) 2025-09-07T11:19:29.7514801Z strides: [32, 1], [1, 32] 2025-09-07T11:19:29.7515204Z dtypes: torch.float16, torch.float16 2025-09-07T11:19:29.7515865Z triton_mm_17 0.0100 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:29.7516847Z triton_mm_12 0.0101 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:19:29.7517805Z triton_mm_14 0.0101 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:29.7518762Z triton_mm_18 0.0101 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:19:29.7519724Z triton_mm_16 0.0102 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:19:29.7520682Z triton_mm_20 0.0102 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:19:29.7521630Z triton_mm_10 0.0103 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:29.7522571Z 
triton_mm_19 0.0103 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T11:19:29.7523526Z triton_mm_7 0.0103 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:19:29.7524439Z triton_mm_8 0.0104 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:29.7525339Z SingleProcess AUTOTUNE benchmarking takes 0.1960 seconds and 0.0002 seconds precompiling for 16 choices 2025-09-07T11:19:29.9499169Z Autotune Choices Stats: 2025-09-07T11:19:29.9500802Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_35", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.013535999692976475, "best_triton_pos": 0} 2025-09-07T11:19:29.9584176Z AUTOTUNE mm(100352x16, 16x96) 2025-09-07T11:19:29.9584635Z strides: [16, 1], [1, 16] 2025-09-07T11:19:29.9585262Z dtypes: torch.float16, torch.float16 2025-09-07T11:19:29.9586342Z triton_mm_35 0.0135 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:29.9588436Z triton_mm_21 0.0136 ms 99.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T11:19:29.9590270Z triton_mm_30 0.0138 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:19:29.9591267Z mm 0.0140 ms 97.0% 2025-09-07T11:19:29.9592207Z 
triton_mm_26 0.0141 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:19:29.9593895Z triton_mm_28 0.0142 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:29.9594844Z triton_mm_29 0.0143 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:29.9595955Z triton_mm_33 0.0143 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:19:29.9596915Z triton_mm_31 0.0144 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:19:29.9597879Z triton_mm_36 0.0147 ms 92.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:19:29.9598725Z SingleProcess AUTOTUNE benchmarking takes 0.2063 seconds and 0.0002 seconds precompiling for 17 choices 2025-09-07T11:19:30.1674683Z Autotune Choices Stats: 2025-09-07T11:19:30.1675923Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_60", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.009247999638319016, "best_triton_pos": 0} 2025-09-07T11:19:30.1759816Z AUTOTUNE mm(25088x96, 96x20) 2025-09-07T11:19:30.1760064Z strides: [96, 1], [1, 96] 2025-09-07T11:19:30.1760313Z dtypes: torch.float16, torch.float16 2025-09-07T11:19:30.1760959Z triton_mm_60 0.0092 ms 100.0% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:19:30.1761915Z triton_mm_62 0.0093 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:19:30.1762867Z triton_mm_63 0.0093 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:30.1763819Z triton_mm_61 0.0094 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:30.1764715Z triton_mm_56 0.0094 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:19:30.1765741Z triton_mm_65 0.0094 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:30.1766617Z triton_mm_66 0.0095 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:19:30.1767495Z triton_mm_57 0.0096 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:30.1768620Z triton_mm_68 0.0096 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:30.1769613Z triton_mm_59 0.0096 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=2, num_warps=4 2025-09-07T11:19:30.1770383Z SingleProcess AUTOTUNE benchmarking takes 0.2144 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T11:19:30.3859266Z Autotune Choices Stats: 2025-09-07T11:19:30.3860499Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_94", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.007296000141650438, "best_triton_pos": 0} 2025-09-07T11:19:30.3984754Z AUTOTUNE mm(25088x20, 20x60) 2025-09-07T11:19:30.3985229Z strides: [20, 1], [1, 20] 2025-09-07T11:19:30.3985533Z dtypes: torch.float16, torch.float16 2025-09-07T11:19:30.3986257Z triton_mm_94 0.0073 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:19:30.3987343Z triton_mm_97 0.0074 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:19:30.3988420Z triton_mm_93 0.0075 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:19:30.3989460Z triton_mm_98 0.0075 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:19:30.3990491Z triton_mm_96 0.0076 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:30.3991551Z triton_mm_101 0.0076 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 
2025-09-07T11:19:30.3992601Z triton_mm_100 0.0076 ms 95.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:19:30.3993770Z triton_mm_92 0.0077 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:19:30.3994807Z triton_mm_95 0.0077 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:30.3996013Z triton_mm_91 0.0077 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:30.3996944Z SingleProcess AUTOTUNE benchmarking takes 0.2189 seconds and 0.0002 seconds precompiling for 17 choices 2025-09-07T11:19:30.6223841Z Autotune Choices Stats: 2025-09-07T11:19:30.6225792Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_126", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.008031999692320824, "best_triton_pos": 0} 2025-09-07T11:19:30.6313858Z AUTOTUNE mm(25088x60, 60x20) 2025-09-07T11:19:30.6314211Z strides: [60, 1], [1, 60] 2025-09-07T11:19:30.6314533Z dtypes: torch.float16, torch.float16 2025-09-07T11:19:30.6315601Z triton_mm_126 0.0080 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:19:30.6317635Z triton_mm_123 0.0083 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:30.6319212Z triton_mm_130 
0.0085 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:19:30.6320477Z triton_mm_127 0.0086 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:30.6321778Z triton_mm_129 0.0086 ms 93.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:30.6323095Z triton_mm_131 0.0087 ms 92.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:30.6324487Z triton_mm_122 0.0088 ms 90.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:19:30.6325912Z triton_mm_135 0.0088 ms 90.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:19:30.6327160Z triton_mm_128 0.0089 ms 90.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:19:30.6328445Z triton_mm_134 0.0089 ms 90.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:30.6329523Z SingleProcess AUTOTUNE benchmarking takes 0.2282 seconds and 0.0003 seconds precompiling for 18 choices 2025-09-07T11:19:30.8669142Z Autotune Choices Stats: 2025-09-07T11:19:30.8681101Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_164", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, 
BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.012608000077307224, "best_triton_pos": 0} 2025-09-07T11:19:30.8753315Z AUTOTUNE mm(25088x40, 40x240) 2025-09-07T11:19:30.8753690Z strides: [40, 1], [1, 40] 2025-09-07T11:19:30.8754024Z dtypes: torch.float16, torch.float16 2025-09-07T11:19:30.8754886Z triton_mm_164 0.0126 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:30.8756351Z triton_mm_165 0.0127 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:19:30.8757674Z triton_mm_161 0.0128 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:30.8758972Z triton_mm_169 0.0134 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:30.8760269Z triton_mm_170 0.0137 ms 92.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:30.8761579Z triton_mm_171 0.0137 ms 91.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:19:30.8762874Z triton_mm_160 0.0140 ms 90.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:19:30.8764059Z mm 0.0142 ms 88.9% 2025-09-07T11:19:30.8765291Z triton_mm_166 0.0144 ms 87.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, 
EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:30.8766644Z triton_mm_167 0.0146 ms 86.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:19:30.8767789Z SingleProcess AUTOTUNE benchmarking takes 0.2407 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:19:31.0670624Z Autotune Choices Stats: 2025-09-07T11:19:31.0671736Z {"num_choices": 15, "num_triton_choices": 13, "best_kernel": "triton_mm_189", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.005888000130653381, "best_triton_pos": 0} 2025-09-07T11:19:31.0757516Z AUTOTUNE addmm(8x240, 8x20, 20x240) 2025-09-07T11:19:31.0757815Z strides: [0, 1], [20, 1], [1, 20] 2025-09-07T11:19:31.0758126Z dtypes: torch.float16, torch.float16, torch.float16 2025-09-07T11:19:31.0758829Z triton_mm_189 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:31.0759827Z triton_mm_194 0.0061 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T11:19:31.0760813Z triton_mm_195 0.0061 ms 95.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:19:31.0761788Z triton_mm_193 0.0062 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:19:31.0762759Z triton_mm_188 0.0062 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, 
EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:19:31.0763735Z triton_mm_191 0.0062 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:19:31.0764669Z triton_mm_185 0.0063 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:31.0765718Z triton_mm_186 0.0065 ms 90.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:19:31.0766606Z triton_mm_184 0.0065 ms 90.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T11:19:31.0767510Z triton_mm_183 0.0066 ms 88.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T11:19:31.0768306Z SingleProcess AUTOTUNE benchmarking takes 0.1999 seconds and 0.0002 seconds precompiling for 15 choices 2025-09-07T11:19:31.2964121Z Autotune Choices Stats: 2025-09-07T11:19:31.2965637Z {"num_choices": 19, "num_triton_choices": 18, "best_kernel": "triton_mm_208", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4", "best_time": 0.008736000396311283, "best_triton_pos": 0} 2025-09-07T11:19:31.3050983Z AUTOTUNE mm(6272x240, 240x56) 2025-09-07T11:19:31.3051300Z strides: [240, 1], [1, 240] 2025-09-07T11:19:31.3051566Z dtypes: torch.float16, torch.float16 2025-09-07T11:19:31.3052974Z triton_mm_208 0.0087 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:19:31.3054212Z triton_mm_204 0.0089 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:31.3055426Z triton_mm_203 0.0089 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:19:31.3056387Z triton_mm_207 0.0089 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:31.3057001Z mm 0.0091 ms 95.8% 2025-09-07T11:19:31.3057582Z triton_mm_197 0.0093 ms 94.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:19:31.3058569Z triton_mm_206 0.0093 ms 93.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:19:31.3059574Z triton_mm_213 0.0094 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:19:31.3060565Z triton_mm_212 0.0095 ms 91.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:31.3061610Z triton_mm_199 0.0095 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:19:31.3062466Z SingleProcess AUTOTUNE benchmarking takes 0.2289 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T11:19:31.5122543Z Autotune Choices Stats: 2025-09-07T11:19:31.5123693Z 
{"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_216", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8", "best_time": 0.0072639998979866505, "best_triton_pos": 0} 2025-09-07T11:19:31.5209634Z AUTOTUNE mm(6272x28, 28x168) 2025-09-07T11:19:31.5209909Z strides: [28, 1], [1, 28] 2025-09-07T11:19:31.5210178Z dtypes: torch.float16, torch.float16 2025-09-07T11:19:31.5210840Z triton_mm_216 0.0073 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:19:31.5211830Z triton_mm_220 0.0073 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:19:31.5212808Z triton_mm_219 0.0073 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:19:31.5213884Z triton_mm_221 0.0073 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:19:31.5214787Z triton_mm_217 0.0074 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:19:31.5215840Z triton_mm_224 0.0074 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:19:31.5216731Z triton_mm_222 0.0074 ms 98.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:31.5218145Z triton_mm_225 
0.0075 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:19:31.5219270Z triton_mm_227 0.0075 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:19:31.5220169Z triton_mm_223 0.0075 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:31.5220952Z SingleProcess AUTOTUNE benchmarking takes 0.2154 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T11:19:31.7129852Z Autotune Choices Stats: 2025-09-07T11:19:31.7130877Z {"num_choices": 15, "num_triton_choices": 13, "best_kernel": "triton_mm_261", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.00595200015231967, "best_triton_pos": 0} 2025-09-07T11:19:31.7216658Z AUTOTUNE addmm(8x336, 8x28, 28x336) 2025-09-07T11:19:31.7216984Z strides: [0, 1], [28, 1], [1, 28] 2025-09-07T11:19:31.7217298Z dtypes: torch.float16, torch.float16, torch.float16 2025-09-07T11:19:31.7217986Z triton_mm_261 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:31.7218972Z triton_mm_264 0.0060 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:19:31.7219958Z triton_mm_265 0.0060 ms 98.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:31.7220913Z triton_mm_269 0.0060 ms 
98.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:19:31.7221967Z triton_mm_267 0.0061 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:19:31.7222934Z triton_mm_270 0.0061 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T11:19:31.7223948Z triton_mm_260 0.0063 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T11:19:31.7224768Z triton_mm_262 0.0063 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:19:31.7226030Z triton_mm_271 0.0063 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:19:31.7226868Z triton_mm_259 0.0063 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T11:19:31.7227593Z SingleProcess AUTOTUNE benchmarking takes 0.1974 seconds and 0.0002 seconds precompiling for 15 choices 2025-09-07T11:19:31.9287345Z Autotune Choices Stats: 2025-09-07T11:19:31.9288381Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_275", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8", "best_time": 0.008191999979317188, "best_triton_pos": 0} 2025-09-07T11:19:31.9375898Z AUTOTUNE mm(6272x168, 168x28) 
2025-09-07T11:19:31.9376673Z strides: [168, 1], [1, 168] 2025-09-07T11:19:31.9376961Z dtypes: torch.float16, torch.float16 2025-09-07T11:19:31.9377867Z triton_mm_275 0.0082 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:19:31.9378885Z triton_mm_282 0.0083 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:31.9379852Z triton_mm_281 0.0083 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:19:31.9380824Z triton_mm_279 0.0083 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:19:31.9381898Z triton_mm_288 0.0084 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:19:31.9382881Z triton_mm_287 0.0084 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:31.9383909Z triton_mm_285 0.0085 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:19:31.9384753Z triton_mm_276 0.0087 ms 94.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:31.9385823Z triton_mm_283 0.0087 ms 94.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, 
num_warps=4 2025-09-07T11:19:31.9386659Z triton_mm_280 0.0088 ms 93.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:31.9387401Z SingleProcess AUTOTUNE benchmarking takes 0.2154 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T11:19:32.2012424Z Autotune Choices Stats: 2025-09-07T11:19:32.2013434Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_498", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.008352000266313553, "best_triton_pos": 0} 2025-09-07T11:19:32.2099235Z AUTOTUNE mm(6272x56, 56x336) 2025-09-07T11:19:32.2099537Z strides: [56, 1], [1, 56] 2025-09-07T11:19:32.2099797Z dtypes: torch.float16, torch.float16 2025-09-07T11:19:32.2100451Z triton_mm_498 0.0084 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:32.2101535Z triton_mm_497 0.0085 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:19:32.2102542Z triton_mm_501 0.0086 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:32.2103529Z triton_mm_502 0.0086 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:19:32.2104490Z triton_mm_508 0.0087 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 
2025-09-07T11:19:32.2105491Z triton_mm_494 0.0088 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:32.2107152Z triton_mm_507 0.0089 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:32.2108005Z triton_mm_505 0.0096 ms 87.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T11:19:32.2108837Z triton_mm_506 0.0097 ms 86.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:32.2109673Z triton_mm_504 0.0098 ms 85.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:19:32.2110401Z SingleProcess AUTOTUNE benchmarking takes 0.2421 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:19:32.3881873Z Autotune Choices Stats: 2025-09-07T11:19:32.3882907Z {"num_choices": 14, "num_triton_choices": 12, "best_kernel": "triton_mm_530", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8", "best_time": 0.005824000108987093, "best_triton_pos": 0} 2025-09-07T11:19:32.3970481Z AUTOTUNE addmm(8x336, 8x14, 14x336) 2025-09-07T11:19:32.3970830Z strides: [0, 1], [14, 1], [1, 14] 2025-09-07T11:19:32.3971136Z dtypes: torch.float16, torch.float16, torch.float16 2025-09-07T11:19:32.3971845Z triton_mm_530 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 
2025-09-07T11:19:32.3972859Z triton_mm_522 0.0059 ms 98.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:32.3973908Z triton_mm_525 0.0059 ms 98.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:32.3974805Z triton_mm_520 0.0059 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T11:19:32.3976128Z triton_mm_524 0.0060 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:19:32.3977009Z triton_mm_529 0.0060 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:19:32.3977891Z triton_mm_521 0.0060 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T11:19:32.3978793Z triton_mm_523 0.0060 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:19:32.3979684Z triton_mm_531 0.0060 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:19:32.3980584Z triton_mm_527 0.0061 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:19:32.3981369Z SingleProcess AUTOTUNE benchmarking takes 0.1865 seconds and 0.0002 seconds 
precompiling for 14 choices 2025-09-07T11:19:32.6299511Z Autotune Choices Stats: 2025-09-07T11:19:32.6300546Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_536", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.00774399982765317, "best_triton_pos": 0} 2025-09-07T11:19:32.6388267Z AUTOTUNE mm(1568x336, 336x104) 2025-09-07T11:19:32.6388553Z strides: [336, 1], [1, 336] 2025-09-07T11:19:32.6388819Z dtypes: torch.float16, torch.float16 2025-09-07T11:19:32.6389518Z triton_mm_536 0.0077 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:32.6390514Z triton_mm_540 0.0081 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:32.6391140Z mm 0.0082 ms 94.2% 2025-09-07T11:19:32.6391732Z triton_mm_534 0.0082 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:19:32.6392715Z triton_mm_535 0.0084 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:19:32.6393765Z triton_mm_539 0.0087 ms 88.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:19:32.6394814Z triton_mm_533 0.0089 ms 86.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:19:32.6396127Z triton_mm_544 0.0090 ms 85.8% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:19:32.6397110Z triton_mm_543 0.0092 ms 84.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:32.6398093Z triton_mm_542 0.0093 ms 83.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:19:32.6398939Z SingleProcess AUTOTUNE benchmarking takes 0.2413 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:19:32.8683587Z Autotune Choices Stats: 2025-09-07T11:19:32.8684817Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_552", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.007040000054985285, "best_triton_pos": 0} 2025-09-07T11:19:32.8772378Z AUTOTUNE mm(1568x52, 52x312) 2025-09-07T11:19:32.8772633Z strides: [52, 1], [1, 52] 2025-09-07T11:19:32.8772870Z dtypes: torch.float16, torch.float16 2025-09-07T11:19:32.8773488Z triton_mm_552 0.0070 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:19:32.8774579Z triton_mm_562 0.0071 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:32.8775800Z triton_mm_559 0.0072 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:32.8776774Z triton_mm_554 0.0073 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, 
BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:19:32.8777736Z triton_mm_557 0.0075 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:19:32.8779119Z triton_mm_558 0.0075 ms 93.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:19:32.8780294Z triton_mm_563 0.0075 ms 93.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:19:32.8781277Z triton_mm_553 0.0076 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:19:32.8782344Z triton_mm_555 0.0076 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:32.8783307Z triton_mm_561 0.0076 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:19:32.8784189Z SingleProcess AUTOTUNE benchmarking takes 0.2379 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:19:33.0707940Z Autotune Choices Stats: 2025-09-07T11:19:33.0709004Z {"num_choices": 15, "num_triton_choices": 13, "best_kernel": "triton_mm_602", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.005824000108987093, "best_triton_pos": 0} 2025-09-07T11:19:33.0797472Z AUTOTUNE addmm(8x624, 8x26, 26x624) 2025-09-07T11:19:33.0797756Z strides: [0, 1], [26, 1], [1, 
26] 2025-09-07T11:19:33.0798051Z dtypes: torch.float16, torch.float16, torch.float16 2025-09-07T11:19:33.0798724Z triton_mm_602 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:33.0799705Z triton_mm_603 0.0060 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:19:33.0800683Z triton_mm_605 0.0061 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:19:33.0801634Z triton_mm_601 0.0061 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T11:19:33.0802598Z triton_mm_606 0.0061 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:33.0803548Z triton_mm_610 0.0061 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:19:33.0804516Z triton_mm_611 0.0063 ms 92.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T11:19:33.0805603Z triton_mm_608 0.0063 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:19:33.0806497Z triton_mm_609 0.0063 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:19:33.0807390Z 
triton_mm_612 0.0063 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:19:33.0808158Z SingleProcess AUTOTUNE benchmarking takes 0.1991 seconds and 0.0002 seconds precompiling for 15 choices 2025-09-07T11:19:33.2994686Z Autotune Choices Stats: 2025-09-07T11:19:33.2996669Z {"num_choices": 19, "num_triton_choices": 18, "best_kernel": "triton_mm_617", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007712000049650669, "best_triton_pos": 0} 2025-09-07T11:19:33.3085495Z AUTOTUNE mm(1568x312, 312x52) 2025-09-07T11:19:33.3085786Z strides: [312, 1], [1, 312] 2025-09-07T11:19:33.3086055Z dtypes: torch.float16, torch.float16 2025-09-07T11:19:33.3086718Z triton_mm_617 0.0077 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:33.3087726Z triton_mm_625 0.0082 ms 94.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:19:33.3088720Z triton_mm_621 0.0082 ms 93.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:33.3089701Z triton_mm_616 0.0083 ms 93.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:19:33.3090663Z triton_mm_624 0.0084 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:33.3091626Z triton_mm_615 0.0085 ms 90.3% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:19:33.3092618Z triton_mm_620 0.0087 ms 88.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:19:33.3093598Z triton_mm_614 0.0088 ms 88.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:19:33.3094643Z triton_mm_623 0.0090 ms 86.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:19:33.3095722Z triton_mm_630 0.0090 ms 85.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:19:33.3096521Z SingleProcess AUTOTUNE benchmarking takes 0.2283 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T11:19:33.5719180Z Autotune Choices Stats: 2025-09-07T11:19:33.5720192Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_854", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.00800000037997961, "best_triton_pos": 0} 2025-09-07T11:19:33.5808153Z AUTOTUNE mm(1568x104, 104x624) 2025-09-07T11:19:33.5808438Z strides: [104, 1], [1, 104] 2025-09-07T11:19:33.5808717Z dtypes: torch.float16, torch.float16 2025-09-07T11:19:33.5809373Z triton_mm_854 0.0080 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:33.5810361Z triton_mm_857 0.0080 ms 99.6% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:19:33.5811331Z triton_mm_855 0.0082 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:19:33.5811956Z mm 0.0083 ms 96.9% 2025-09-07T11:19:33.5812534Z triton_mm_856 0.0083 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:33.5814439Z triton_mm_852 0.0084 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:19:33.5815762Z triton_mm_859 0.0085 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:19:33.5816656Z triton_mm_853 0.0085 ms 93.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:33.5817549Z triton_mm_862 0.0085 ms 93.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:33.5818454Z triton_mm_858 0.0086 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:33.5819263Z SingleProcess AUTOTUNE benchmarking takes 0.2413 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:19:33.8160814Z Autotune Choices Stats: 2025-09-07T11:19:33.8161853Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_880", "best_kernel_desc": "ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.0063680000603199005, "best_triton_pos": 0} 2025-09-07T11:19:33.8285918Z AUTOTUNE addmm(8x624, 8x52, 52x624) 2025-09-07T11:19:33.8286233Z strides: [0, 1], [52, 1], [1, 52] 2025-09-07T11:19:33.8286553Z dtypes: torch.float16, torch.float16, torch.float16 2025-09-07T11:19:33.8287259Z triton_mm_880 0.0064 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:19:33.8288296Z triton_mm_885 0.0064 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:33.8289283Z triton_mm_883 0.0065 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:19:33.8290253Z triton_mm_893 0.0065 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:19:33.8291233Z triton_mm_889 0.0065 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:19:33.8292197Z triton_mm_884 0.0065 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:33.8293165Z triton_mm_891 0.0065 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:19:33.8294181Z triton_mm_879 0.0066 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, 
EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:33.8295423Z triton_mm_887 0.0066 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:19:33.8296325Z triton_mm_890 0.0066 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:33.8297698Z SingleProcess AUTOTUNE benchmarking takes 0.2470 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T11:19:34.0625682Z Autotune Choices Stats: 2025-09-07T11:19:34.0627371Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_898", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.008799999952316284, "best_triton_pos": 0} 2025-09-07T11:19:34.0715465Z AUTOTUNE mm(1568x624, 624x160) 2025-09-07T11:19:34.0715718Z strides: [624, 1], [1, 624] 2025-09-07T11:19:34.0715981Z dtypes: torch.float16, torch.float16 2025-09-07T11:19:34.0716623Z triton_mm_898 0.0088 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:34.0717612Z triton_mm_902 0.0091 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:34.0718236Z mm 0.0092 ms 96.2% 2025-09-07T11:19:34.0718800Z triton_mm_897 0.0102 ms 86.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:19:34.0719776Z triton_mm_906 0.0103 ms 85.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, 
BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:19:34.0720752Z triton_mm_901 0.0105 ms 83.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:19:34.0721729Z triton_mm_895 0.0111 ms 79.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:19:34.0722700Z triton_mm_905 0.0112 ms 78.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:34.0723678Z triton_mm_912 0.0116 ms 76.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:19:34.0724646Z triton_mm_904 0.0117 ms 74.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:19:34.0725627Z SingleProcess AUTOTUNE benchmarking takes 0.2422 seconds and 0.0005 seconds precompiling for 20 choices 2025-09-07T11:19:34.3039149Z Autotune Choices Stats: 2025-09-07T11:19:34.3040126Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_916", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8", "best_time": 0.007104000076651573, "best_triton_pos": 0} 2025-09-07T11:19:34.3130431Z AUTOTUNE mm(1568x80, 80x240) 2025-09-07T11:19:34.3130722Z strides: [80, 1], [1, 80] 2025-09-07T11:19:34.3130990Z dtypes: torch.float16, torch.float16 2025-09-07T11:19:34.3131665Z triton_mm_916 0.0071 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, 
EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:19:34.3132651Z triton_mm_920 0.0072 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:19:34.3133623Z triton_mm_914 0.0072 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:19:34.3134657Z triton_mm_923 0.0072 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:19:34.3136674Z triton_mm_927 0.0072 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:19:34.3137593Z triton_mm_922 0.0073 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:34.3138481Z triton_mm_926 0.0073 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:34.3139378Z triton_mm_924 0.0073 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:34.3140273Z triton_mm_915 0.0074 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:19:34.3140845Z mm 0.0075 ms 95.3% 2025-09-07T11:19:34.3141257Z SingleProcess AUTOTUNE benchmarking takes 0.2409 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:19:34.5559620Z Autotune Choices Stats: 
2025-09-07T11:19:34.5560644Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_982", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4", "best_time": 0.006463999859988689, "best_triton_pos": 0} 2025-09-07T11:19:34.5658470Z AUTOTUNE addmm(8x480, 8x80, 80x480) 2025-09-07T11:19:34.5658728Z strides: [0, 1], [80, 1], [1, 80] 2025-09-07T11:19:34.5659001Z dtypes: torch.float16, torch.float16, torch.float16 2025-09-07T11:19:34.5659658Z triton_mm_982 0.0065 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:19:34.5660639Z triton_mm_981 0.0065 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:34.5661607Z triton_mm_971 0.0067 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:19:34.5662490Z triton_mm_970 0.0068 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:34.5663381Z triton_mm_978 0.0069 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:19:34.5664291Z triton_mm_975 0.0070 ms 92.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:34.5665603Z triton_mm_979 0.0070 ms 91.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, 
num_warps=4 2025-09-07T11:19:34.5666568Z triton_mm_984 0.0071 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:19:34.5667530Z triton_mm_977 0.0071 ms 90.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:34.5668486Z triton_mm_974 0.0073 ms 88.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:19:34.5669322Z SingleProcess AUTOTUNE benchmarking takes 0.2494 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T11:19:34.7989384Z Autotune Choices Stats: 2025-09-07T11:19:34.7991398Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_989", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007391999941319227, "best_triton_pos": 0} 2025-09-07T11:19:34.8081809Z AUTOTUNE mm(1568x240, 240x80) 2025-09-07T11:19:34.8082060Z strides: [240, 1], [1, 240] 2025-09-07T11:19:34.8082307Z dtypes: torch.float16, torch.float16 2025-09-07T11:19:34.8082958Z triton_mm_989 0.0074 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:34.8083931Z triton_mm_993 0.0076 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:34.8085914Z triton_mm_987 0.0077 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 
2025-09-07T11:19:34.8086877Z mm 0.0077 ms 95.9% 2025-09-07T11:19:34.8087769Z triton_mm_988 0.0077 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:19:34.8089304Z triton_mm_992 0.0079 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:19:34.8090818Z triton_mm_996 0.0080 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:34.8092324Z triton_mm_986 0.0081 ms 91.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:19:34.8093875Z triton_mm_997 0.0083 ms 89.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:19:34.8095467Z triton_mm_995 0.0084 ms 87.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:19:34.8096251Z SingleProcess AUTOTUNE benchmarking takes 0.2418 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:19:35.0727730Z Autotune Choices Stats: 2025-09-07T11:19:35.0729411Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.008736000396311283, "best_triton_pos": 1, "best_triton_time": 0.008799999952316284, "best_triton_kernel": "triton_mm_1257", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8"} 2025-09-07T11:19:35.0819496Z AUTOTUNE mm(1568x160, 160x960) 2025-09-07T11:19:35.0819847Z 
strides: [160, 1], [1, 160] 2025-09-07T11:19:35.0820200Z dtypes: torch.float16, torch.float16 2025-09-07T11:19:35.0820584Z mm 0.0087 ms 100.0% 2025-09-07T11:19:35.0821449Z triton_mm_1257 0.0088 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:19:35.0822837Z triton_mm_1256 0.0089 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:35.0824211Z triton_mm_1252 0.0091 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:35.0826293Z triton_mm_1253 0.0091 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:19:35.0827871Z triton_mm_1259 0.0093 ms 94.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:35.0829257Z triton_mm_1261 0.0093 ms 93.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:19:35.0830628Z triton_mm_1254 0.0093 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:35.0832001Z triton_mm_1260 0.0094 ms 92.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:35.0833402Z triton_mm_1250 0.0098 ms 88.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:19:35.0834610Z SingleProcess AUTOTUNE benchmarking takes 0.2419 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:19:36.0607641Z Autotune Choices Stats: 2025-09-07T11:19:36.0609281Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_1282", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.006591999903321266, "best_triton_pos": 0} 2025-09-07T11:19:36.0721652Z AUTOTUNE addmm(8x960, 8x80, 80x960) 2025-09-07T11:19:36.0721925Z strides: [0, 1], [80, 1], [1, 80] 2025-09-07T11:19:36.0722224Z dtypes: torch.float16, torch.float16, torch.float16 2025-09-07T11:19:36.0722925Z triton_mm_1282 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:19:36.0723961Z triton_mm_1292 0.0068 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:36.0725445Z triton_mm_1281 0.0068 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:36.0726511Z triton_mm_1286 0.0068 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:36.0727492Z triton_mm_1288 0.0070 ms 94.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:36.0728477Z triton_mm_1289 0.0071 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:19:36.0729475Z triton_mm_1290 0.0071 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:36.0730442Z triton_mm_1285 0.0074 ms 89.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:19:36.0731405Z triton_mm_1295 0.0074 ms 89.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:19:36.0732368Z triton_mm_1287 0.0075 ms 87.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:36.0733517Z SingleProcess AUTOTUNE benchmarking takes 0.9871 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T11:19:36.3129425Z Autotune Choices Stats: 2025-09-07T11:19:36.3131702Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.009247999638319016, "best_triton_pos": 1, "best_triton_time": 0.009247999638319016, "best_triton_kernel": "triton_mm_1300", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T11:19:36.3228840Z AUTOTUNE mm(392x960, 960x264) 2025-09-07T11:19:36.3229083Z strides: [960, 1], [1, 960] 2025-09-07T11:19:36.3229306Z dtypes: torch.float16, torch.float16 2025-09-07T11:19:36.3229562Z mm 0.0092 ms 100.0% 2025-09-07T11:19:36.3230112Z triton_mm_1300 0.0092 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:36.3231057Z triton_mm_1304 0.0096 ms 96.7% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:36.3231969Z triton_mm_1308 0.0106 ms 87.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:19:36.3232866Z triton_mm_1299 0.0113 ms 81.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:19:36.3233763Z triton_mm_1303 0.0115 ms 80.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:19:36.3234675Z triton_mm_1298 0.0117 ms 79.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:19:36.3235970Z triton_mm_1307 0.0119 ms 77.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:36.3236954Z triton_mm_1314 0.0121 ms 76.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:19:36.3237938Z triton_mm_1297 0.0128 ms 72.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:19:36.3238782Z SingleProcess AUTOTUNE benchmarking takes 0.2500 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:19:36.5590996Z Autotune Choices Stats: 2025-09-07T11:19:36.5592145Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.008704000152647495, "best_triton_pos": 1, "best_triton_time": 
0.009119999594986439, "best_triton_kernel": "triton_mm_1325", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8"} 2025-09-07T11:19:36.5684020Z AUTOTUNE mm(392x264, 264x1584) 2025-09-07T11:19:36.5684285Z strides: [264, 1], [1, 264] 2025-09-07T11:19:36.5684539Z dtypes: torch.float16, torch.float16 2025-09-07T11:19:36.5684850Z mm 0.0087 ms 100.0% 2025-09-07T11:19:36.5686282Z triton_mm_1325 0.0091 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:19:36.5687946Z triton_mm_1326 0.0092 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:36.5689570Z triton_mm_1329 0.0093 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:19:36.5691790Z triton_mm_1327 0.0097 ms 89.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:19:36.5693368Z triton_mm_1324 0.0098 ms 89.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:36.5695212Z triton_mm_1322 0.0099 ms 87.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:19:36.5696294Z triton_mm_1333 0.0099 ms 87.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:19:36.5697207Z triton_mm_1328 
0.0100 ms 87.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:36.5698120Z triton_mm_1332 0.0104 ms 84.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:36.5698911Z SingleProcess AUTOTUNE benchmarking takes 0.2436 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:19:36.8053491Z Autotune Choices Stats: 2025-09-07T11:19:36.8055486Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_1354", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.006943999789655209, "best_triton_pos": 0} 2025-09-07T11:19:36.8150336Z AUTOTUNE addmm(8x1584, 8x132, 132x1584) 2025-09-07T11:19:36.8150603Z strides: [0, 1], [132, 1], [1, 132] 2025-09-07T11:19:36.8150890Z dtypes: torch.float16, torch.float16, torch.float16 2025-09-07T11:19:36.8151535Z triton_mm_1354 0.0069 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:19:36.8152471Z triton_mm_1355 0.0072 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:19:36.8153393Z triton_mm_1353 0.0073 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:36.8154303Z triton_mm_1352 0.0074 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T11:19:36.8155381Z 
triton_mm_1364 0.0074 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:36.8156356Z triton_mm_1358 0.0075 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:36.8157331Z triton_mm_1359 0.0075 ms 92.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:36.8158303Z triton_mm_1365 0.0076 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:19:36.8159268Z triton_mm_1361 0.0076 ms 90.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:19:36.8160447Z triton_mm_1362 0.0078 ms 89.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:36.8161437Z SingleProcess AUTOTUNE benchmarking takes 0.2449 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T11:19:37.0487200Z Autotune Choices Stats: 2025-09-07T11:19:37.0488793Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_1372", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.00940799992531538, "best_triton_pos": 0} 2025-09-07T11:19:37.0581527Z AUTOTUNE mm(392x792, 792x132) 2025-09-07T11:19:37.0581956Z strides: [792, 1], [1, 792] 2025-09-07T11:19:37.0582369Z dtypes: torch.float16, torch.float16 2025-09-07T11:19:37.0583465Z triton_mm_1372 0.0094 ms 
100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:37.0585703Z triton_mm_1376 0.0101 ms 93.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:37.0586812Z triton_mm_1371 0.0105 ms 89.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:19:37.0587777Z triton_mm_1375 0.0108 ms 86.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:19:37.0588738Z triton_mm_1370 0.0109 ms 86.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:19:37.0589704Z triton_mm_1369 0.0118 ms 79.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:19:37.0590686Z triton_mm_1379 0.0120 ms 78.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:37.0591667Z triton_mm_1380 0.0121 ms 78.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:19:37.0592298Z mm 0.0126 ms 74.4% 2025-09-07T11:19:37.0592874Z triton_mm_1378 0.0127 ms 73.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:19:37.0593725Z SingleProcess AUTOTUNE benchmarking takes 0.2426 seconds and 0.0002 seconds 
precompiling for 20 choices 2025-09-07T11:19:37.3200009Z Autotune Choices Stats: 2025-09-07T11:19:37.3201959Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.00863999966531992, "best_triton_pos": 1, "best_triton_time": 0.009216000325977802, "best_triton_kernel": "triton_mm_1602", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8"} 2025-09-07T11:19:37.3294753Z AUTOTUNE mm(392x264, 264x1536) 2025-09-07T11:19:37.3295434Z strides: [264, 1], [1, 264] 2025-09-07T11:19:37.3295870Z dtypes: torch.float16, torch.float16 2025-09-07T11:19:37.3296309Z mm 0.0086 ms 100.0% 2025-09-07T11:19:37.3297255Z triton_mm_1602 0.0092 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:19:37.3298840Z triton_mm_1598 0.0094 ms 92.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:19:37.3300827Z triton_mm_1599 0.0094 ms 91.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:37.3302783Z triton_mm_1600 0.0097 ms 89.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:19:37.3304354Z triton_mm_1595 0.0098 ms 88.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:19:37.3306091Z triton_mm_1597 0.0098 ms 88.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 
2025-09-07T11:19:37.3307066Z triton_mm_1606 0.0100 ms 86.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:19:37.3308053Z triton_mm_1601 0.0101 ms 85.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:37.3309031Z triton_mm_1605 0.0103 ms 83.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:37.3309877Z SingleProcess AUTOTUNE benchmarking takes 0.2423 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:19:37.6862624Z Autotune Choices Stats: 2025-09-07T11:19:37.6863654Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_1611", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.00979200005531311, "best_triton_pos": 0} 2025-09-07T11:19:37.6961813Z AUTOTUNE addmm(8x1000, 8x1536, 1536x1000) 2025-09-07T11:19:37.6962315Z strides: [0, 1], [1536, 1], [1, 1536] 2025-09-07T11:19:37.6962809Z dtypes: torch.float16, torch.float16, torch.float16 2025-09-07T11:19:37.6963939Z triton_mm_1611 0.0098 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:19:37.6965845Z triton_mm_1615 0.0104 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:37.6966420Z bias_addmm 0.0105 ms 93.6% 2025-09-07T11:19:37.6966974Z triton_mm_1619 0.0122 ms 80.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:19:37.6967869Z triton_mm_1623 0.0132 ms 74.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:19:37.6968442Z addmm 0.0134 ms 72.9% 2025-09-07T11:19:37.6968973Z triton_mm_1610 0.0146 ms 67.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:19:37.6969861Z triton_mm_1609 0.0154 ms 63.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:19:37.6970746Z triton_mm_1614 0.0155 ms 63.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:19:37.6971644Z triton_mm_1608 0.0160 ms 61.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T11:19:37.6972746Z SingleProcess AUTOTUNE benchmarking takes 0.3650 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T11:20:06.7181194Z Autotune Choices Stats: 2025-09-07T11:20:06.7183053Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_1648", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.006624000146985054, "best_triton_pos": 0} 2025-09-07T11:20:06.7281754Z AUTOTUNE mm(1000x8, 8x1536) 2025-09-07T11:20:06.7282013Z strides: [1, 1000], [1536, 1] 2025-09-07T11:20:06.7282278Z dtypes: torch.float16, torch.float16 2025-09-07T11:20:06.7282945Z triton_mm_1648 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, 
BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:20:06.7284152Z triton_mm_1646 0.0067 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:20:06.7285510Z triton_mm_1652 0.0067 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:20:06.7286506Z triton_mm_1653 0.0068 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:20:06.7287483Z triton_mm_1647 0.0068 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:20:06.7288464Z triton_mm_1651 0.0068 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:20:06.7289459Z triton_mm_1649 0.0068 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:20:06.7290437Z triton_mm_1650 0.0068 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:20:06.7291409Z triton_mm_1645 0.0069 ms 95.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:20:06.7292373Z triton_mm_1644 0.0069 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 
2025-09-07T11:20:06.7293231Z SingleProcess AUTOTUNE benchmarking takes 0.1702 seconds and 0.0004 seconds precompiling for 17 choices 2025-09-07T11:20:07.3671573Z Autotune Choices Stats: 2025-09-07T11:20:07.3672851Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "mm", "best_time": 0.009312000125646591, "best_triton_pos": 1, "best_triton_time": 0.009824000298976898, "best_triton_kernel": "triton_mm_1632", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T11:20:07.3765966Z AUTOTUNE mm(8x1000, 1000x1536) 2025-09-07T11:20:07.3766276Z strides: [1000, 1], [1536, 1] 2025-09-07T11:20:07.3766553Z dtypes: torch.float16, torch.float16 2025-09-07T11:20:07.3766842Z mm 0.0093 ms 100.0% 2025-09-07T11:20:07.3767514Z triton_mm_1632 0.0098 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:20:07.3768546Z triton_mm_1628 0.0100 ms 93.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:20:07.3770161Z triton_mm_1636 0.0102 ms 91.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:20:07.3771369Z triton_mm_1640 0.0115 ms 81.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:20:07.3772367Z triton_mm_1626 0.0117 ms 79.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:20:07.3773345Z triton_mm_1627 0.0119 ms 78.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, 
BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:20:07.3774269Z triton_mm_1631 0.0126 ms 73.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:20:07.3775289Z triton_mm_1635 0.0129 ms 72.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:20:07.3776125Z triton_mm_1638 0.0132 ms 70.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:20:07.3776856Z SingleProcess AUTOTUNE benchmarking takes 0.1939 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T11:20:23.9142803Z pass 2025-09-07T11:20:30.2205709Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T11:20:30.2206964Z import pynvml # type: ignore[import] 2025-09-07T11:20:33.2637456Z 2025-09-07T11:20:34.6748128Z loading model: 0it [00:00, ?it/s] 2025-09-07T11:20:34.6748622Z loading model: 0it [00:01, ?it/s] 2025-09-07T11:20:34.6749056Z cuda train mnasnet_100 2025-09-07T11:20:57.3240459Z Autotune Choices Stats: 2025-09-07T11:20:57.3242166Z {"num_choices": 13, "num_triton_choices": 12, "best_kernel": "triton_mm_12", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.009535999968647957, "best_triton_pos": 0} 2025-09-07T11:20:57.3342866Z AUTOTUNE mm(100352x32, 32x16) 2025-09-07T11:20:57.3343153Z strides: [32, 1], [1, 32] 2025-09-07T11:20:57.3343442Z dtypes: torch.float16, torch.float16 2025-09-07T11:20:57.3344116Z triton_mm_12 0.0095 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:20:57.3345272Z triton_mm_13 0.0096 ms 99.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:20:57.3346246Z triton_mm_9 0.0096 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:20:57.3347200Z triton_mm_7 0.0097 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T11:20:57.3348303Z triton_mm_11 0.0097 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:20:57.3349257Z triton_mm_16 0.0097 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, 
BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T11:20:57.3350678Z triton_mm_8 0.0097 ms 98.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T11:20:57.3351854Z triton_mm_17 0.0097 ms 98.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:20:57.3352816Z triton_mm_14 0.0098 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:20:57.3353768Z triton_mm_15 0.0098 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:20:57.3354614Z SingleProcess AUTOTUNE benchmarking takes 0.1762 seconds and 0.0002 seconds precompiling for 13 choices 2025-09-07T11:20:57.5273136Z Autotune Choices Stats: 2025-09-07T11:20:57.5274120Z {"num_choices": 16, "num_triton_choices": 15, "best_kernel": "triton_mm_23", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.010463999584317207, "best_triton_pos": 0} 2025-09-07T11:20:57.5371134Z AUTOTUNE mm(100352x16, 16x48) 2025-09-07T11:20:57.5371366Z strides: [16, 1], [1, 16] 2025-09-07T11:20:57.5371584Z dtypes: torch.float16, torch.float16 2025-09-07T11:20:57.5372150Z triton_mm_23 0.0105 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:20:57.5373007Z triton_mm_32 0.0105 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:20:57.5373867Z triton_mm_28 0.0105 ms 99.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:20:57.5374725Z triton_mm_25 0.0106 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:20:57.5375692Z triton_mm_26 0.0107 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:20:57.5376518Z triton_mm_31 0.0108 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T11:20:57.5377340Z triton_mm_30 0.0108 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:20:57.5378296Z triton_mm_29 0.0110 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:20:57.5379250Z triton_mm_18 0.0111 ms 94.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T11:20:57.5379852Z mm 0.0111 ms 94.0% 2025-09-07T11:20:57.5380292Z SingleProcess AUTOTUNE benchmarking takes 0.2023 seconds and 0.0002 seconds precompiling for 16 choices 2025-09-07T11:20:57.7540993Z Autotune Choices Stats: 2025-09-07T11:20:57.7542684Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_40", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, 
num_warps=8", "best_time": 0.007679999805986881, "best_triton_pos": 0} 2025-09-07T11:20:57.7640308Z AUTOTUNE mm(25088x48, 48x24) 2025-09-07T11:20:57.7640718Z strides: [48, 1], [1, 48] 2025-09-07T11:20:57.7641554Z dtypes: torch.float16, torch.float16 2025-09-07T11:20:57.7642834Z triton_mm_40 0.0077 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:20:57.7644466Z triton_mm_34 0.0077 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:20:57.7646315Z triton_mm_44 0.0078 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:20:57.7647911Z triton_mm_43 0.0079 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:20:57.7649067Z triton_mm_37 0.0079 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:20:57.7649961Z triton_mm_49 0.0080 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:20:57.7650850Z triton_mm_42 0.0080 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:20:57.7651737Z triton_mm_48 0.0081 ms 95.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:20:57.7652624Z triton_mm_36 0.0081 ms 94.9% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:20:57.7653522Z triton_mm_46 0.0082 ms 94.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:20:57.7654307Z SingleProcess AUTOTUNE benchmarking takes 0.2263 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T11:20:57.9817278Z Autotune Choices Stats: 2025-09-07T11:20:57.9818880Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_60", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8", "best_time": 0.007712000049650669, "best_triton_pos": 0} 2025-09-07T11:20:57.9920300Z AUTOTUNE mm(25088x24, 24x72) 2025-09-07T11:20:57.9920573Z strides: [24, 1], [1, 24] 2025-09-07T11:20:57.9920825Z dtypes: torch.float16, torch.float16 2025-09-07T11:20:57.9921493Z triton_mm_60 0.0077 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:20:57.9922492Z triton_mm_61 0.0077 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:20:57.9923468Z triton_mm_59 0.0078 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:20:57.9924462Z triton_mm_64 0.0079 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T11:20:57.9925787Z triton_mm_66 0.0081 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, 
BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:20:57.9926772Z triton_mm_65 0.0082 ms 94.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:20:57.9928246Z triton_mm_56 0.0083 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:20:57.9929148Z triton_mm_54 0.0084 ms 92.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:20:57.9930034Z triton_mm_58 0.0084 ms 92.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:20:57.9930921Z triton_mm_63 0.0085 ms 90.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:20:57.9931709Z SingleProcess AUTOTUNE benchmarking takes 0.2275 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T11:20:58.2086071Z Autotune Choices Stats: 2025-09-07T11:20:58.2087677Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_76", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8", "best_time": 0.008799999952316284, "best_triton_pos": 0} 2025-09-07T11:20:58.2183933Z AUTOTUNE mm(25088x72, 72x24) 2025-09-07T11:20:58.2184344Z strides: [72, 1], [1, 72] 2025-09-07T11:20:58.2184754Z dtypes: torch.float16, torch.float16 2025-09-07T11:20:58.2186084Z triton_mm_76 0.0088 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:20:58.2187633Z triton_mm_70 0.0089 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:20:58.2189117Z triton_mm_80 0.0090 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:20:58.2190085Z triton_mm_78 0.0093 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:20:58.2191052Z triton_mm_71 0.0094 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:20:58.2191663Z mm 0.0094 ms 93.2% 2025-09-07T11:20:58.2192226Z triton_mm_68 0.0095 ms 92.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:20:58.2193188Z triton_mm_81 0.0095 ms 92.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T11:20:58.2194157Z triton_mm_67 0.0095 ms 92.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T11:20:58.2195250Z triton_mm_72 0.0096 ms 91.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:20:58.2196090Z SingleProcess AUTOTUNE benchmarking takes 0.2258 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T11:20:58.4596241Z Autotune Choices Stats: 2025-09-07T11:20:58.4597449Z {"num_choices": 19, 
"num_triton_choices": 18, "best_kernel": "mm", "best_time": 0.007679999805986881, "best_triton_pos": 1, "best_triton_time": 0.00774399982765317, "best_triton_kernel": "triton_mm_136", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4"} 2025-09-07T11:20:58.4694852Z AUTOTUNE mm(6272x72, 72x40) 2025-09-07T11:20:58.4695575Z strides: [72, 1], [1, 72] 2025-09-07T11:20:58.4696005Z dtypes: torch.float16, torch.float16 2025-09-07T11:20:58.4696828Z mm 0.0077 ms 100.0% 2025-09-07T11:20:58.4697854Z triton_mm_136 0.0077 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:20:58.4699115Z triton_mm_138 0.0078 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:20:58.4699961Z triton_mm_148 0.0079 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:20:58.4700794Z triton_mm_151 0.0080 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:20:58.4701720Z triton_mm_149 0.0080 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:20:58.4702556Z triton_mm_152 0.0080 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:20:58.4703386Z triton_mm_135 0.0081 ms 95.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, 
EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T11:20:58.4704219Z triton_mm_146 0.0082 ms 93.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:20:58.4705176Z triton_mm_144 0.0083 ms 93.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:20:58.4705914Z SingleProcess AUTOTUNE benchmarking takes 0.2417 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T11:20:58.7119110Z Autotune Choices Stats: 2025-09-07T11:20:58.7120733Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_164", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.007040000054985285, "best_triton_pos": 0} 2025-09-07T11:20:58.7216326Z AUTOTUNE mm(6272x40, 40x120) 2025-09-07T11:20:58.7216765Z strides: [40, 1], [1, 40] 2025-09-07T11:20:58.7217177Z dtypes: torch.float16, torch.float16 2025-09-07T11:20:58.7218322Z triton_mm_164 0.0070 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:20:58.7219410Z triton_mm_157 0.0072 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:20:58.7220298Z triton_mm_154 0.0073 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:20:58.7221196Z triton_mm_153 0.0073 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T11:20:58.7222152Z triton_mm_161 0.0073 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:20:58.7223039Z triton_mm_160 0.0075 ms 93.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:20:58.7224418Z triton_mm_167 0.0076 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:20:58.7225744Z triton_mm_165 0.0076 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:20:58.7226648Z triton_mm_166 0.0079 ms 89.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:20:58.7227545Z triton_mm_159 0.0079 ms 89.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:20:58.7228333Z SingleProcess AUTOTUNE benchmarking takes 0.2508 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:20:58.9493060Z Autotune Choices Stats: 2025-09-07T11:20:58.9495755Z {"num_choices": 19, "num_triton_choices": 18, "best_kernel": "mm", "best_time": 0.007807999849319458, "best_triton_pos": 1, "best_triton_time": 0.007840000092983246, "best_triton_kernel": "triton_mm_173", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4"} 2025-09-07T11:20:58.9592363Z AUTOTUNE mm(6272x120, 120x40) 2025-09-07T11:20:58.9592620Z 
strides: [120, 1], [1, 120] 2025-09-07T11:20:58.9592873Z dtypes: torch.float16, torch.float16 2025-09-07T11:20:58.9593142Z mm 0.0078 ms 100.0% 2025-09-07T11:20:58.9593725Z triton_mm_173 0.0078 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:20:58.9594702Z triton_mm_189 0.0080 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:20:58.9595796Z triton_mm_183 0.0081 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:20:58.9596750Z triton_mm_188 0.0081 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:20:58.9597708Z triton_mm_179 0.0082 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:20:58.9598681Z triton_mm_186 0.0082 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:20:58.9599619Z triton_mm_175 0.0082 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:20:58.9600511Z triton_mm_181 0.0082 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:20:58.9601402Z triton_mm_180 0.0084 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:20:58.9602186Z SingleProcess AUTOTUNE benchmarking takes 0.2371 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T11:20:59.2075704Z Autotune Choices Stats: 2025-09-07T11:20:59.2076686Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_235", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007455999962985516, "best_triton_pos": 0} 2025-09-07T11:20:59.2174020Z AUTOTUNE mm(6272x40, 40x240) 2025-09-07T11:20:59.2174429Z strides: [40, 1], [1, 40] 2025-09-07T11:20:59.2174833Z dtypes: torch.float16, torch.float16 2025-09-07T11:20:59.2176525Z triton_mm_235 0.0075 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:20:59.2178211Z triton_mm_238 0.0075 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:20:59.2179451Z triton_mm_234 0.0076 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:20:59.2180338Z triton_mm_239 0.0078 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:20:59.2181239Z triton_mm_244 0.0078 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:20:59.2182200Z triton_mm_245 0.0078 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, 
num_warps=8 2025-09-07T11:20:59.2183089Z triton_mm_231 0.0079 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:20:59.2183647Z mm 0.0082 ms 91.4% 2025-09-07T11:20:59.2184168Z triton_mm_236 0.0083 ms 90.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:20:59.2185208Z triton_mm_240 0.0083 ms 90.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:20:59.2186000Z SingleProcess AUTOTUNE benchmarking takes 0.2503 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:20:59.4629409Z Autotune Choices Stats: 2025-09-07T11:20:59.4630376Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_272", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.007712000049650669, "best_triton_pos": 0} 2025-09-07T11:20:59.4726772Z AUTOTUNE mm(1568x80, 80x480) 2025-09-07T11:20:59.4727222Z strides: [80, 1], [1, 80] 2025-09-07T11:20:59.4727647Z dtypes: torch.float16, torch.float16 2025-09-07T11:20:59.4728738Z triton_mm_272 0.0077 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:20:59.4730345Z triton_mm_274 0.0078 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:20:59.4731927Z triton_mm_279 0.0078 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=4, num_warps=8 2025-09-07T11:20:59.4733483Z triton_mm_276 0.0078 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:20:59.4735505Z triton_mm_277 0.0078 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:20:59.4737084Z triton_mm_268 0.0080 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:20:59.4739224Z triton_mm_275 0.0080 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:20:59.4740128Z triton_mm_278 0.0080 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:20:59.4741018Z triton_mm_271 0.0081 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:20:59.4742019Z triton_mm_266 0.0081 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:20:59.4742808Z SingleProcess AUTOTUNE benchmarking takes 0.2516 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:20:59.7163860Z Autotune Choices Stats: 2025-09-07T11:20:59.7165967Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_288", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.008383999578654766, 
"best_triton_pos": 0} 2025-09-07T11:20:59.7300722Z AUTOTUNE mm(1568x480, 480x80) 2025-09-07T11:20:59.7300984Z strides: [480, 1], [1, 480] 2025-09-07T11:20:59.7301244Z dtypes: torch.float16, torch.float16 2025-09-07T11:20:59.7302005Z triton_mm_288 0.0084 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:20:59.7303033Z triton_mm_292 0.0085 ms 98.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:20:59.7303660Z mm 0.0086 ms 97.0% 2025-09-07T11:20:59.7304270Z triton_mm_287 0.0093 ms 90.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:20:59.7305433Z triton_mm_296 0.0094 ms 89.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:20:59.7306427Z triton_mm_286 0.0095 ms 88.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:20:59.7307382Z triton_mm_291 0.0095 ms 87.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:20:59.7308427Z triton_mm_285 0.0101 ms 83.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:20:59.7309491Z triton_mm_295 0.0101 ms 83.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:20:59.7310456Z triton_mm_294 0.0102 ms 82.1% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:20:59.7311295Z SingleProcess AUTOTUNE benchmarking takes 0.2568 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:20:59.9863935Z Autotune Choices Stats: 2025-09-07T11:20:59.9865796Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_364", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.008031999692320824, "best_triton_pos": 0} 2025-09-07T11:20:59.9961720Z AUTOTUNE mm(1568x480, 480x96) 2025-09-07T11:20:59.9962139Z strides: [480, 1], [1, 480] 2025-09-07T11:20:59.9962577Z dtypes: torch.float16, torch.float16 2025-09-07T11:20:59.9963980Z triton_mm_364 0.0080 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:20:59.9965291Z mm 0.0085 ms 94.7% 2025-09-07T11:20:59.9979251Z triton_mm_368 0.0086 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:20:59.9980171Z triton_mm_363 0.0091 ms 88.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:20:59.9981035Z triton_mm_362 0.0093 ms 86.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:20:59.9981975Z triton_mm_367 0.0094 ms 85.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:20:59.9982817Z triton_mm_372 0.0094 
ms 85.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:20:59.9983665Z triton_mm_361 0.0099 ms 81.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:20:59.9984499Z triton_mm_371 0.0100 ms 79.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:20:59.9985462Z triton_mm_370 0.0101 ms 79.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:20:59.9986203Z SingleProcess AUTOTUNE benchmarking takes 0.2562 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:21:00.2378814Z Autotune Choices Stats: 2025-09-07T11:21:00.2380407Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_389", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8", "best_time": 0.007327999919652939, "best_triton_pos": 0} 2025-09-07T11:21:00.2476202Z AUTOTUNE mm(1568x96, 96x576) 2025-09-07T11:21:00.2476453Z strides: [96, 1], [1, 96] 2025-09-07T11:21:00.2476694Z dtypes: torch.float16, torch.float16 2025-09-07T11:21:00.2477327Z triton_mm_389 0.0073 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:21:00.2478339Z triton_mm_393 0.0074 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:21:00.2480076Z triton_mm_388 0.0076 ms 97.0% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:21:00.2481638Z triton_mm_390 0.0076 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:21:00.2483175Z triton_mm_392 0.0076 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:21:00.2484691Z triton_mm_386 0.0077 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:21:00.2487358Z triton_mm_391 0.0077 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:21:00.2488646Z mm 0.0079 ms 92.3% 2025-09-07T11:21:00.2489355Z triton_mm_385 0.0079 ms 92.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:21:00.2490256Z triton_mm_395 0.0080 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:21:00.2491044Z SingleProcess AUTOTUNE benchmarking takes 0.2509 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:21:00.4895558Z Autotune Choices Stats: 2025-09-07T11:21:00.4897545Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.008576000109314919, "best_triton_pos": 1, "best_triton_time": 0.008671999908983707, "best_triton_kernel": "triton_mm_402", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T11:21:00.4994116Z AUTOTUNE mm(1568x576, 576x96) 2025-09-07T11:21:00.4994375Z strides: [576, 1], [1, 576] 2025-09-07T11:21:00.4994637Z dtypes: torch.float16, torch.float16 2025-09-07T11:21:00.4994900Z mm 0.0086 ms 100.0% 2025-09-07T11:21:00.4995603Z triton_mm_402 0.0087 ms 98.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:21:00.4996589Z triton_mm_406 0.0090 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:21:00.4997554Z triton_mm_401 0.0096 ms 89.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:21:00.4998524Z triton_mm_400 0.0098 ms 87.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:21:00.4999519Z triton_mm_410 0.0098 ms 87.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:21:00.5000497Z triton_mm_405 0.0100 ms 86.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:21:00.5001459Z triton_mm_409 0.0103 ms 83.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:21:00.5002432Z triton_mm_416 0.0105 ms 81.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:21:00.5003394Z triton_mm_399 
0.0109 ms 78.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:21:00.5004237Z SingleProcess AUTOTUNE benchmarking takes 0.2512 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:21:00.7452246Z Autotune Choices Stats: 2025-09-07T11:21:00.7454245Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.007935999892652035, "best_triton_pos": 1, "best_triton_time": 0.00800000037997961, "best_triton_kernel": "triton_mm_440", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T11:21:00.7552762Z AUTOTUNE mm(392x576, 576x192) 2025-09-07T11:21:00.7553019Z strides: [576, 1], [1, 576] 2025-09-07T11:21:00.7553260Z dtypes: torch.float16, torch.float16 2025-09-07T11:21:00.7553542Z mm 0.0079 ms 100.0% 2025-09-07T11:21:00.7554304Z triton_mm_440 0.0080 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:21:00.7555450Z triton_mm_444 0.0083 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:21:00.7556421Z triton_mm_439 0.0092 ms 85.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:21:00.7557392Z triton_mm_448 0.0093 ms 85.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:21:00.7558378Z triton_mm_438 0.0093 ms 85.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:21:00.7559331Z triton_mm_443 0.0096 ms 82.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:21:00.7560288Z triton_mm_447 0.0097 ms 82.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:21:00.7561260Z triton_mm_437 0.0099 ms 80.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:21:00.7562233Z triton_mm_454 0.0102 ms 77.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:21:00.7563091Z SingleProcess AUTOTUNE benchmarking takes 0.2513 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:21:00.9972941Z Autotune Choices Stats: 2025-09-07T11:21:00.9974562Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_462", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.007903999648988247, "best_triton_pos": 0} 2025-09-07T11:21:01.0078643Z AUTOTUNE mm(392x192, 192x1152) 2025-09-07T11:21:01.0079072Z strides: [192, 1], [1, 192] 2025-09-07T11:21:01.0079579Z dtypes: torch.float16, torch.float16 2025-09-07T11:21:01.0080659Z triton_mm_462 0.0079 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:21:01.0082274Z triton_mm_463 0.0080 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, 
num_warps=4 2025-09-07T11:21:01.0083903Z triton_mm_467 0.0082 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:21:01.0085761Z triton_mm_469 0.0082 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:21:01.0086733Z mm 0.0084 ms 94.6% 2025-09-07T11:21:01.0087633Z triton_mm_466 0.0084 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:21:01.0089265Z triton_mm_465 0.0084 ms 93.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:21:01.0090469Z triton_mm_457 0.0085 ms 92.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:21:01.0091498Z triton_mm_464 0.0086 ms 91.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:21:01.0092404Z triton_mm_456 0.0087 ms 90.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:21:01.0093197Z SingleProcess AUTOTUNE benchmarking takes 0.2513 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:21:01.2496504Z Autotune Choices Stats: 2025-09-07T11:21:01.2498457Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.008832000195980072, "best_triton_pos": 1, "best_triton_time": 0.00886400043964386, "best_triton_kernel": "triton_mm_478", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T11:21:01.2603514Z AUTOTUNE mm(392x1152, 1152x192) 2025-09-07T11:21:01.2603931Z strides: [1152, 1], [1, 1152] 2025-09-07T11:21:01.2604349Z dtypes: torch.float16, torch.float16 2025-09-07T11:21:01.2604804Z mm 0.0088 ms 100.0% 2025-09-07T11:21:01.2606229Z triton_mm_478 0.0089 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:21:01.2607780Z triton_mm_482 0.0094 ms 93.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:21:01.2609464Z triton_mm_486 0.0105 ms 84.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:21:01.2610366Z triton_mm_477 0.0122 ms 72.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:21:01.2611244Z triton_mm_481 0.0124 ms 71.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:21:01.2612118Z triton_mm_476 0.0126 ms 70.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:21:01.2613019Z triton_mm_492 0.0128 ms 69.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:21:01.2613923Z triton_mm_485 0.0131 ms 67.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:21:01.2614818Z triton_mm_475 0.0132 ms 66.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:21:01.2615717Z SingleProcess AUTOTUNE benchmarking takes 0.2519 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:21:01.5214631Z Autotune Choices Stats: 2025-09-07T11:21:01.5216632Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_592", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.009056000038981438, "best_triton_pos": 0} 2025-09-07T11:21:01.5317343Z AUTOTUNE mm(392x1152, 1152x320) 2025-09-07T11:21:01.5317588Z strides: [1152, 1], [1, 1152] 2025-09-07T11:21:01.5317851Z dtypes: torch.float16, torch.float16 2025-09-07T11:21:01.5318695Z triton_mm_592 0.0091 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:21:01.5320066Z mm 0.0094 ms 96.6% 2025-09-07T11:21:01.5320999Z triton_mm_596 0.0097 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:21:01.5322546Z triton_mm_600 0.0108 ms 84.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:21:01.5324107Z triton_mm_591 0.0124 ms 73.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:21:01.5325951Z triton_mm_595 0.0126 ms 71.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:21:01.5327552Z triton_mm_606 0.0128 ms 70.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:21:01.5329176Z triton_mm_590 0.0129 ms 70.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:21:01.5330241Z triton_mm_599 0.0132 ms 68.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:21:01.5331133Z triton_mm_589 0.0133 ms 68.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:21:01.5331914Z SingleProcess AUTOTUNE benchmarking takes 0.2533 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:21:01.7725473Z Autotune Choices Stats: 2025-09-07T11:21:01.7727462Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.008224000222980976, "best_triton_pos": 1, "best_triton_time": 0.008736000396311283, "best_triton_kernel": "triton_mm_614", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8"} 2025-09-07T11:21:01.7827422Z AUTOTUNE mm(392x320, 320x1280) 2025-09-07T11:21:01.7827854Z strides: [320, 1], [1, 320] 2025-09-07T11:21:01.7828257Z dtypes: torch.float16, torch.float16 2025-09-07T11:21:01.7828697Z mm 0.0082 ms 100.0% 2025-09-07T11:21:01.7829812Z triton_mm_614 0.0087 ms 94.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:21:01.7830798Z triton_mm_619 0.0090 ms 91.1% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:21:01.7831778Z triton_mm_618 0.0091 ms 90.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:21:01.7832761Z triton_mm_617 0.0091 ms 90.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:21:01.7833723Z triton_mm_621 0.0095 ms 86.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:21:01.7834690Z triton_mm_625 0.0095 ms 86.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:21:01.7836009Z triton_mm_609 0.0096 ms 85.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:21:01.7837103Z triton_mm_624 0.0097 ms 84.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:21:01.7838078Z triton_mm_608 0.0098 ms 84.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:21:01.7838934Z SingleProcess AUTOTUNE benchmarking takes 0.2504 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:21:20.6933937Z W0907 11:21:20.692000 113077 site-packages/torch/_logging/_internal.py:1199] [7/0] Profiler function will be ignored 2025-09-07T11:21:43.3665402Z pass 2025-09-07T11:21:47.8300814Z accuracy pass_rate=100.00% 
2025-09-07T11:21:47.8306939Z calls_captured gmean=969.23x mean=1248.750x 2025-09-07T11:21:47.8310442Z unique_graphs gmean=2.67x mean=2.750x 2025-09-07T11:21:47.8314057Z graph_breaks gmean=6.46x mean=6.500x 2025-09-07T11:21:47.8317860Z unique_graph_breaks gmean=4.86x mean=4.875x 2025-09-07T11:21:47.8321241Z autograd_captures gmean=0.00x mean=0.000x 2025-09-07T11:21:47.8324684Z autograd_compiles gmean=0.00x mean=0.000x 2025-09-07T11:21:47.8328400Z cudagraph_skips gmean=0.00x mean=0.250x 2025-09-07T11:21:47.8329551Z compilation_latency mean=107.681 seconds 2025-09-07T11:21:48.8680634Z + [[ training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *cudagraphs_low_precision-true* ]] 2025-09-07T11:21:48.8681960Z + [[ training == \i\n\f\e\r\e\n\c\e ]] 2025-09-07T11:21:48.8682248Z + for target in "${targets[@]}" 2025-09-07T11:21:48.8682612Z + target_flag=('--performance') 2025-09-07T11:21:48.8682916Z + local target_flag 2025-09-07T11:21:48.8683198Z + [[ performance == \p\e\r\f\o\r\m\a\n\c\e ]] 2025-09-07T11:21:48.8683520Z + target_flag+=(--cold-start-latency) 2025-09-07T11:21:48.8684728Z + [[ training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *freezing-true* ]] 2025-09-07T11:21:48.8687282Z + [[ training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *default-true* ]] 2025-09-07T11:21:48.8689462Z + python benchmarks/dynamo/timm_models.py --performance --cold-start-latency --training --amp --backend inductor --disable-cudagraphs --device cuda --total-partitions 7 --partition-id 3 --output 
/var/lib/jenkins/workspace/test/test-reports/inductor_no_cudagraphs_timm_models_amp_training_cuda_h100_performance.csv 2025-09-07T11:21:49.8939387Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T11:21:49.8940632Z import pynvml # type: ignore[import] 2025-09-07T11:21:54.6686892Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T11:21:54.6688122Z import pynvml # type: ignore[import] 2025-09-07T11:21:57.7348256Z 2025-09-07T11:21:59.8981638Z loading model: 0it [00:00, ?it/s] 2025-09-07T11:21:59.8982188Z loading model: 0it [00:02, ?it/s] 2025-09-07T11:21:59.8983555Z cuda train hrnet_w18 2025-09-07T11:25:07.3263791Z 2025-09-07T11:25:07.6384684Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T11:27:48.6242432Z 2025-09-07T11:27:48.7972894Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T11:29:54.7867816Z 2025-09-07T11:29:54.9260388Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T11:32:27.4222139Z 2025-09-07T11:32:27.6677235Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T11:33:29.6590454Z 2025-09-07T11:33:29.8244933Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T11:36:08.9323886Z 2025-09-07T11:36:09.2143200Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T11:42:29.6388634Z 2025-09-07T11:42:29.8921062Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T11:44:39.4183379Z 2025-09-07T11:44:39.6499982Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T11:47:11.5991866Z 
2025-09-07T11:47:11.9764475Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T11:48:14.9979945Z 2025-09-07T11:48:15.2126522Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T11:50:57.7238267Z 2025-09-07T11:50:58.0648690Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T11:57:26.7780377Z 2025-09-07T11:57:27.2147366Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T11:59:39.0281131Z 2025-09-07T11:59:39.3388361Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T12:02:14.7058887Z 2025-09-07T12:02:15.2356452Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T12:03:21.4338691Z 2025-09-07T12:03:21.6626198Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T12:06:08.9578977Z 2025-09-07T12:06:09.4309701Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T12:20:42.6636708Z 2025-09-07T12:20:42.8512857Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T12:24:28.9297118Z 2025-09-07T12:24:29.0785270Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T12:28:45.7940409Z 2025-09-07T12:28:46.0418158Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T12:30:19.7456370Z 2025-09-07T12:30:19.9150270Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T12:34:50.8099602Z 2025-09-07T12:34:51.0958306Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T12:42:16.5359031Z 2025-09-07T12:42:16.7858621Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T12:45:40.8308592Z 2025-09-07T12:45:41.0485343Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T12:50:02.3169285Z 2025-09-07T12:50:02.6706980Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T12:51:40.8410730Z 2025-09-07T12:51:41.0547836Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T12:55:47.1980569Z 2025-09-07T12:55:47.5306917Z running benchmark: 0% 0/30 [00:00 2025-09-07T13:02:57.8056623Z and t.untyped_storage().data_ptr() not in existing_path_data_ptrs 
2025-09-07T13:02:57.8057405Z RuntimeError: Error: accessing tensor output of CUDAGraphs that has been overwritten by a subsequent run. Stack trace: File "/var/lib/jenkins/workspace/benchmarks/dynamo/timm_models.py", line 436, in forward_pass 2025-09-07T13:02:57.8058092Z return mod(*inputs) 2025-09-07T13:02:57.8058446Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 718, in forward 2025-09-07T13:02:57.8058842Z x = self.forward_features(x) 2025-09-07T13:02:57.8059245Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 709, in forward_features 2025-09-07T13:02:57.8059653Z x = self.stages(x) 2025-09-07T13:02:57.8059991Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 520, in forward 2025-09-07T13:02:57.8060365Z x = self.blocks(x) 2025-09-07T13:02:57.8060698Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 458, in forward 2025-09-07T13:02:57.8061089Z x = x + self.drop_path1(self.attn(x)) 2025-09-07T13:02:57.8061578Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 237, in forward 2025-09-07T13:02:57.8062018Z attn = q @ k * self.scale + self.get_attention_biases(x.device) 2025-09-07T13:02:57.8062508Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 216, in get_attention_biases 2025-09-07T13:02:57.8063473Z self.attention_bias_cache[device_key] = self.attention_biases[:, self.attention_bias_idxs]. To prevent overwriting, clone the tensor outside of torch.compile() or call torch.compiler.cudagraph_mark_step_begin() before each model invocation. 2025-09-07T13:02:57.8064328Z TorchDynamo optimized model failed to run because of following error 2025-09-07T13:02:57.8254632Z fail_to_run 2025-09-07T13:03:01.8354885Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. 
Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T13:03:01.8356789Z import pynvml # type: ignore[import] 2025-09-07T13:03:04.9387420Z 2025-09-07T13:03:06.5449464Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:03:06.5450470Z loading model: 0it [00:01, ?it/s] 2025-09-07T13:03:06.5493568Z cuda eval mixer_b16_224 2025-09-07T13:03:16.0668256Z pass 2025-09-07T13:03:19.3220779Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T13:03:19.3222729Z import pynvml # type: ignore[import] 2025-09-07T13:03:22.3448417Z 2025-09-07T13:03:23.3721968Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:03:23.3722334Z loading model: 0it [00:01, ?it/s] 2025-09-07T13:03:23.3829923Z cuda eval mixnet_l 2025-09-07T13:03:42.5789937Z pass 2025-09-07T13:03:46.4845173Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T13:03:46.4846480Z import pynvml # type: ignore[import] 2025-09-07T13:03:49.5181262Z 2025-09-07T13:03:50.3782201Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:03:50.3782562Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:03:50.3844307Z cuda eval mnasnet_100 2025-09-07T13:03:59.9739956Z pass 2025-09-07T13:04:02.2340301Z accuracy pass_rate=87.50% 2025-09-07T13:04:02.2346042Z calls_captured gmean=367.65x mean=508.875x 2025-09-07T13:04:02.2349149Z unique_graphs gmean=1.09x mean=1.125x 2025-09-07T13:04:02.2352244Z graph_breaks gmean=0.00x mean=0.000x 2025-09-07T13:04:02.2355838Z unique_graph_breaks gmean=0.00x mean=0.000x 2025-09-07T13:04:02.2359345Z autograd_captures gmean=0.00x mean=0.000x 2025-09-07T13:04:02.2362634Z autograd_compiles gmean=0.00x mean=0.000x 2025-09-07T13:04:02.2366142Z cudagraph_skips gmean=0.00x mean=0.000x 2025-09-07T13:04:02.2367538Z compilation_latency mean=19.737 seconds 2025-09-07T13:04:03.3161457Z + [[ training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *dynamic-true* ]] 2025-09-07T13:04:03.3163929Z + python benchmarks/dynamo/timm_models.py --accuracy --no-translation-validation --inference --bfloat16 --backend inductor --dynamic-shapes --dynamic-batch-only --device cuda --total-partitions 7 --partition-id 3 --output /var/lib/jenkins/workspace/test/test-reports/inductor_dynamic_timm_models_bfloat16_inference_cuda_h100_accuracy.csv 2025-09-07T13:04:04.3449570Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T13:04:04.3451544Z import pynvml # type: ignore[import] 2025-09-07T13:04:09.0771884Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T13:04:09.0773681Z import pynvml # type: ignore[import] 2025-09-07T13:04:12.1752114Z 2025-09-07T13:04:14.1749936Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:04:14.1750488Z loading model: 0it [00:01, ?it/s] 2025-09-07T13:04:14.2124235Z cuda eval hrnet_w18 2025-09-07T13:04:31.6244446Z pass 2025-09-07T13:04:35.3618715Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T13:04:35.3620433Z import pynvml # type: ignore[import] 2025-09-07T13:04:38.4076096Z 2025-09-07T13:04:39.6972029Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:04:39.6972389Z loading model: 0it [00:01, ?it/s] 2025-09-07T13:04:39.7089314Z cuda eval inception_v3 2025-09-07T13:04:46.3925979Z pass 2025-09-07T13:04:49.7395801Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T13:04:49.7397033Z import pynvml # type: ignore[import] 2025-09-07T13:04:52.9193375Z 2025-09-07T13:04:54.7701509Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:04:54.7702093Z loading model: 0it [00:01, ?it/s] 2025-09-07T13:04:54.7786056Z cuda eval jx_nest_base 2025-09-07T13:05:00.6162680Z pass 2025-09-07T13:05:03.8603421Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T13:05:03.8605944Z import pynvml # type: ignore[import] 2025-09-07T13:05:06.8930229Z 2025-09-07T13:05:07.6265767Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:05:07.6266136Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:05:07.6307006Z cuda eval lcnet_050 2025-09-07T13:05:11.3376736Z pass 2025-09-07T13:05:14.7216274Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T13:05:14.7217804Z import pynvml # type: ignore[import] 2025-09-07T13:05:17.7600039Z 2025-09-07T13:05:18.5767783Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:05:18.5768299Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:05:18.5851076Z cuda eval levit_128 2025-09-07T13:05:27.3581774Z ERROR:common: 2025-09-07T13:05:27.3582091Z Traceback (most recent call last): 2025-09-07T13:05:27.3582613Z File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2326, in check_accuracy 2025-09-07T13:05:27.3583126Z new_result = self.run_n_iterations( 2025-09-07T13:05:27.3583627Z File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2036, in run_n_iterations 2025-09-07T13:05:27.3584178Z return model_iter_fn(mod, inputs, collect_outputs=True) 2025-09-07T13:05:27.3584885Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 832, in compile_wrapper 2025-09-07T13:05:27.3585881Z return fn(*args, **kwargs) 2025-09-07T13:05:27.3586312Z File "/var/lib/jenkins/workspace/benchmarks/dynamo/timm_models.py", line 434, in forward_pass 2025-09-07T13:05:27.3586842Z def forward_pass(self, mod, inputs, collect_outputs=True): 2025-09-07T13:05:27.3587356Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T13:05:27.3587818Z return fn(*args, **kwargs) 2025-09-07T13:05:27.3588293Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1130, in forward 2025-09-07T13:05:27.3588792Z return compiled_fn(full_args) 2025-09-07T13:05:27.3589352Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 353, in runtime_wrapper 2025-09-07T13:05:27.3589942Z all_outs = call_func_at_runtime_with_args( 2025-09-07T13:05:27.3590963Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 129, in call_func_at_runtime_with_args 
2025-09-07T13:05:27.3591545Z out = normalize_as_list(f(args)) 2025-09-07T13:05:27.3592396Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 526, in wrapper 2025-09-07T13:05:27.3593059Z return compiled_fn(runtime_args) 2025-09-07T13:05:27.3593534Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/output_code.py", line 613, in __call__ 2025-09-07T13:05:27.3594019Z return self.current_callable(inputs) 2025-09-07T13:05:27.3594559Z File "/tmp/torchinductor_jenkins/w3/cw35segceiexic536byw6izuz6kmfwvshyirt2i4n6wgspnxujnv.py", line 5105, in call 2025-09-07T13:05:27.3595315Z (buf266,) = self.partitions[0](partition0_args) 2025-09-07T13:05:27.3595788Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1772, in run 2025-09-07T13:05:27.3596276Z return compiled_fn(new_inputs) # type: ignore[arg-type] 2025-09-07T13:05:27.3596814Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/cudagraph_trees.py", line 404, in deferred_cudagraphify 2025-09-07T13:05:27.3597397Z fn, out = cudagraphify(model, inputs, new_static_input_idxs, *args, **kwargs) 2025-09-07T13:05:27.3597947Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/cudagraph_trees.py", line 463, in cudagraphify 2025-09-07T13:05:27.3598410Z return manager.add_function( 2025-09-07T13:05:27.3598848Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/cudagraph_trees.py", line 2316, in add_function 2025-09-07T13:05:27.3599289Z return fn, fn(inputs) 2025-09-07T13:05:27.3599686Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/cudagraph_trees.py", line 2012, in run 2025-09-07T13:05:27.3600124Z out = self._run(new_inputs, function_id) 2025-09-07T13:05:27.3600564Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/cudagraph_trees.py", line 2116, in _run 
2025-09-07T13:05:27.3601004Z return self.run_eager(new_inputs, function_id) 2025-09-07T13:05:27.3601464Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/cudagraph_trees.py", line 2277, in run_eager 2025-09-07T13:05:27.3601921Z return node.run(new_inputs) 2025-09-07T13:05:27.3602322Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/cudagraph_trees.py", line 671, in run 2025-09-07T13:05:27.3602784Z non_cudagraph_inps_storages = get_non_cudagraph_inps() 2025-09-07T13:05:27.3603294Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/cudagraph_trees.py", line 663, in get_non_cudagraph_inps 2025-09-07T13:05:27.3603764Z non_cudagraph_inps = [ 2025-09-07T13:05:27.3604168Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/cudagraph_trees.py", line 667, in <listcomp> 2025-09-07T13:05:27.3604681Z and t.untyped_storage().data_ptr() not in existing_path_data_ptrs 2025-09-07T13:05:27.3605888Z RuntimeError: Error: accessing tensor output of CUDAGraphs that has been overwritten by a subsequent run. 
Stack trace: File "/var/lib/jenkins/workspace/benchmarks/dynamo/timm_models.py", line 436, in forward_pass 2025-09-07T13:05:27.3606585Z return mod(*inputs) 2025-09-07T13:05:27.3606939Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 718, in forward 2025-09-07T13:05:27.3607328Z x = self.forward_features(x) 2025-09-07T13:05:27.3607726Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 709, in forward_features 2025-09-07T13:05:27.3608126Z x = self.stages(x) 2025-09-07T13:05:27.3608467Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 520, in forward 2025-09-07T13:05:27.3608857Z x = self.blocks(x) 2025-09-07T13:05:27.3609353Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 458, in forward 2025-09-07T13:05:27.3609754Z x = x + self.drop_path1(self.attn(x)) 2025-09-07T13:05:27.3610302Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 237, in forward 2025-09-07T13:05:27.3610809Z attn = q @ k * self.scale + self.get_attention_biases(x.device) 2025-09-07T13:05:27.3611290Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 216, in get_attention_biases 2025-09-07T13:05:27.3612251Z self.attention_bias_cache[device_key] = self.attention_biases[:, self.attention_bias_idxs]. To prevent overwriting, clone the tensor outside of torch.compile() or call torch.compiler.cudagraph_mark_step_begin() before each model invocation. 2025-09-07T13:05:27.3613109Z TorchDynamo optimized model failed to run because of following error 2025-09-07T13:05:27.3801269Z fail_to_run 2025-09-07T13:05:31.1145911Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. 
If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T13:05:31.1147657Z import pynvml # type: ignore[import] 2025-09-07T13:05:34.3333976Z 2025-09-07T13:05:35.7050539Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:05:35.7051040Z loading model: 0it [00:01, ?it/s] 2025-09-07T13:05:35.7095275Z cuda eval mixer_b16_224 2025-09-07T13:05:39.3751817Z pass 2025-09-07T13:05:43.0393731Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T13:05:43.0395440Z import pynvml # type: ignore[import] 2025-09-07T13:05:46.0746161Z 2025-09-07T13:05:47.4128098Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:05:47.4128611Z loading model: 0it [00:01, ?it/s] 2025-09-07T13:05:47.4248513Z cuda eval mixnet_l 2025-09-07T13:05:54.6898319Z pass 2025-09-07T13:05:57.8757927Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T13:05:57.8759176Z import pynvml # type: ignore[import] 2025-09-07T13:06:00.9046798Z 2025-09-07T13:06:02.1187650Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:06:02.1187981Z loading model: 0it [00:01, ?it/s] 2025-09-07T13:06:02.1254184Z cuda eval mnasnet_100 2025-09-07T13:06:06.7515596Z pass 2025-09-07T13:06:09.0486147Z accuracy pass_rate=87.50% 2025-09-07T13:06:09.0490686Z calls_captured gmean=367.65x mean=508.875x 2025-09-07T13:06:09.0494577Z unique_graphs gmean=1.09x mean=1.125x 2025-09-07T13:06:09.0498179Z graph_breaks gmean=0.00x mean=0.000x 2025-09-07T13:06:09.0501516Z unique_graph_breaks gmean=0.00x mean=0.000x 2025-09-07T13:06:09.0505227Z autograd_captures gmean=0.00x mean=0.000x 2025-09-07T13:06:09.0508544Z autograd_compiles gmean=0.00x mean=0.000x 2025-09-07T13:06:09.0511781Z cudagraph_skips gmean=0.00x mean=0.000x 2025-09-07T13:06:09.0512957Z compilation_latency mean=6.431 seconds 2025-09-07T13:06:10.2645946Z + [[ training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *cppwrapper-true* ]] 2025-09-07T13:06:10.2647244Z + TORCHINDUCTOR_CPP_WRAPPER=1 2025-09-07T13:06:10.2648604Z + python benchmarks/dynamo/timm_models.py --accuracy --no-translation-validation --inference --bfloat16 --backend inductor --disable-cudagraphs --device cuda --total-partitions 7 --partition-id 3 --output /var/lib/jenkins/workspace/test/test-reports/inductor_cpp_wrapper_timm_models_bfloat16_inference_cuda_h100_accuracy.csv 2025-09-07T13:06:11.3497755Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T13:06:11.3499128Z import pynvml # type: ignore[import] 2025-09-07T13:06:16.1074871Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T13:06:16.1076411Z import pynvml # type: ignore[import] 2025-09-07T13:06:19.7810528Z 2025-09-07T13:06:21.7530975Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:06:21.7531604Z loading model: 0it [00:01, ?it/s] 2025-09-07T13:06:21.8208054Z cuda eval hrnet_w18 2025-09-07T13:07:45.8608416Z pass 2025-09-07T13:07:50.8111463Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T13:07:50.8113057Z import pynvml # type: ignore[import] 2025-09-07T13:07:53.9071160Z 2025-09-07T13:07:55.6032734Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:07:55.6033126Z loading model: 0it [00:01, ?it/s] 2025-09-07T13:07:55.6149284Z cuda eval inception_v3 2025-09-07T13:08:27.9772128Z pass 2025-09-07T13:08:32.3690731Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T13:08:32.3691941Z import pynvml # type: ignore[import] 2025-09-07T13:08:35.3882948Z 2025-09-07T13:08:37.3268665Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:08:37.3269003Z loading model: 0it [00:01, ?it/s] 2025-09-07T13:08:37.3352561Z cuda eval jx_nest_base 2025-09-07T13:09:14.8599131Z pass 2025-09-07T13:09:19.0288471Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T13:09:22.4438370Z import pynvml # type: ignore[import] 2025-09-07T13:09:22.4438640Z 2025-09-07T13:09:23.3561639Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:09:23.3561999Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:09:23.3641540Z cuda eval lcnet_050 2025-09-07T13:09:35.6168765Z pass 2025-09-07T13:09:39.0333724Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T13:09:39.0335807Z import pynvml # type: ignore[import] 2025-09-07T13:09:42.1160430Z 2025-09-07T13:09:43.4885250Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:09:43.4885705Z loading model: 0it [00:01, ?it/s] 2025-09-07T13:09:43.4972875Z cuda eval levit_128 2025-09-07T13:10:30.7487070Z pass 2025-09-07T13:10:35.0537903Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T13:10:35.0539602Z import pynvml # type: ignore[import] 2025-09-07T13:10:38.0344145Z 2025-09-07T13:10:39.3955178Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:10:39.3956169Z loading model: 0it [00:01, ?it/s] 2025-09-07T13:10:39.3998498Z cuda eval mixer_b16_224 2025-09-07T13:10:55.5038977Z pass 2025-09-07T13:10:58.9744007Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T13:10:58.9748854Z import pynvml # type: ignore[import] 2025-09-07T13:11:01.9833705Z 2025-09-07T13:11:02.9604593Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:11:02.9605492Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:11:02.9713902Z cuda eval mixnet_l 2025-09-07T13:11:35.5259319Z pass 2025-09-07T13:11:39.3333597Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T13:11:39.3334842Z import pynvml # type: ignore[import] 2025-09-07T13:11:42.3432988Z 2025-09-07T13:11:43.5396465Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:11:43.5396829Z loading model: 0it [00:01, ?it/s] 2025-09-07T13:11:43.5462264Z cuda eval mnasnet_100 2025-09-07T13:11:58.5810769Z pass 2025-09-07T13:12:01.1032998Z accuracy pass_rate=100.00% 2025-09-07T13:12:01.1039951Z calls_captured gmean=367.65x mean=508.875x 2025-09-07T13:12:01.1043168Z unique_graphs gmean=1.09x mean=1.125x 2025-09-07T13:12:01.1046884Z graph_breaks gmean=0.00x mean=0.000x 2025-09-07T13:12:01.1050296Z unique_graph_breaks gmean=0.00x mean=0.000x 2025-09-07T13:12:01.1053667Z autograd_captures gmean=0.00x mean=0.000x 2025-09-07T13:12:01.1058373Z autograd_compiles gmean=0.00x mean=0.000x 2025-09-07T13:12:01.1061836Z cudagraph_skips gmean=0.00x mean=0.000x 2025-09-07T13:12:01.1063127Z compilation_latency mean=33.868 seconds 2025-09-07T13:12:02.3644559Z + [[ training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *freezing_cudagraphs-true* ]] 2025-09-07T13:12:02.3646392Z + [[ inference == \i\n\f\e\r\e\n\c\e ]] 2025-09-07T13:12:02.3647797Z + python benchmarks/dynamo/timm_models.py --accuracy --no-translation-validation --inference --bfloat16 --backend inductor --device cuda --total-partitions 7 --partition-id 3 --freezing --output /var/lib/jenkins/workspace/test/test-reports/inductor_with_cudagraphs_freezing_timm_models_bfloat16_inference_cuda_h100_accuracy.csv 2025-09-07T13:12:03.3872193Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T13:12:03.3873232Z import pynvml # type: ignore[import] 2025-09-07T13:12:08.1599981Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T13:12:08.1601042Z import pynvml # type: ignore[import] 2025-09-07T13:12:11.1991030Z 2025-09-07T13:12:13.2406063Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:12:13.2409554Z loading model: 0it [00:02, ?it/s] 2025-09-07T13:12:13.2778253Z cuda eval hrnet_w18 2025-09-07T13:13:07.8651049Z pass 2025-09-07T13:13:12.1019510Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T13:13:12.1021003Z import pynvml # type: ignore[import] 2025-09-07T13:13:15.1971118Z 2025-09-07T13:13:16.4019391Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:13:16.4019747Z loading model: 0it [00:01, ?it/s] 2025-09-07T13:13:16.4138591Z cuda eval inception_v3 2025-09-07T13:13:37.7625729Z pass 2025-09-07T13:13:41.4767124Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T13:13:41.4768371Z import pynvml # type: ignore[import] 2025-09-07T13:13:44.5005211Z 2025-09-07T13:13:47.3643462Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:13:47.3643843Z loading model: 0it [00:02, ?it/s] 2025-09-07T13:13:47.3730351Z cuda eval jx_nest_base 2025-09-07T13:14:12.5265626Z pass 2025-09-07T13:14:16.4376094Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T13:14:16.4377419Z import pynvml # type: ignore[import] 2025-09-07T13:14:19.5485840Z 2025-09-07T13:14:20.2818166Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:14:20.2818632Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:14:20.2856354Z cuda eval lcnet_050 2025-09-07T13:14:29.5273528Z pass 2025-09-07T13:14:32.9069964Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T13:14:32.9071225Z import pynvml # type: ignore[import] 2025-09-07T13:14:36.0062022Z 2025-09-07T13:14:37.0956462Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:14:37.0956966Z loading model: 0it [00:01, ?it/s] 2025-09-07T13:14:37.1034034Z cuda eval levit_128 2025-09-07T13:15:11.8028305Z pass 2025-09-07T13:15:15.5719452Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T13:15:15.5720684Z import pynvml # type: ignore[import] 2025-09-07T13:15:18.5766625Z 2025-09-07T13:15:19.9437993Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:15:19.9438325Z loading model: 0it [00:01, ?it/s] 2025-09-07T13:15:19.9480340Z cuda eval mixer_b16_224 2025-09-07T13:15:30.2686625Z pass 2025-09-07T13:15:33.6154801Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T13:15:33.6156410Z import pynvml # type: ignore[import] 2025-09-07T13:15:36.6899459Z 2025-09-07T13:15:37.7394584Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:15:37.7395582Z loading model: 0it [00:01, ?it/s] 2025-09-07T13:15:37.7509832Z cuda eval mixnet_l 2025-09-07T13:15:59.1137956Z pass 2025-09-07T13:16:02.7529098Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T13:16:02.7530895Z import pynvml # type: ignore[import] 2025-09-07T13:16:05.7590333Z 2025-09-07T13:16:06.8638629Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:16:06.8639187Z loading model: 0it [00:01, ?it/s] 2025-09-07T13:16:06.8700653Z cuda eval mnasnet_100 2025-09-07T13:16:18.1333151Z pass 2025-09-07T13:16:20.4315807Z accuracy pass_rate=100.00% 2025-09-07T13:16:20.4321268Z calls_captured gmean=367.65x mean=508.875x 2025-09-07T13:16:20.4325205Z unique_graphs gmean=1.09x mean=1.125x 2025-09-07T13:16:20.4329004Z graph_breaks gmean=0.00x mean=0.000x 2025-09-07T13:16:20.4332574Z unique_graph_breaks gmean=0.00x mean=0.000x 2025-09-07T13:16:20.4336529Z autograd_captures gmean=0.00x mean=0.000x 2025-09-07T13:16:20.4340104Z autograd_compiles gmean=0.00x mean=0.000x 2025-09-07T13:16:20.4343846Z cudagraph_skips gmean=0.00x mean=0.000x 2025-09-07T13:16:20.4345267Z compilation_latency mean=22.720 seconds 2025-09-07T13:16:21.4941567Z + [[ training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *freeze_autotune_cudagraphs-true* ]] 2025-09-07T13:16:21.4955225Z + [[ inference == \i\n\f\e\r\e\n\c\e ]] 2025-09-07T13:16:21.4955548Z + TORCHINDUCTOR_MAX_AUTOTUNE=1 2025-09-07T13:16:21.4956943Z + python benchmarks/dynamo/timm_models.py --accuracy --no-translation-validation --inference --bfloat16 --backend inductor --device cuda --total-partitions 7 --partition-id 3 --freezing --output /var/lib/jenkins/workspace/test/test-reports/inductor_with_cudagraphs_freezing_autotune_timm_models_bfloat16_inference_cuda_h100_accuracy.csv 2025-09-07T13:16:22.5166419Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. 
If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T13:16:22.5167690Z import pynvml # type: ignore[import] 2025-09-07T13:16:27.3320283Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T13:16:27.3321516Z import pynvml # type: ignore[import] 2025-09-07T13:16:30.3652913Z 2025-09-07T13:16:32.3292992Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:16:32.3293489Z loading model: 0it [00:01, ?it/s] 2025-09-07T13:16:32.3675279Z cuda eval hrnet_w18 2025-09-07T13:17:36.3946850Z Autotune Choices Stats: 2025-09-07T13:17:36.3948247Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.013248000293970108, "best_triton_pos": 1, "best_triton_time": 0.01360000018030405, "best_triton_kernel": "triton_mm_75", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T13:17:36.4056488Z AUTOTUNE addmm(25088x256, 25088x64, 64x256) 2025-09-07T13:17:36.4056848Z strides: [0, 1], [64, 1], [1, 64] 2025-09-07T13:17:36.4057201Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:17:36.4057552Z bias_addmm 0.0132 ms 100.0% 2025-09-07T13:17:36.4058199Z triton_mm_75 0.0136 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:17:36.4059182Z triton_mm_65 0.0137 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:17:36.4060468Z triton_mm_67 0.0143 ms 
92.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:17:36.4061860Z triton_mm_70 0.0144 ms 92.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:36.4062855Z triton_mm_68 0.0144 ms 92.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:36.4063733Z triton_mm_69 0.0144 ms 91.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:17:36.4064621Z triton_mm_74 0.0145 ms 91.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:36.4065778Z triton_mm_63 0.0146 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:17:36.4066666Z triton_mm_71 0.0146 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:17:36.4067454Z SingleProcess AUTOTUNE benchmarking takes 0.2920 seconds and 0.0004 seconds precompiling for 21 choices 2025-09-07T13:17:37.4533731Z Autotune Choices Stats: 2025-09-07T13:17:37.4534824Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_2807", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8", "best_time": 0.008383999578654766, "best_triton_pos": 0} 2025-09-07T13:17:37.4638483Z AUTOTUNE addmm(25088x128, 25088x18, 
18x128) 2025-09-07T13:17:37.4638827Z strides: [0, 1], [18, 1], [1, 18] 2025-09-07T13:17:37.4639137Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:17:37.4639855Z triton_mm_2807 0.0084 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:17:37.4640872Z triton_mm_2800 0.0084 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:17:37.4641875Z triton_mm_2804 0.0084 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:17:37.4642839Z triton_mm_2802 0.0085 ms 98.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:17:37.4643779Z triton_mm_2805 0.0086 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:17:37.4644700Z triton_mm_2803 0.0087 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:37.4645776Z triton_mm_2806 0.0087 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:37.4646687Z triton_mm_2799 0.0093 ms 90.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:17:37.4647601Z triton_mm_2809 0.0093 ms 90.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:37.4648790Z triton_mm_2808 0.0093 ms 90.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T13:17:37.4649813Z SingleProcess AUTOTUNE benchmarking takes 0.2541 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T13:17:38.3502455Z Autotune Choices Stats: 2025-09-07T13:17:38.3503503Z {"num_choices": 20, "num_triton_choices": 18, "best_kernel": "triton_mm_22", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.008832000195980072, "best_triton_pos": 0} 2025-09-07T13:17:38.3603164Z AUTOTUNE addmm(25088x64, 25088x64, 64x64) 2025-09-07T13:17:38.3603502Z strides: [0, 1], [64, 1], [1, 64] 2025-09-07T13:17:38.3603822Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:17:38.3604543Z triton_mm_22 0.0088 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:38.3605876Z triton_mm_24 0.0090 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:38.3606840Z triton_mm_20 0.0091 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:17:38.3607782Z triton_mm_25 0.0092 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:17:38.3608725Z triton_mm_23 0.0092 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:17:38.3609675Z triton_mm_30 0.0092 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:17:38.3610621Z triton_mm_17 0.0094 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:17:38.3611634Z triton_mm_19 0.0094 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:17:38.3612683Z triton_mm_29 0.0094 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:38.3613628Z triton_mm_21 0.0094 ms 93.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:17:38.3614466Z SingleProcess AUTOTUNE benchmarking takes 0.2663 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T13:17:38.8926902Z Autotune Choices Stats: 2025-09-07T13:17:38.8928001Z {"num_choices": 20, "num_triton_choices": 18, "best_kernel": "triton_mm_87", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.013024000450968742, "best_triton_pos": 0} 2025-09-07T13:17:38.9030199Z AUTOTUNE addmm(25088x64, 25088x256, 256x64) 2025-09-07T13:17:38.9030515Z strides: [0, 1], [256, 1], [1, 256] 2025-09-07T13:17:38.9030839Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:17:38.9031601Z triton_mm_87 0.0130 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:38.9032699Z triton_mm_83 0.0135 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:17:38.9033651Z bias_addmm 0.0140 ms 92.7% 2025-09-07T13:17:38.9034463Z triton_mm_92 0.0140 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:38.9035903Z triton_mm_86 0.0144 ms 90.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:17:38.9036856Z triton_mm_89 0.0148 ms 87.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:38.9037809Z triton_mm_90 0.0150 ms 87.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:17:38.9038759Z triton_mm_85 0.0150 ms 86.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:38.9039697Z triton_mm_82 0.0153 ms 85.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:17:38.9040640Z triton_mm_77 0.0155 ms 83.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:17:38.9041476Z SingleProcess AUTOTUNE benchmarking takes 0.2736 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:17:39.2245721Z Autotune Choices Stats: 2025-09-07T13:17:39.2246823Z {"num_choices": 21, 
"num_triton_choices": 19, "best_kernel": "triton_mm_2693", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.00774399982765317, "best_triton_pos": 0} 2025-09-07T13:17:39.2348557Z AUTOTUNE addmm(6272x256, 6272x36, 36x256) 2025-09-07T13:17:39.2348873Z strides: [0, 1], [36, 1], [1, 36] 2025-09-07T13:17:39.2349201Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:17:39.2349963Z triton_mm_2693 0.0077 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:17:39.2350954Z triton_mm_2692 0.0078 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:17:39.2352074Z triton_mm_2697 0.0079 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:17:39.2353069Z triton_mm_2696 0.0079 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:39.2354054Z triton_mm_2690 0.0081 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:17:39.2355166Z triton_mm_2694 0.0083 ms 93.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:39.2356135Z triton_mm_2691 0.0084 ms 92.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 
2025-09-07T13:17:39.2357099Z triton_mm_2689 0.0085 ms 91.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:17:39.2358344Z triton_mm_2702 0.0085 ms 91.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:39.2359553Z triton_mm_2703 0.0085 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:17:39.2360508Z SingleProcess AUTOTUNE benchmarking takes 0.2829 seconds and 0.0003 seconds precompiling for 21 choices 2025-09-07T13:17:40.2082044Z Autotune Choices Stats: 2025-09-07T13:17:40.2083304Z {"num_choices": 17, "num_triton_choices": 15, "best_kernel": "triton_mm_2765", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4", "best_time": 0.006976000033318996, "best_triton_pos": 0} 2025-09-07T13:17:40.2189076Z AUTOTUNE addmm(25088x32, 25088x18, 18x32) 2025-09-07T13:17:40.2189434Z strides: [0, 1], [18, 1], [1, 18] 2025-09-07T13:17:40.2189768Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:17:40.2190510Z triton_mm_2765 0.0070 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:17:40.2191546Z triton_mm_2762 0.0070 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:17:40.2192674Z triton_mm_2761 0.0071 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, 
num_warps=4 2025-09-07T13:17:40.2193640Z triton_mm_2758 0.0072 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:17:40.2194607Z triton_mm_2757 0.0072 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:17:40.2195729Z triton_mm_2764 0.0072 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:17:40.2196694Z triton_mm_2756 0.0073 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:17:40.2197655Z triton_mm_2759 0.0073 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:17:40.2198612Z triton_mm_2763 0.0073 ms 95.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:40.2199576Z triton_mm_2760 0.0075 ms 93.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:17:40.2200429Z SingleProcess AUTOTUNE benchmarking takes 0.2386 seconds and 0.0002 seconds precompiling for 17 choices 2025-09-07T13:17:40.7787673Z Autotune Choices Stats: 2025-09-07T13:17:40.7788751Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_2593", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8", "best_time": 0.007615999784320593, 
"best_triton_pos": 0} 2025-09-07T13:17:40.7888932Z AUTOTUNE addmm(1568x512, 1568x72, 72x512) 2025-09-07T13:17:40.7889221Z strides: [0, 1], [72, 1], [1, 72] 2025-09-07T13:17:40.7889548Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:17:40.7890316Z triton_mm_2593 0.0076 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:17:40.7891947Z triton_mm_2588 0.0077 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:40.7893044Z triton_mm_2591 0.0077 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:17:40.7893955Z triton_mm_2592 0.0078 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:40.7894853Z triton_mm_2587 0.0078 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:17:40.7896141Z triton_mm_2590 0.0079 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:40.7897047Z triton_mm_2589 0.0080 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:17:40.7897946Z triton_mm_2585 0.0080 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:17:40.7898861Z triton_mm_2580 0.0082 ms 93.3% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:17:40.7899771Z triton_mm_2586 0.0082 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:17:40.7900562Z SingleProcess AUTOTUNE benchmarking takes 0.2863 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T13:17:41.8048902Z Autotune Choices Stats: 2025-09-07T13:17:41.8050040Z {"num_choices": 20, "num_triton_choices": 18, "best_kernel": "triton_mm_2647", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.006943999789655209, "best_triton_pos": 0} 2025-09-07T13:17:41.8152389Z AUTOTUNE addmm(6272x64, 6272x36, 36x64) 2025-09-07T13:17:41.8152746Z strides: [0, 1], [36, 1], [1, 36] 2025-09-07T13:17:41.8153090Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:17:41.8153778Z triton_mm_2647 0.0069 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:17:41.8154816Z triton_mm_2649 0.0069 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:17:41.8156110Z triton_mm_2648 0.0070 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:17:41.8157091Z triton_mm_2652 0.0070 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:41.8158060Z triton_mm_2646 0.0070 ms 
98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:17:41.8159027Z triton_mm_2653 0.0071 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:17:41.8160322Z triton_mm_2641 0.0071 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T13:17:41.8161496Z triton_mm_2651 0.0072 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:17:41.8162583Z triton_mm_2650 0.0072 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:41.8163549Z triton_mm_2642 0.0072 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:17:41.8164248Z SingleProcess AUTOTUNE benchmarking takes 0.2633 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:17:42.3490805Z Autotune Choices Stats: 2025-09-07T13:17:42.3491925Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_2482", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.007519999984651804, "best_triton_pos": 0} 2025-09-07T13:17:42.3595084Z AUTOTUNE addmm(392x1024, 392x144, 144x1024) 2025-09-07T13:17:42.3595396Z strides: [0, 1], [144, 1], [1, 144] 2025-09-07T13:17:42.3595705Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:17:42.3596397Z triton_mm_2482 
0.0075 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:17:42.3597386Z triton_mm_2483 0.0075 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:17:42.3598021Z bias_addmm 0.0079 ms 95.1% 2025-09-07T13:17:42.3598623Z triton_mm_2486 0.0080 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:42.3599598Z triton_mm_2489 0.0080 ms 93.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:17:42.3600576Z triton_mm_2478 0.0081 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:17:42.3601535Z triton_mm_2484 0.0081 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:42.3602497Z triton_mm_2485 0.0081 ms 92.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:17:42.3603487Z triton_mm_2488 0.0082 ms 92.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:42.3604401Z triton_mm_2477 0.0084 ms 90.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:17:42.3605312Z SingleProcess AUTOTUNE benchmarking takes 0.2816 seconds and 
0.0002 seconds precompiling for 21 choices 2025-09-07T13:17:43.3434468Z Autotune Choices Stats: 2025-09-07T13:17:43.3435880Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_2537", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8", "best_time": 0.007135999854654074, "best_triton_pos": 0} 2025-09-07T13:17:43.3539645Z AUTOTUNE addmm(1568x128, 1568x72, 72x128) 2025-09-07T13:17:43.3539957Z strides: [0, 1], [72, 1], [1, 72] 2025-09-07T13:17:43.3540297Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:17:43.3541378Z triton_mm_2537 0.0071 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:17:43.3542617Z triton_mm_2535 0.0074 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:17:43.3543614Z triton_mm_2541 0.0074 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:17:43.3544446Z triton_mm_2536 0.0075 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:17:43.3545477Z triton_mm_2542 0.0075 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:17:43.3546318Z triton_mm_2540 0.0076 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:17:43.3547160Z triton_mm_2543 0.0077 ms 92.9% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:43.3548009Z triton_mm_2546 0.0077 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:17:43.3548861Z triton_mm_2545 0.0078 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:43.3549711Z triton_mm_2544 0.0079 ms 90.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:17:43.3550448Z SingleProcess AUTOTUNE benchmarking takes 0.2887 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T13:17:43.9053819Z Autotune Choices Stats: 2025-09-07T13:17:43.9054796Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_2433", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8", "best_time": 0.007071999832987785, "best_triton_pos": 0} 2025-09-07T13:17:43.9169248Z AUTOTUNE addmm(392x256, 392x144, 144x256) 2025-09-07T13:17:43.9169566Z strides: [0, 1], [144, 1], [1, 144] 2025-09-07T13:17:43.9169925Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:17:43.9170676Z triton_mm_2433 0.0071 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:17:43.9171714Z triton_mm_2432 0.0071 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:17:43.9172711Z triton_mm_2434 0.0072 ms 98.7% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:17:43.9173832Z triton_mm_2431 0.0072 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:17:43.9174801Z triton_mm_2437 0.0074 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:17:43.9176263Z triton_mm_2438 0.0074 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:17:43.9177114Z bias_addmm 0.0077 ms 91.7% 2025-09-07T13:17:43.9177809Z triton_mm_2441 0.0077 ms 91.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:43.9178796Z triton_mm_2440 0.0080 ms 88.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:17:43.9179763Z triton_mm_2444 0.0080 ms 88.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:17:43.9180608Z SingleProcess AUTOTUNE benchmarking takes 0.2933 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T13:17:44.7262822Z Autotune Choices Stats: 2025-09-07T13:17:44.7264060Z {"num_choices": 7, "num_triton_choices": 6, "best_kernel": "triton_convolution2d_4", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8", "best_time": 
0.028095999732613564, "best_triton_pos": 0} 2025-09-07T13:17:44.7371253Z AUTOTUNE convolution(8x3x224x224, 64x3x3x3) 2025-09-07T13:17:44.7371568Z strides: [150528, 1, 672, 3], [27, 1, 9, 3] 2025-09-07T13:17:44.7371892Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:17:44.7372677Z triton_convolution2d_4 0.0281 ms 100.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:44.7373904Z triton_convolution2d_0 0.0315 ms 89.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:44.7375293Z triton_convolution2d_3 0.0320 ms 87.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:44.7376532Z triton_convolution2d_5 0.0377 ms 74.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:44.7377779Z triton_convolution2d_2 0.0404 ms 69.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:17:44.7378561Z convolution 0.0426 ms 65.9% 2025-09-07T13:17:44.7379309Z triton_convolution2d_1 0.0684 ms 41.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:44.7380292Z SingleProcess AUTOTUNE benchmarking takes 0.1081 seconds and 0.0002 seconds precompiling for 7 choices 2025-09-07T13:17:44.8424864Z Autotune Choices Stats: 
2025-09-07T13:17:44.8427596Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.01727999933063984, "best_triton_pos": 1, "best_triton_time": 0.021215999498963356, "best_triton_kernel": "triton_convolution2d_11", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8"} 2025-09-07T13:17:44.8527659Z AUTOTUNE convolution(8x64x112x112, 64x64x3x3) 2025-09-07T13:17:44.8527991Z strides: [802816, 1, 7168, 64], [576, 1, 192, 64] 2025-09-07T13:17:44.8528721Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:17:44.8529003Z convolution 0.0173 ms 100.0% 2025-09-07T13:17:44.8529989Z triton_convolution2d_11 0.0212 ms 81.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:44.8531357Z triton_convolution2d_10 0.0214 ms 80.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:44.8532570Z triton_convolution2d_9 0.0221 ms 78.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:44.8533832Z triton_convolution2d_12 0.0259 ms 66.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:44.8535116Z triton_convolution2d_6 0.0288 ms 60.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 
2025-09-07T13:17:44.8536245Z triton_convolution2d_7 0.0332 ms 52.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:44.8537373Z triton_convolution2d_8 0.0573 ms 30.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:17:44.8538263Z SingleProcess AUTOTUNE benchmarking takes 0.1152 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:17:44.9577686Z Autotune Choices Stats: 2025-09-07T13:17:44.9579112Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.016063999384641647, "best_triton_pos": 1, "best_triton_time": 0.018783999606966972, "best_triton_kernel": "triton_convolution2d_36", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8"} 2025-09-07T13:17:44.9679518Z AUTOTUNE convolution(8x64x56x56, 64x64x3x3) 2025-09-07T13:17:44.9679834Z strides: [200704, 1, 3584, 64], [576, 1, 192, 64] 2025-09-07T13:17:44.9680135Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:17:44.9680412Z convolution 0.0161 ms 100.0% 2025-09-07T13:17:44.9681140Z triton_convolution2d_36 0.0188 ms 85.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:44.9682372Z triton_convolution2d_35 0.0190 ms 84.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:44.9683600Z triton_convolution2d_34 0.0200 ms 80.4% ALLOW_TF32=False, 
BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:44.9684819Z triton_convolution2d_31 0.0241 ms 66.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:44.9686252Z triton_convolution2d_37 0.0253 ms 63.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:44.9687385Z triton_convolution2d_32 0.0317 ms 50.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:44.9688871Z triton_convolution2d_33 0.0509 ms 31.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:17:44.9689848Z SingleProcess AUTOTUNE benchmarking takes 0.1141 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:17:45.1275275Z Autotune Choices Stats: 2025-09-07T13:17:45.1276647Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.04291199892759323, "best_triton_pos": 1, "best_triton_time": 0.053279999643564224, "best_triton_kernel": "triton_convolution2d_212", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T13:17:45.1380816Z AUTOTUNE convolution(8x256x56x56, 18x256x3x3) 2025-09-07T13:17:45.1381181Z strides: [802816, 1, 14336, 256], [2304, 1, 768, 256] 2025-09-07T13:17:45.1381602Z dtypes: 
torch.bfloat16, torch.bfloat16 2025-09-07T13:17:45.1381916Z convolution 0.0429 ms 100.0% 2025-09-07T13:17:45.1382666Z triton_convolution2d_212 0.0533 ms 80.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:45.1383899Z triton_convolution2d_213 0.0541 ms 79.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:45.1385155Z triton_convolution2d_211 0.0558 ms 76.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:45.1386308Z triton_convolution2d_214 0.0669 ms 64.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:45.1387452Z triton_convolution2d_208 0.0695 ms 61.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:45.1388588Z triton_convolution2d_209 0.0898 ms 47.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:45.1389735Z triton_convolution2d_210 0.2112 ms 20.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:17:45.1390797Z SingleProcess AUTOTUNE benchmarking takes 0.1544 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:17:45.2452761Z Autotune 
Choices Stats: 2025-09-07T13:17:45.2453959Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_220", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8", "best_time": 0.012223999947309494, "best_triton_pos": 0} 2025-09-07T13:17:45.2559034Z AUTOTUNE convolution(8x18x56x56, 18x18x3x3) 2025-09-07T13:17:45.2559356Z strides: [56448, 1, 1008, 18], [162, 1, 54, 18] 2025-09-07T13:17:45.2559650Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:17:45.2560426Z triton_convolution2d_220 0.0122 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:45.2562068Z triton_convolution2d_219 0.0139 ms 87.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:45.2563371Z triton_convolution2d_218 0.0154 ms 79.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:45.2564603Z triton_convolution2d_215 0.0176 ms 69.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:45.2565883Z triton_convolution2d_221 0.0196 ms 62.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:45.2567023Z triton_convolution2d_216 0.0227 ms 54.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, 
KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:45.2567722Z convolution 0.0341 ms 35.8% 2025-09-07T13:17:45.2568399Z triton_convolution2d_217 0.0468 ms 26.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:17:45.2569303Z SingleProcess AUTOTUNE benchmarking takes 0.1169 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:17:45.4256786Z Autotune Choices Stats: 2025-09-07T13:17:45.4258175Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.026335999369621277, "best_triton_pos": 1, "best_triton_time": 0.05379199981689453, "best_triton_kernel": "triton_convolution2d_275", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T13:17:45.4360766Z AUTOTUNE convolution(8x256x56x56, 36x256x3x3) 2025-09-07T13:17:45.4361115Z strides: [802816, 1, 14336, 256], [2304, 1, 768, 256] 2025-09-07T13:17:45.4361435Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:17:45.4361715Z convolution 0.0263 ms 100.0% 2025-09-07T13:17:45.4362451Z triton_convolution2d_275 0.0538 ms 49.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:45.4363772Z triton_convolution2d_276 0.0555 ms 47.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:45.4365334Z triton_convolution2d_274 0.0601 ms 43.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, 
PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:45.4366615Z triton_convolution2d_277 0.0667 ms 39.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:45.4367894Z triton_convolution2d_271 0.0770 ms 34.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:45.4369170Z triton_convolution2d_272 0.1031 ms 25.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:45.4370604Z triton_convolution2d_273 0.2012 ms 13.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:17:45.4371775Z SingleProcess AUTOTUNE benchmarking takes 0.1568 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:17:45.5432845Z Autotune Choices Stats: 2025-09-07T13:17:45.5434152Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_283", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8", "best_time": 0.017472000792622566, "best_triton_pos": 0} 2025-09-07T13:17:45.5538815Z AUTOTUNE convolution(8x36x28x28, 36x36x3x3) 2025-09-07T13:17:45.5539169Z strides: [28224, 1, 1008, 36], [324, 1, 108, 36] 2025-09-07T13:17:45.5539496Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:17:45.5540286Z triton_convolution2d_283 0.0175 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, 
KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:45.5541659Z triton_convolution2d_282 0.0180 ms 97.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:45.5542900Z triton_convolution2d_281 0.0196 ms 89.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:45.5544150Z triton_convolution2d_278 0.0206 ms 84.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:45.5545431Z triton_convolution2d_284 0.0239 ms 73.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:45.5546587Z triton_convolution2d_279 0.0266 ms 65.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:45.5547287Z convolution 0.0285 ms 61.2% 2025-09-07T13:17:45.5547965Z triton_convolution2d_280 0.0471 ms 37.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:17:45.5548865Z SingleProcess AUTOTUNE benchmarking takes 0.1169 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:17:45.8205330Z Autotune Choices Stats: 2025-09-07T13:17:45.8206343Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_335", "best_kernel_desc": "ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.006752000190317631, "best_triton_pos": 0} 2025-09-07T13:17:45.8314188Z AUTOTUNE addmm(6272x18, 6272x36, 36x18) 2025-09-07T13:17:45.8314474Z strides: [0, 1], [36, 1], [1, 36] 2025-09-07T13:17:45.8314780Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:17:45.8315624Z triton_mm_335 0.0068 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:17:45.8316608Z triton_mm_336 0.0070 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:17:45.8317572Z triton_mm_340 0.0070 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:17:45.8318855Z triton_mm_341 0.0071 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:17:45.8319883Z triton_mm_342 0.0071 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:45.8320841Z triton_mm_334 0.0071 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T13:17:45.8321802Z triton_mm_339 0.0071 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:17:45.8322761Z triton_mm_343 0.0071 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, 
BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:17:45.8323731Z triton_mm_337 0.0071 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:17:45.8324697Z triton_mm_348 0.0073 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T13:17:45.8325569Z SingleProcess AUTOTUNE benchmarking takes 0.2539 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T13:17:45.9635895Z Autotune Choices Stats: 2025-09-07T13:17:45.9637000Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_412", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8", "best_time": 0.011296000331640244, "best_triton_pos": 0} 2025-09-07T13:17:45.9742791Z AUTOTUNE convolution(8x18x56x56, 36x18x3x3) 2025-09-07T13:17:45.9743132Z strides: [56448, 1, 1008, 18], [162, 1, 54, 18] 2025-09-07T13:17:45.9743465Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:17:45.9744272Z triton_convolution2d_412 0.0113 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:45.9745829Z triton_convolution2d_411 0.0131 ms 86.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:45.9747094Z triton_convolution2d_410 0.0141 ms 80.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, 
UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:45.9748343Z triton_convolution2d_407 0.0161 ms 70.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:45.9749598Z triton_convolution2d_413 0.0186 ms 60.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:45.9750834Z triton_convolution2d_408 0.0233 ms 48.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:45.9751593Z convolution 0.0276 ms 40.9% 2025-09-07T13:17:45.9752343Z triton_convolution2d_409 0.0395 ms 28.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:17:45.9753563Z SingleProcess AUTOTUNE benchmarking takes 0.1159 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:17:46.1096308Z Autotune Choices Stats: 2025-09-07T13:17:46.1097672Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_491", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4", "best_time": 0.016896000131964684, "best_triton_pos": 0} 2025-09-07T13:17:46.1201863Z AUTOTUNE convolution(8x36x28x28, 72x36x3x3) 2025-09-07T13:17:46.1202175Z strides: [28224, 1, 1008, 36], [324, 1, 108, 36] 2025-09-07T13:17:46.1202469Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:17:46.1203237Z triton_convolution2d_491 0.0169 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, 
GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:46.1204731Z triton_convolution2d_492 0.0180 ms 94.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:46.1206312Z triton_convolution2d_490 0.0204 ms 82.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:46.1207565Z triton_convolution2d_487 0.0218 ms 77.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:46.1208806Z triton_convolution2d_493 0.0231 ms 73.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:46.1209583Z convolution 0.0260 ms 64.9% 2025-09-07T13:17:46.1210349Z triton_convolution2d_488 0.0271 ms 62.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:46.1211597Z triton_convolution2d_489 0.0418 ms 40.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:17:46.1212588Z SingleProcess AUTOTUNE benchmarking takes 0.1156 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:17:46.2250128Z Autotune Choices Stats: 2025-09-07T13:17:46.2251392Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.016383999958634377, 
"best_triton_pos": 1, "best_triton_time": 0.023072000592947006, "best_triton_kernel": "triton_convolution2d_498", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T13:17:46.2355928Z AUTOTUNE convolution(8x72x14x14, 72x72x3x3) 2025-09-07T13:17:46.2356244Z strides: [14112, 1, 1008, 72], [648, 1, 216, 72] 2025-09-07T13:17:46.2356556Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:17:46.2356828Z convolution 0.0164 ms 100.0% 2025-09-07T13:17:46.2357561Z triton_convolution2d_498 0.0231 ms 71.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:46.2358812Z triton_convolution2d_499 0.0255 ms 64.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:46.2360183Z triton_convolution2d_497 0.0277 ms 59.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:46.2361540Z triton_convolution2d_500 0.0298 ms 55.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:46.2362826Z triton_convolution2d_494 0.0339 ms 48.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:46.2364051Z triton_convolution2d_495 0.0358 ms 45.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, 
KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:46.2365526Z triton_convolution2d_496 0.0536 ms 30.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:17:46.2366431Z SingleProcess AUTOTUNE benchmarking takes 0.1149 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:17:46.5069433Z Autotune Choices Stats: 2025-09-07T13:17:46.5070433Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_558", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.006719999946653843, "best_triton_pos": 0} 2025-09-07T13:17:46.5182322Z AUTOTUNE addmm(1568x18, 1568x72, 72x18) 2025-09-07T13:17:46.5182786Z strides: [0, 1], [72, 1], [1, 72] 2025-09-07T13:17:46.5183281Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:17:46.5184466Z triton_mm_558 0.0067 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:46.5185691Z triton_mm_553 0.0068 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:17:46.5186665Z triton_mm_552 0.0068 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:17:46.5187617Z triton_mm_560 0.0068 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:46.5188571Z triton_mm_559 0.0069 ms 97.7% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:17:46.5189520Z triton_mm_557 0.0069 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:17:46.5190482Z triton_mm_551 0.0071 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:17:46.5191447Z triton_mm_563 0.0072 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:17:46.5192413Z triton_mm_565 0.0072 ms 92.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:46.5193376Z triton_mm_566 0.0072 ms 92.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:17:46.5194442Z SingleProcess AUTOTUNE benchmarking takes 0.2549 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T13:17:46.8037607Z Autotune Choices Stats: 2025-09-07T13:17:46.8038771Z {"num_choices": 20, "num_triton_choices": 18, "best_kernel": "triton_mm_631", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.0066559999249875546, "best_triton_pos": 0} 2025-09-07T13:17:46.8153971Z AUTOTUNE addmm(1568x36, 1568x72, 72x36) 2025-09-07T13:17:46.8154540Z strides: [0, 1], [72, 1], [1, 72] 2025-09-07T13:17:46.8155091Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:17:46.8155782Z triton_mm_631 0.0067 ms 100.0% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:17:46.8156779Z triton_mm_633 0.0068 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:17:46.8157745Z triton_mm_639 0.0070 ms 94.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:46.8158695Z triton_mm_640 0.0071 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:17:46.8159643Z triton_mm_632 0.0071 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:17:46.8160597Z triton_mm_637 0.0071 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:17:46.8161555Z triton_mm_641 0.0072 ms 92.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:46.8162538Z triton_mm_634 0.0073 ms 90.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:17:46.8163515Z triton_mm_636 0.0075 ms 88.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:17:46.8164487Z triton_mm_642 0.0075 ms 88.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:17:46.8165406Z SingleProcess AUTOTUNE benchmarking takes 0.2665 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:17:46.9533871Z Autotune Choices Stats: 2025-09-07T13:17:46.9535338Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_726", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8", "best_time": 0.010784000158309937, "best_triton_pos": 0} 2025-09-07T13:17:46.9651077Z AUTOTUNE convolution(8x18x56x56, 18x18x3x3) 2025-09-07T13:17:46.9651425Z strides: [56448, 1, 1008, 18], [162, 1, 54, 18] 2025-09-07T13:17:46.9651750Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:17:46.9652564Z triton_convolution2d_726 0.0108 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:46.9653862Z triton_convolution2d_725 0.0128 ms 84.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:46.9655667Z triton_convolution2d_724 0.0135 ms 80.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:46.9657041Z triton_convolution2d_721 0.0150 ms 71.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:46.9658344Z triton_convolution2d_727 0.0169 ms 63.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, 
PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:46.9659616Z triton_convolution2d_722 0.0218 ms 49.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:46.9660400Z convolution 0.0272 ms 39.7% 2025-09-07T13:17:46.9661168Z triton_convolution2d_723 0.0451 ms 23.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:17:46.9662276Z SingleProcess AUTOTUNE benchmarking takes 0.1186 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:17:47.0709689Z Autotune Choices Stats: 2025-09-07T13:17:47.0710827Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_732", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4", "best_time": 0.01283199992030859, "best_triton_pos": 0} 2025-09-07T13:17:47.0835121Z AUTOTUNE convolution(8x18x28x28, 72x18x3x3) 2025-09-07T13:17:47.0835488Z strides: [14112, 1, 504, 18], [162, 1, 54, 18] 2025-09-07T13:17:47.0835977Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:17:47.0836782Z triton_convolution2d_732 0.0128 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:47.0838064Z triton_convolution2d_733 0.0131 ms 97.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:47.0839328Z triton_convolution2d_731 0.0150 ms 85.3% ALLOW_TF32=False, 
BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:47.0840573Z triton_convolution2d_728 0.0174 ms 73.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:47.0841825Z triton_convolution2d_734 0.0179 ms 71.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:47.0842576Z convolution 0.0235 ms 54.6% 2025-09-07T13:17:47.0843303Z triton_convolution2d_729 0.0262 ms 49.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:47.0844541Z triton_convolution2d_730 0.0286 ms 44.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:17:47.0845622Z SingleProcess AUTOTUNE benchmarking takes 0.1179 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:17:47.5562812Z Autotune Choices Stats: 2025-09-07T13:17:47.5564599Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.018880000337958336, "best_triton_pos": 1, "best_triton_time": 0.023264000192284584, "best_triton_kernel": "triton_convolution2d_1563", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T13:17:47.5672582Z AUTOTUNE convolution(8x72x14x14, 144x72x3x3) 2025-09-07T13:17:47.5672907Z strides: [14112, 1, 1008, 72], 
[648, 1, 216, 72] 2025-09-07T13:17:47.5673225Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:17:47.5673540Z convolution 0.0189 ms 100.0% 2025-09-07T13:17:47.5674450Z triton_convolution2d_1563 0.0233 ms 81.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:47.5675855Z triton_convolution2d_1564 0.0290 ms 65.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:47.5677108Z triton_convolution2d_1562 0.0293 ms 64.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:47.5678350Z triton_convolution2d_1565 0.0299 ms 63.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:47.5679577Z triton_convolution2d_1559 0.0393 ms 48.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:47.5680801Z triton_convolution2d_1560 0.0396 ms 47.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:47.5682042Z triton_convolution2d_1561 0.0522 ms 36.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:17:47.5683031Z SingleProcess AUTOTUNE benchmarking takes 0.1180 seconds and 0.0002 seconds 
precompiling for 8 choices 2025-09-07T13:17:47.6872586Z Autotune Choices Stats: 2025-09-07T13:17:47.6874002Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.013279999606311321, "best_triton_pos": 1, "best_triton_time": 0.03526400029659271, "best_triton_kernel": "triton_convolution2d_1570", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T13:17:47.6979993Z AUTOTUNE convolution(8x144x7x7, 144x144x3x3) 2025-09-07T13:17:47.6980324Z strides: [7056, 1, 1008, 144], [1296, 1, 432, 144] 2025-09-07T13:17:47.6980637Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:17:47.6980925Z convolution 0.0133 ms 100.0% 2025-09-07T13:17:47.6981757Z triton_convolution2d_1570 0.0353 ms 37.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:47.6983012Z triton_convolution2d_1571 0.0450 ms 29.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:47.6984323Z triton_convolution2d_1569 0.0462 ms 28.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:47.6985898Z triton_convolution2d_1572 0.0479 ms 27.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:47.6987104Z triton_convolution2d_1567 0.0655 ms 20.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, 
PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:47.6988242Z triton_convolution2d_1566 0.0681 ms 19.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:47.6989399Z triton_convolution2d_1568 0.0710 ms 18.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=512, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:17:47.6990311Z SingleProcess AUTOTUNE benchmarking takes 0.1296 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:17:47.9698648Z Autotune Choices Stats: 2025-09-07T13:17:47.9699636Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_1624", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.006976000033318996, "best_triton_pos": 0} 2025-09-07T13:17:47.9810753Z AUTOTUNE addmm(392x18, 392x144, 144x18) 2025-09-07T13:17:47.9811039Z strides: [0, 1], [144, 1], [1, 144] 2025-09-07T13:17:47.9811392Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:17:47.9812108Z triton_mm_1624 0.0070 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:17:47.9813143Z triton_mm_1625 0.0071 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:17:47.9814173Z triton_mm_1632 0.0071 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:47.9815550Z triton_mm_1633 0.0071 
ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:17:47.9816540Z triton_mm_1626 0.0072 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:17:47.9817500Z triton_mm_1629 0.0072 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:17:47.9818471Z triton_mm_1623 0.0073 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:17:47.9819434Z triton_mm_1631 0.0074 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:17:47.9820391Z triton_mm_1630 0.0075 ms 93.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:47.9821414Z triton_mm_1635 0.0075 ms 93.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:17:47.9822258Z SingleProcess AUTOTUNE benchmarking takes 0.2541 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T13:17:48.2696538Z Autotune Choices Stats: 2025-09-07T13:17:48.2697805Z {"num_choices": 20, "num_triton_choices": 18, "best_kernel": "triton_mm_1722", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8", "best_time": 0.007104000076651573, "best_triton_pos": 0} 2025-09-07T13:17:48.2808049Z AUTOTUNE addmm(392x36, 
392x144, 144x36) 2025-09-07T13:17:48.2808342Z strides: [0, 1], [144, 1], [1, 144] 2025-09-07T13:17:48.2808673Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:17:48.2809403Z triton_mm_1722 0.0071 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:17:48.2810409Z triton_mm_1723 0.0073 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:17:48.2811403Z triton_mm_1727 0.0073 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:17:48.2812374Z triton_mm_1730 0.0073 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:17:48.2813343Z triton_mm_1731 0.0073 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:48.2817012Z triton_mm_1721 0.0073 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:17:48.2818004Z triton_mm_1724 0.0074 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:17:48.2818928Z triton_mm_1728 0.0077 ms 92.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:17:48.2819833Z triton_mm_1732 0.0077 ms 92.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, 
EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:17:48.2820734Z triton_mm_1729 0.0077 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:48.2821595Z SingleProcess AUTOTUNE benchmarking takes 0.2654 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:17:48.5944829Z Autotune Choices Stats: 2025-09-07T13:17:48.5946014Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_1835", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8", "best_time": 0.0072639998979866505, "best_triton_pos": 0} 2025-09-07T13:17:48.6062117Z AUTOTUNE addmm(392x72, 392x144, 144x72) 2025-09-07T13:17:48.6062440Z strides: [0, 1], [144, 1], [1, 144] 2025-09-07T13:17:48.6062782Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:17:48.6063512Z triton_mm_1835 0.0073 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:17:48.6064506Z triton_mm_1836 0.0073 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:17:48.6065681Z triton_mm_1833 0.0074 ms 98.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:17:48.6066958Z triton_mm_1834 0.0074 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:17:48.6067710Z bias_addmm 0.0076 ms 96.2% 2025-09-07T13:17:48.6068341Z triton_mm_1839 0.0076 ms 95.0% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:17:48.6069333Z triton_mm_1840 0.0077 ms 93.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:17:48.6070318Z triton_mm_1842 0.0080 ms 90.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:17:48.6071300Z triton_mm_1841 0.0082 ms 89.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:48.6072272Z triton_mm_1846 0.0082 ms 88.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:17:48.6073125Z SingleProcess AUTOTUNE benchmarking takes 0.2845 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T13:17:48.7539331Z Autotune Choices Stats: 2025-09-07T13:17:48.7540922Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_1936", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8", "best_time": 0.01017600018531084, "best_triton_pos": 0} 2025-09-07T13:17:48.7655603Z AUTOTUNE convolution(8x18x28x28, 18x18x3x3) 2025-09-07T13:17:48.7655965Z strides: [14112, 1, 504, 18], [162, 1, 54, 18] 2025-09-07T13:17:48.7656279Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:17:48.7657076Z triton_convolution2d_1936 0.0102 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, 
num_stages=2, num_warps=8 2025-09-07T13:17:48.7658349Z triton_convolution2d_1935 0.0125 ms 81.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:48.7659616Z triton_convolution2d_1934 0.0132 ms 77.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:48.7660862Z triton_convolution2d_1931 0.0149 ms 68.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:48.7662208Z triton_convolution2d_1937 0.0169 ms 60.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:48.7663445Z triton_convolution2d_1932 0.0219 ms 46.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:48.7664210Z convolution 0.0259 ms 39.3% 2025-09-07T13:17:48.7665383Z triton_convolution2d_1933 0.0454 ms 22.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:17:48.7666288Z SingleProcess AUTOTUNE benchmarking takes 0.1190 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:17:48.8714080Z Autotune Choices Stats: 2025-09-07T13:17:48.8715829Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_1942", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, 
PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4", "best_time": 0.012640000320971012, "best_triton_pos": 0} 2025-09-07T13:17:48.8829388Z AUTOTUNE convolution(8x18x14x14, 144x18x3x3) 2025-09-07T13:17:48.8829714Z strides: [3528, 1, 252, 18], [162, 1, 54, 18] 2025-09-07T13:17:48.8830030Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:17:48.8830816Z triton_convolution2d_1942 0.0126 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:48.8832072Z triton_convolution2d_1943 0.0147 ms 86.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:48.8833324Z triton_convolution2d_1941 0.0154 ms 82.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:48.8834712Z triton_convolution2d_1944 0.0178 ms 70.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:48.8836315Z triton_convolution2d_1938 0.0184 ms 68.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:48.8837083Z convolution 0.0248 ms 51.0% 2025-09-07T13:17:48.8837811Z triton_convolution2d_1939 0.0253 ms 49.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:48.8839055Z triton_convolution2d_1940 0.0279 ms 45.2% 
ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:17:48.8840034Z SingleProcess AUTOTUNE benchmarking takes 0.1169 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:17:48.9893510Z Autotune Choices Stats: 2025-09-07T13:17:48.9894645Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_1950", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8", "best_time": 0.015296000055968761, "best_triton_pos": 0} 2025-09-07T13:17:49.0003253Z AUTOTUNE convolution(8x36x28x28, 36x36x3x3) 2025-09-07T13:17:49.0003562Z strides: [28224, 1, 1008, 36], [324, 1, 108, 36] 2025-09-07T13:17:49.0003863Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:17:49.0004714Z triton_convolution2d_1950 0.0153 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:49.0006374Z triton_convolution2d_1949 0.0176 ms 86.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:49.0007647Z triton_convolution2d_1948 0.0196 ms 77.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:49.0008887Z triton_convolution2d_1945 0.0208 ms 73.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:49.0010453Z 
triton_convolution2d_1951 0.0237 ms 64.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:49.0011321Z convolution 0.0266 ms 57.6% 2025-09-07T13:17:49.0012065Z triton_convolution2d_1946 0.0274 ms 55.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:49.0013317Z triton_convolution2d_1947 0.0424 ms 36.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:17:49.0014316Z SingleProcess AUTOTUNE benchmarking takes 0.1169 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:17:49.1050235Z Autotune Choices Stats: 2025-09-07T13:17:49.1051245Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_1956", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4", "best_time": 0.017311999574303627, "best_triton_pos": 0} 2025-09-07T13:17:49.1160954Z AUTOTUNE convolution(8x36x14x14, 144x36x3x3) 2025-09-07T13:17:49.1161267Z strides: [7056, 1, 504, 36], [324, 1, 108, 36] 2025-09-07T13:17:49.1161577Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:17:49.1162467Z triton_convolution2d_1956 0.0173 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:49.1163717Z triton_convolution2d_1957 0.0216 ms 80.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, 
STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:49.1165276Z triton_convolution2d_1955 0.0220 ms 78.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:49.1166450Z triton_convolution2d_1958 0.0232 ms 74.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:49.1167583Z triton_convolution2d_1952 0.0249 ms 69.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:49.1168277Z convolution 0.0268 ms 64.6% 2025-09-07T13:17:49.1168950Z triton_convolution2d_1953 0.0271 ms 63.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:49.1170111Z triton_convolution2d_1954 0.0378 ms 45.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:17:49.1171015Z SingleProcess AUTOTUNE benchmarking takes 0.1153 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:17:49.4963689Z Autotune Choices Stats: 2025-09-07T13:17:49.4964886Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.013663999736309052, "best_triton_pos": 1, "best_triton_time": 0.053727999329566956, "best_triton_kernel": "triton_convolution2d_2453", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, 
num_stages=2, num_warps=4"} 2025-09-07T13:17:49.5076455Z AUTOTUNE convolution(8x256x7x7, 256x256x3x3) 2025-09-07T13:17:49.5077130Z strides: [12544, 1, 1792, 256], [2304, 1, 768, 256] 2025-09-07T13:17:49.5077552Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:17:49.5077822Z convolution 0.0137 ms 100.0% 2025-09-07T13:17:49.5078572Z triton_convolution2d_2453 0.0537 ms 25.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:49.5079850Z triton_convolution2d_2452 0.0706 ms 19.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:49.5081103Z triton_convolution2d_2455 0.0720 ms 19.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:49.5082344Z triton_convolution2d_2454 0.0757 ms 18.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:49.5083572Z triton_convolution2d_2450 0.0941 ms 14.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:49.5084902Z triton_convolution2d_2449 0.1057 ms 12.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:49.5086220Z triton_convolution2d_2451 0.1261 ms 10.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=512, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, 
STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:17:49.5087129Z SingleProcess AUTOTUNE benchmarking takes 0.1530 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:17:49.7763396Z Autotune Choices Stats: 2025-09-07T13:17:49.7764374Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_2464", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007872000336647034, "best_triton_pos": 0} 2025-09-07T13:17:49.7875717Z AUTOTUNE addmm(392x1024, 392x256, 256x1024) 2025-09-07T13:17:49.7876035Z strides: [0, 1], [256, 1], [1, 256] 2025-09-07T13:17:49.7876369Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:17:49.7877094Z triton_mm_2464 0.0079 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:17:49.7878113Z triton_mm_2463 0.0082 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:17:49.7878742Z bias_addmm 0.0082 ms 95.7% 2025-09-07T13:17:49.7879362Z triton_mm_2468 0.0085 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:17:49.7880339Z triton_mm_2467 0.0086 ms 91.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:49.7881317Z triton_mm_2459 0.0088 ms 89.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:17:49.7882544Z triton_mm_2457 0.0089 ms 88.5% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:17:49.7883618Z triton_mm_2466 0.0090 ms 87.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:17:49.7884690Z triton_mm_2458 0.0092 ms 85.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:17:49.7886004Z triton_mm_2470 0.0092 ms 85.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:17:49.7886746Z SingleProcess AUTOTUNE benchmarking takes 0.2786 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T13:17:49.9156193Z Autotune Choices Stats: 2025-09-07T13:17:49.9157477Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.013439999893307686, "best_triton_pos": 1, "best_triton_time": 0.028960000723600388, "best_triton_kernel": "triton_convolution2d_2557", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T13:17:49.9265779Z AUTOTUNE convolution(8x128x14x14, 128x128x3x3) 2025-09-07T13:17:49.9266117Z strides: [25088, 1, 1792, 128], [1152, 1, 384, 128] 2025-09-07T13:17:49.9266411Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:17:49.9266681Z convolution 0.0134 ms 100.0% 2025-09-07T13:17:49.9267507Z triton_convolution2d_2557 0.0290 ms 46.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:49.9268689Z triton_convolution2d_2558 0.0318 ms 
42.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:49.9269869Z triton_convolution2d_2556 0.0368 ms 36.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:49.9271029Z triton_convolution2d_2559 0.0385 ms 34.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:49.9272176Z triton_convolution2d_2553 0.0452 ms 29.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:49.9273315Z triton_convolution2d_2554 0.0473 ms 28.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:49.9274461Z triton_convolution2d_2555 0.0999 ms 13.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:17:49.9275643Z SingleProcess AUTOTUNE benchmarking takes 0.1246 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:17:50.1946712Z Autotune Choices Stats: 2025-09-07T13:17:50.1948158Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_2571", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.007615999784320593, "best_triton_pos": 0} 2025-09-07T13:17:50.2059620Z AUTOTUNE 
addmm(1568x512, 1568x128, 128x512) 2025-09-07T13:17:50.2060694Z strides: [0, 1], [128, 1], [1, 128] 2025-09-07T13:17:50.2061163Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:17:50.2062382Z triton_mm_2571 0.0076 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:50.2063950Z triton_mm_2569 0.0078 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:50.2065948Z triton_mm_2572 0.0079 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:17:50.2067371Z triton_mm_2567 0.0080 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:17:50.2068747Z triton_mm_2568 0.0081 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:17:50.2070154Z triton_mm_2570 0.0082 ms 93.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:17:50.2071564Z triton_mm_2573 0.0083 ms 91.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:50.2072947Z triton_mm_2574 0.0083 ms 91.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:17:50.2074002Z bias_addmm 0.0085 ms 89.5% 2025-09-07T13:17:50.2074883Z triton_mm_2561 0.0087 ms 87.8% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:17:50.2076274Z SingleProcess AUTOTUNE benchmarking takes 0.2779 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T13:17:50.3206276Z Autotune Choices Stats: 2025-09-07T13:17:50.3208300Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.010239999741315842, "best_triton_pos": 1, "best_triton_time": 0.017376000061631203, "best_triton_kernel": "triton_convolution2d_2664", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8"} 2025-09-07T13:17:50.3315945Z AUTOTUNE convolution(8x64x28x28, 64x64x3x3) 2025-09-07T13:17:50.3316436Z strides: [50176, 1, 1792, 64], [576, 1, 192, 64] 2025-09-07T13:17:50.3316891Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:17:50.3317298Z convolution 0.0102 ms 100.0% 2025-09-07T13:17:50.3318366Z triton_convolution2d_2664 0.0174 ms 58.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:50.3320114Z triton_convolution2d_2663 0.0176 ms 58.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:50.3321855Z triton_convolution2d_2662 0.0177 ms 58.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:50.3323606Z triton_convolution2d_2659 0.0221 ms 46.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, 
PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:50.3326521Z triton_convolution2d_2665 0.0240 ms 42.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:50.3328471Z triton_convolution2d_2660 0.0280 ms 36.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:50.3330377Z triton_convolution2d_2661 0.0522 ms 19.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:17:50.3331776Z SingleProcess AUTOTUNE benchmarking takes 0.1146 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:17:50.6242867Z Autotune Choices Stats: 2025-09-07T13:17:50.6243794Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_2674", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007872000336647034, "best_triton_pos": 0} 2025-09-07T13:17:50.6356428Z AUTOTUNE addmm(6272x256, 6272x64, 64x256) 2025-09-07T13:17:50.6356719Z strides: [0, 1], [64, 1], [1, 64] 2025-09-07T13:17:50.6357009Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:17:50.6357656Z triton_mm_2674 0.0079 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:17:50.6358627Z triton_mm_2677 0.0082 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 
2025-09-07T13:17:50.6359770Z triton_mm_2678 0.0082 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:17:50.6360689Z triton_mm_2684 0.0084 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:17:50.6361608Z triton_mm_2679 0.0085 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:50.6362503Z triton_mm_2675 0.0086 ms 91.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:50.6363400Z triton_mm_2683 0.0086 ms 91.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:50.6364292Z triton_mm_2676 0.0087 ms 90.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:17:50.6364860Z bias_addmm 0.0088 ms 89.5% 2025-09-07T13:17:50.6365797Z triton_mm_2680 0.0089 ms 88.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:17:50.6366647Z SingleProcess AUTOTUNE benchmarking takes 0.3026 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T13:17:50.7376232Z Autotune Choices Stats: 2025-09-07T13:17:50.7377416Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.011648000217974186, "best_triton_pos": 1, "best_triton_time": 0.011744000017642975, "best_triton_kernel": "triton_convolution2d_2774", "best_triton_kernel_desc": 
"ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T13:17:50.7492285Z AUTOTUNE convolution(8x32x56x56, 32x32x3x3) 2025-09-07T13:17:50.7492570Z strides: [100352, 1, 1792, 32], [288, 1, 96, 32] 2025-09-07T13:17:50.7492839Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:17:50.7493249Z convolution 0.0116 ms 100.0% 2025-09-07T13:17:50.7494013Z triton_convolution2d_2774 0.0117 ms 99.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:50.7495400Z triton_convolution2d_2775 0.0123 ms 94.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:50.7496481Z triton_convolution2d_2773 0.0124 ms 94.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:50.7497551Z triton_convolution2d_2776 0.0135 ms 86.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:50.7498608Z triton_convolution2d_2770 0.0139 ms 84.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:50.7499665Z triton_convolution2d_2771 0.0178 ms 65.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:50.7500827Z 
triton_convolution2d_2772 0.0245 ms 47.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:17:50.7501755Z SingleProcess AUTOTUNE benchmarking takes 0.1131 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:17:50.9981108Z Autotune Choices Stats: 2025-09-07T13:17:50.9982204Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_2785", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.00886400043964386, "best_triton_pos": 0} 2025-09-07T13:17:51.0101686Z AUTOTUNE addmm(25088x128, 25088x32, 32x128) 2025-09-07T13:17:51.0101979Z strides: [0, 1], [32, 1], [1, 32] 2025-09-07T13:17:51.0102317Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:17:51.0103056Z triton_mm_2785 0.0089 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:17:51.0104043Z triton_mm_2783 0.0089 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:17:51.0105431Z triton_mm_2789 0.0090 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:51.0106409Z triton_mm_2787 0.0091 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:17:51.0107377Z triton_mm_2790 0.0092 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, 
num_warps=8 2025-09-07T13:17:51.0108343Z triton_mm_2788 0.0092 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:17:51.0109326Z triton_mm_2786 0.0092 ms 95.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:51.0110652Z triton_mm_2791 0.0094 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T13:17:51.0111755Z triton_mm_2793 0.0095 ms 93.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:17:51.0112740Z triton_mm_2792 0.0097 ms 91.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:51.0113596Z SingleProcess AUTOTUNE benchmarking takes 0.2595 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T13:17:51.1278461Z Autotune Choices Stats: 2025-09-07T13:17:51.1279875Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.0180479995906353, "best_triton_pos": 1, "best_triton_time": 0.03187200054526329, "best_triton_kernel": "triton_convolution2d_2815", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T13:17:51.1396323Z AUTOTUNE convolution(8x128x56x56, 256x128x3x3) 2025-09-07T13:17:51.1396688Z strides: [401408, 1, 7168, 128], [1152, 1, 384, 128] 2025-09-07T13:17:51.1397017Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:17:51.1397308Z convolution 0.0180 ms 100.0% 
2025-09-07T13:17:51.1398404Z triton_convolution2d_2815 0.0319 ms 56.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:51.1399694Z triton_convolution2d_2817 0.0393 ms 45.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:51.1400957Z triton_convolution2d_2814 0.0396 ms 45.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:51.1402226Z triton_convolution2d_2816 0.0444 ms 40.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:51.1403492Z triton_convolution2d_2812 0.0523 ms 34.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:51.1404724Z triton_convolution2d_2811 0.0575 ms 31.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:51.1406345Z triton_convolution2d_2813 0.1017 ms 17.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:17:51.1407246Z SingleProcess AUTOTUNE benchmarking takes 0.1289 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:17:51.2891877Z Autotune Choices Stats: 2025-09-07T13:17:51.2893339Z {"num_choices": 8, 
"num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.018624000251293182, "best_triton_pos": 1, "best_triton_time": 0.055135998874902725, "best_triton_kernel": "triton_convolution2d_2822", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T13:17:51.3009948Z AUTOTUNE convolution(8x256x28x28, 512x256x3x3) 2025-09-07T13:17:51.3010290Z strides: [200704, 1, 7168, 256], [2304, 1, 768, 256] 2025-09-07T13:17:51.3010834Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:17:51.3011260Z convolution 0.0186 ms 100.0% 2025-09-07T13:17:51.3012015Z triton_convolution2d_2822 0.0551 ms 33.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:51.3013262Z triton_convolution2d_2821 0.0689 ms 27.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:51.3014507Z triton_convolution2d_2824 0.0692 ms 26.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:51.3015911Z triton_convolution2d_2823 0.0741 ms 25.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:51.3016987Z triton_convolution2d_2819 0.0929 ms 20.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:51.3018047Z 
triton_convolution2d_2818 0.0994 ms 18.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:51.3019249Z triton_convolution2d_2820 0.1976 ms 9.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:17:51.3020104Z SingleProcess AUTOTUNE benchmarking takes 0.1600 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:17:51.5036277Z Autotune Choices Stats: 2025-09-07T13:17:51.5037695Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.024960000067949295, "best_triton_pos": 1, "best_triton_time": 0.10966400057077408, "best_triton_kernel": "triton_convolution2d_2829", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T13:17:51.5153521Z AUTOTUNE convolution(8x512x14x14, 1024x512x3x3) 2025-09-07T13:17:51.5153850Z strides: [100352, 1, 7168, 512], [4608, 1, 1536, 512] 2025-09-07T13:17:51.5154167Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:17:51.5154441Z convolution 0.0250 ms 100.0% 2025-09-07T13:17:51.5155637Z triton_convolution2d_2829 0.1097 ms 22.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:51.5157022Z triton_convolution2d_2831 0.1324 ms 18.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:51.5158289Z triton_convolution2d_2828 0.1337 ms 18.7% ALLOW_TF32=False, 
BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:51.5159537Z triton_convolution2d_2830 0.1429 ms 17.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:17:51.5160779Z triton_convolution2d_2826 0.1884 ms 13.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:51.5162343Z triton_convolution2d_2825 0.2027 ms 12.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:17:51.5163700Z triton_convolution2d_2827 0.2730 ms 9.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:17:51.5164691Z SingleProcess AUTOTUNE benchmarking takes 0.2131 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:17:51.7896788Z Autotune Choices Stats: 2025-09-07T13:17:51.7898067Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.011615999974310398, "best_triton_pos": 1, "best_triton_time": 0.01196799986064434, "best_triton_kernel": "triton_mm_2844", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4"} 2025-09-07T13:17:51.8016320Z AUTOTUNE addmm(392x2048, 392x1024, 1024x2048) 2025-09-07T13:17:51.8016629Z strides: [0, 1], [1024, 1], [1, 1024] 2025-09-07T13:17:51.8016970Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 
2025-09-07T13:17:51.8017313Z bias_addmm 0.0116 ms 100.0% 2025-09-07T13:17:51.8017984Z triton_mm_2844 0.0120 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:17:51.8019175Z triton_mm_2839 0.0136 ms 85.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:17:51.8020165Z triton_mm_2850 0.0138 ms 84.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:17:51.8021144Z triton_mm_2843 0.0143 ms 81.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:51.8022215Z triton_mm_2840 0.0145 ms 80.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:17:51.8022827Z addmm 0.0157 ms 73.8% 2025-09-07T13:17:51.8023424Z triton_mm_2849 0.0158 ms 73.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:51.8024466Z triton_mm_2846 0.0162 ms 71.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:17:51.8025926Z triton_mm_2842 0.0162 ms 71.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:17:51.8026783Z SingleProcess AUTOTUNE benchmarking takes 0.2849 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T13:17:52.0594002Z Autotune Choices Stats: 
2025-09-07T13:17:52.0595975Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_2855", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.010879999957978725, "best_triton_pos": 0} 2025-09-07T13:17:52.0712880Z AUTOTUNE addmm(8x1000, 8x2048, 2048x1000) 2025-09-07T13:17:52.0713177Z strides: [0, 1], [2048, 1], [1, 2048] 2025-09-07T13:17:52.0713492Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:17:52.0714388Z triton_mm_2855 0.0109 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:17:52.0715287Z bias_addmm 0.0112 ms 97.1% 2025-09-07T13:17:52.0716215Z triton_mm_2859 0.0118 ms 91.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:17:52.0717281Z triton_mm_2863 0.0139 ms 78.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:17:52.0717925Z addmm 0.0146 ms 74.7% 2025-09-07T13:17:52.0718530Z triton_mm_2867 0.0152 ms 71.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:17:52.0719519Z triton_mm_2854 0.0171 ms 63.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:17:52.0720488Z triton_mm_2858 0.0180 ms 60.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:17:52.0721448Z triton_mm_2853 0.0181 ms 60.2% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:17:52.0722416Z triton_mm_2852 0.0188 ms 57.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T13:17:52.0723387Z SingleProcess AUTOTUNE benchmarking takes 0.2684 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T13:18:00.6219321Z pass 2025-09-07T13:18:05.9609169Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T13:18:05.9610732Z import pynvml # type: ignore[import] 2025-09-07T13:18:08.9547407Z 2025-09-07T13:18:11.8933374Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:18:11.8933734Z loading model: 0it [00:02, ?it/s] 2025-09-07T13:18:11.9055662Z cuda eval inception_v3 2025-09-07T13:18:39.3261719Z Autotune Choices Stats: 2025-09-07T13:18:39.3262868Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_37", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.012768000364303589, "best_triton_pos": 0} 2025-09-07T13:18:39.3388083Z AUTOTUNE addmm(42632x80, 42632x64, 64x80) 2025-09-07T13:18:39.3388440Z strides: [0, 1], [64, 1], [1, 64] 2025-09-07T13:18:39.3388828Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:18:39.3389652Z triton_mm_37 0.0128 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:18:39.3390653Z triton_mm_31 0.0131 ms 97.3% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:18:39.3391613Z triton_mm_32 0.0131 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:18:39.3392220Z bias_addmm 0.0132 ms 96.8% 2025-09-07T13:18:39.3392810Z triton_mm_27 0.0132 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:18:39.3394095Z triton_mm_28 0.0133 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:18:39.3396506Z triton_mm_33 0.0135 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:18:39.3397619Z triton_mm_36 0.0136 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:18:39.3398579Z triton_mm_24 0.0137 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:18:39.3399546Z triton_mm_38 0.0137 ms 93.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:18:39.3400304Z SingleProcess AUTOTUNE benchmarking takes 0.2848 seconds and 0.0004 seconds precompiling for 21 choices 2025-09-07T13:18:39.8854902Z Autotune Choices Stats: 2025-09-07T13:18:39.8856351Z {"num_choices": 20, "num_triton_choices": 18, "best_kernel": "triton_mm_100", "best_kernel_desc": "ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.009375999681651592, "best_triton_pos": 0} 2025-09-07T13:18:39.8972620Z AUTOTUNE addmm(9800x64, 9800x192, 192x64) 2025-09-07T13:18:39.8972930Z strides: [0, 1], [192, 1], [1, 192] 2025-09-07T13:18:39.8973278Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:18:39.8974292Z triton_mm_100 0.0094 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:18:39.8975522Z triton_mm_96 0.0096 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:18:39.8976152Z bias_addmm 0.0098 ms 95.4% 2025-09-07T13:18:39.8976766Z triton_mm_99 0.0098 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:18:39.8977749Z triton_mm_105 0.0098 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:18:39.8978727Z triton_mm_106 0.0099 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:18:39.8979784Z triton_mm_103 0.0101 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:18:39.8980681Z triton_mm_98 0.0101 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:18:39.8981693Z triton_mm_92 0.0102 ms 91.6% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:18:39.8982580Z triton_mm_90 0.0104 ms 89.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:18:39.8983363Z SingleProcess AUTOTUNE benchmarking takes 0.2695 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:18:40.4139537Z Autotune Choices Stats: 2025-09-07T13:18:40.4140657Z {"num_choices": 20, "num_triton_choices": 18, "best_kernel": "triton_mm_192", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.010208000428974628, "best_triton_pos": 0} 2025-09-07T13:18:40.4252875Z AUTOTUNE addmm(9800x64, 9800x256, 256x64) 2025-09-07T13:18:40.4253523Z strides: [0, 1], [256, 1], [1, 256] 2025-09-07T13:18:40.4254034Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:18:40.4254793Z triton_mm_192 0.0102 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:18:40.4256190Z triton_mm_198 0.0102 ms 99.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:18:40.4257173Z triton_mm_188 0.0103 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:18:40.4257802Z bias_addmm 0.0104 ms 98.2% 2025-09-07T13:18:40.4258413Z triton_mm_197 0.0105 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:18:40.4259468Z 
triton_mm_191 0.0106 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:18:40.4260395Z triton_mm_182 0.0106 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:18:40.4261493Z triton_mm_190 0.0109 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:18:40.4262415Z triton_mm_195 0.0109 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:18:40.4263325Z triton_mm_184 0.0112 ms 91.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:18:40.4264109Z SingleProcess AUTOTUNE benchmarking takes 0.2671 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:18:40.6943010Z Autotune Choices Stats: 2025-09-07T13:18:40.6944041Z {"num_choices": 20, "num_triton_choices": 18, "best_kernel": "triton_mm_285", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.010847999714314938, "best_triton_pos": 0} 2025-09-07T13:18:40.7138882Z AUTOTUNE addmm(9800x64, 9800x288, 288x64) 2025-09-07T13:18:40.7139211Z strides: [0, 1], [288, 1], [1, 288] 2025-09-07T13:18:40.7139554Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:18:40.7140277Z triton_mm_285 0.0108 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:18:40.7140936Z 
bias_addmm 0.0109 ms 99.7% 2025-09-07T13:18:40.7141642Z triton_mm_284 0.0110 ms 98.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:18:40.7142624Z triton_mm_291 0.0111 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:18:40.7143597Z triton_mm_281 0.0111 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:18:40.7144563Z triton_mm_277 0.0113 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:18:40.7146009Z triton_mm_283 0.0116 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:18:40.7147136Z triton_mm_288 0.0116 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:18:40.7148154Z triton_mm_290 0.0116 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:18:40.7149196Z triton_mm_276 0.0119 ms 90.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:18:40.7150126Z SingleProcess AUTOTUNE benchmarking takes 0.2690 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:18:41.0209871Z Autotune Choices Stats: 2025-09-07T13:18:41.0210938Z {"num_choices": 20, "num_triton_choices": 18, "best_kernel": "triton_mm_75", 
"best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.00940799992531538, "best_triton_pos": 0} 2025-09-07T13:18:41.0334656Z AUTOTUNE addmm(9800x48, 9800x192, 192x48) 2025-09-07T13:18:41.0335334Z strides: [0, 1], [192, 1], [1, 192] 2025-09-07T13:18:41.0335690Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:18:41.0336419Z triton_mm_75 0.0094 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:18:41.0337654Z triton_mm_81 0.0097 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:18:41.0338653Z triton_mm_74 0.0098 ms 95.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:18:41.0339669Z triton_mm_80 0.0099 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:18:41.0340227Z bias_addmm 0.0099 ms 95.1% 2025-09-07T13:18:41.0340734Z triton_mm_78 0.0101 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:18:41.0341671Z triton_mm_73 0.0102 ms 92.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:18:41.0342494Z triton_mm_66 0.0103 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:18:41.0343321Z triton_mm_67 
0.0103 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:18:41.0344146Z triton_mm_65 0.0104 ms 90.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:18:41.0344877Z SingleProcess AUTOTUNE benchmarking takes 0.2829 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T13:18:41.5554115Z Autotune Choices Stats: 2025-09-07T13:18:41.5555268Z {"num_choices": 20, "num_triton_choices": 18, "best_kernel": "triton_mm_167", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.010015999898314476, "best_triton_pos": 0} 2025-09-07T13:18:41.5674915Z AUTOTUNE addmm(9800x48, 9800x256, 256x48) 2025-09-07T13:18:41.5675450Z strides: [0, 1], [256, 1], [1, 256] 2025-09-07T13:18:41.5676045Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:18:41.5676914Z triton_mm_167 0.0100 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:18:41.5677928Z triton_mm_163 0.0101 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:18:41.5678931Z triton_mm_173 0.0101 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:18:41.5679927Z triton_mm_157 0.0103 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:18:41.5680563Z bias_addmm 0.0104 
ms 96.3% 2025-09-07T13:18:41.5681129Z triton_mm_172 0.0105 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:18:41.5682021Z triton_mm_166 0.0107 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:18:41.5682943Z triton_mm_170 0.0108 ms 92.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:18:41.5683953Z triton_mm_159 0.0110 ms 91.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:18:41.5684854Z triton_mm_165 0.0111 ms 90.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:18:41.5685840Z SingleProcess AUTOTUNE benchmarking takes 0.2712 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T13:18:41.8456085Z Autotune Choices Stats: 2025-09-07T13:18:41.8457366Z {"num_choices": 20, "num_triton_choices": 18, "best_kernel": "bias_addmm", "best_time": 0.010495999827980995, "best_triton_pos": 1, "best_triton_time": 0.01065600011497736, "best_triton_kernel": "triton_mm_266", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T13:18:41.8569608Z AUTOTUNE addmm(9800x48, 9800x288, 288x48) 2025-09-07T13:18:41.8569923Z strides: [0, 1], [288, 1], [1, 288] 2025-09-07T13:18:41.8570263Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:18:41.8570609Z bias_addmm 0.0105 ms 100.0% 2025-09-07T13:18:41.8571252Z triton_mm_266 0.0107 ms 98.5% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:18:41.8572240Z triton_mm_260 0.0107 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:18:41.8573199Z triton_mm_259 0.0108 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:18:41.8574150Z triton_mm_256 0.0109 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:18:41.8575422Z triton_mm_263 0.0110 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:18:41.8576801Z triton_mm_252 0.0112 ms 93.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:18:41.8577865Z triton_mm_265 0.0114 ms 91.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:18:41.8578825Z triton_mm_258 0.0115 ms 91.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:18:41.8579807Z triton_mm_251 0.0119 ms 87.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:18:41.8580571Z SingleProcess AUTOTUNE benchmarking takes 0.2699 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T13:18:42.1507777Z Autotune Choices 
Stats: 2025-09-07T13:18:42.1508777Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_744", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.009855999611318111, "best_triton_pos": 0} 2025-09-07T13:18:42.1620633Z AUTOTUNE addmm(2312x192, 2312x768, 768x192) 2025-09-07T13:18:42.1620934Z strides: [0, 1], [768, 1], [1, 768] 2025-09-07T13:18:42.1621268Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:18:42.1622112Z triton_mm_744 0.0099 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:18:42.1622965Z bias_addmm 0.0102 ms 96.2% 2025-09-07T13:18:42.1623586Z triton_mm_748 0.0111 ms 88.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:18:42.1624560Z triton_mm_743 0.0118 ms 83.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:18:42.1625873Z triton_mm_747 0.0125 ms 79.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:18:42.1626826Z triton_mm_740 0.0125 ms 78.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:18:42.1627785Z triton_mm_754 0.0125 ms 78.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:18:42.1628752Z triton_mm_739 0.0127 ms 77.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, 
BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:18:42.1629807Z triton_mm_737 0.0133 ms 74.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:18:42.1630776Z triton_mm_746 0.0134 ms 73.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:18:42.1631617Z SingleProcess AUTOTUNE benchmarking takes 0.2862 seconds and 0.0003 seconds precompiling for 21 choices 2025-09-07T13:18:42.7508500Z Autotune Choices Stats: 2025-09-07T13:18:42.7509676Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_508", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.010080000385642052, "best_triton_pos": 0} 2025-09-07T13:18:42.7622024Z AUTOTUNE addmm(2312x160, 2312x768, 768x160) 2025-09-07T13:18:42.7622338Z strides: [0, 1], [768, 1], [1, 768] 2025-09-07T13:18:42.7622929Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:18:42.7623791Z triton_mm_508 0.0101 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:18:42.7624441Z bias_addmm 0.0108 ms 92.9% 2025-09-07T13:18:42.7625464Z triton_mm_512 0.0112 ms 89.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:18:42.7626463Z triton_mm_507 0.0117 ms 85.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:18:42.7627442Z triton_mm_504 0.0124 ms 
81.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:18:42.7628416Z triton_mm_511 0.0125 ms 80.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:18:42.7629380Z triton_mm_518 0.0129 ms 78.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:18:42.7630557Z triton_mm_501 0.0131 ms 77.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:18:42.7631634Z triton_mm_503 0.0131 ms 76.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:18:42.7632249Z addmm 0.0133 ms 75.5% 2025-09-07T13:18:42.7632700Z SingleProcess AUTOTUNE benchmarking takes 0.2872 seconds and 0.0003 seconds precompiling for 21 choices 2025-09-07T13:18:43.3480050Z Autotune Choices Stats: 2025-09-07T13:18:43.3481493Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_390", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.009759999811649323, "best_triton_pos": 0} 2025-09-07T13:18:43.3599580Z AUTOTUNE addmm(2312x128, 2312x768, 768x128) 2025-09-07T13:18:43.3600046Z strides: [0, 1], [768, 1], [1, 768] 2025-09-07T13:18:43.3600479Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:18:43.3601456Z triton_mm_390 0.0098 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 
2025-09-07T13:18:43.3602328Z bias_addmm 0.0100 ms 97.1% 2025-09-07T13:18:43.3603156Z triton_mm_394 0.0109 ms 89.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:18:43.3604493Z triton_mm_389 0.0119 ms 81.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:18:43.3606298Z triton_mm_386 0.0120 ms 81.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:18:43.3607629Z triton_mm_393 0.0124 ms 78.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:18:43.3608938Z triton_mm_385 0.0126 ms 77.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:18:43.3610719Z triton_mm_383 0.0127 ms 76.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:18:43.3612201Z triton_mm_400 0.0128 ms 76.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:18:43.3613166Z addmm 0.0132 ms 74.0% 2025-09-07T13:18:43.3613791Z SingleProcess AUTOTUNE benchmarking takes 0.2867 seconds and 0.0003 seconds precompiling for 21 choices 2025-09-07T13:18:43.9280035Z Autotune Choices Stats: 2025-09-07T13:18:43.9281322Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_957", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.010111999697983265, "best_triton_pos": 0} 2025-09-07T13:18:43.9405469Z AUTOTUNE addmm(512x448, 512x1280, 1280x448) 2025-09-07T13:18:43.9405816Z strides: [0, 1], [1280, 1], [1, 1280] 2025-09-07T13:18:43.9406153Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:18:43.9406894Z triton_mm_957 0.0101 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:18:43.9407898Z triton_mm_961 0.0105 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:18:43.9408540Z bias_addmm 0.0105 ms 96.3% 2025-09-07T13:18:43.9409571Z triton_mm_965 0.0120 ms 84.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:18:43.9410272Z addmm 0.0133 ms 75.8% 2025-09-07T13:18:43.9410859Z triton_mm_956 0.0140 ms 72.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:18:43.9411744Z triton_mm_960 0.0143 ms 70.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:18:43.9412636Z triton_mm_955 0.0143 ms 70.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:18:43.9413523Z triton_mm_971 0.0144 ms 70.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:18:43.9414413Z triton_mm_964 0.0148 ms 68.3% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:18:43.9415355Z SingleProcess AUTOTUNE benchmarking takes 0.2904 seconds and 0.0003 seconds precompiling for 21 choices 2025-09-07T13:18:44.5236947Z Autotune Choices Stats: 2025-09-07T13:18:44.5238285Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.011296000331640244, "best_triton_pos": 1, "best_triton_time": 0.012768000364303589, "best_triton_kernel": "triton_mm_1072", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T13:18:44.5372169Z AUTOTUNE addmm(512x448, 512x2048, 2048x448) 2025-09-07T13:18:44.5372633Z strides: [0, 1], [2048, 1], [1, 2048] 2025-09-07T13:18:44.5373124Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:18:44.5373619Z bias_addmm 0.0113 ms 100.0% 2025-09-07T13:18:44.5374579Z triton_mm_1072 0.0128 ms 88.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:18:44.5377259Z triton_mm_1076 0.0144 ms 78.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:18:44.5378375Z addmm 0.0150 ms 75.3% 2025-09-07T13:18:44.5379279Z triton_mm_1082 0.0184 ms 61.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:18:44.5380814Z triton_mm_1071 0.0196 ms 57.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:18:44.5381801Z triton_mm_1075 0.0199 ms 56.7% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:18:44.5382712Z triton_mm_1081 0.0223 ms 50.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:18:44.5383611Z triton_mm_1074 0.0236 ms 48.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:18:44.5384506Z triton_mm_1078 0.0238 ms 47.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:18:44.5385417Z SingleProcess AUTOTUNE benchmarking takes 0.3376 seconds and 0.0003 seconds precompiling for 21 choices 2025-09-07T13:18:44.8376359Z Autotune Choices Stats: 2025-09-07T13:18:44.8378255Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_924", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.009855999611318111, "best_triton_pos": 0} 2025-09-07T13:18:44.8497772Z AUTOTUNE addmm(512x384, 512x1280, 1280x384) 2025-09-07T13:18:44.8498227Z strides: [0, 1], [1280, 1], [1, 1280] 2025-09-07T13:18:44.8498728Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:18:44.8499905Z triton_mm_924 0.0099 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:18:44.8500870Z bias_addmm 0.0101 ms 97.5% 2025-09-07T13:18:44.8501554Z triton_mm_928 0.0103 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 
2025-09-07T13:18:44.8502527Z triton_mm_932 0.0121 ms 81.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:18:44.8503147Z addmm 0.0136 ms 72.6% 2025-09-07T13:18:44.8503724Z triton_mm_923 0.0136 ms 72.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:18:44.8504671Z triton_mm_927 0.0139 ms 70.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:18:44.8506062Z triton_mm_922 0.0142 ms 69.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:18:44.8507025Z triton_mm_938 0.0144 ms 68.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:18:44.8507990Z triton_mm_921 0.0148 ms 66.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:18:44.8509024Z SingleProcess AUTOTUNE benchmarking takes 0.2870 seconds and 0.0003 seconds precompiling for 21 choices 2025-09-07T13:18:45.3833124Z Autotune Choices Stats: 2025-09-07T13:18:45.3834621Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.011744000017642975, "best_triton_pos": 1, "best_triton_time": 0.012160000391304493, "best_triton_kernel": "triton_mm_1035", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T13:18:45.3950128Z AUTOTUNE addmm(512x384, 512x2048, 2048x384) 
2025-09-07T13:18:45.3950492Z strides: [0, 1], [2048, 1], [1, 2048] 2025-09-07T13:18:45.3950862Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:18:45.3951191Z bias_addmm 0.0117 ms 100.0% 2025-09-07T13:18:45.3951804Z triton_mm_1035 0.0122 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:18:45.3952805Z triton_mm_1039 0.0128 ms 92.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:18:45.3953787Z triton_mm_1043 0.0146 ms 80.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:18:45.3954414Z addmm 0.0149 ms 78.8% 2025-09-07T13:18:45.3955148Z triton_mm_1049 0.0186 ms 63.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:18:45.3956304Z triton_mm_1034 0.0188 ms 62.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:18:45.3957327Z triton_mm_1033 0.0189 ms 62.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:18:45.3958308Z triton_mm_1038 0.0197 ms 59.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:18:45.3959277Z triton_mm_1032 0.0197 ms 59.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:18:45.3960123Z SingleProcess AUTOTUNE 
benchmarking takes 0.2878 seconds and 0.0003 seconds precompiling for 21 choices 2025-09-07T13:18:45.9134389Z Autotune Choices Stats: 2025-09-07T13:18:45.9136614Z {"num_choices": 7, "num_triton_choices": 6, "best_kernel": "triton_convolution2d_4", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8", "best_time": 0.033695999532938004, "best_triton_pos": 0} 2025-09-07T13:18:45.9255546Z AUTOTUNE convolution(8x3x299x299, 32x3x3x3) 2025-09-07T13:18:45.9256074Z strides: [268203, 1, 897, 3], [27, 1, 9, 3] 2025-09-07T13:18:45.9256549Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:18:45.9257791Z triton_convolution2d_4 0.0337 ms 100.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:45.9259770Z triton_convolution2d_0 0.0357 ms 94.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:45.9261599Z triton_convolution2d_2 0.0361 ms 93.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:18:45.9263069Z triton_convolution2d_3 0.0397 ms 84.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:45.9263829Z convolution 0.0397 ms 84.9% 2025-09-07T13:18:45.9264455Z triton_convolution2d_5 0.0487 ms 69.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, 
num_warps=8 2025-09-07T13:18:45.9265666Z triton_convolution2d_1 0.0627 ms 53.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:45.9266512Z SingleProcess AUTOTUNE benchmarking takes 0.1105 seconds and 0.0002 seconds precompiling for 7 choices 2025-09-07T13:18:46.0406852Z Autotune Choices Stats: 2025-09-07T13:18:46.0408651Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_9", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8", "best_time": 0.02908799983561039, "best_triton_pos": 0} 2025-09-07T13:18:46.0530686Z AUTOTUNE convolution(8x32x149x149, 32x32x3x3) 2025-09-07T13:18:46.0531275Z strides: [710432, 1, 4768, 32], [288, 1, 96, 32] 2025-09-07T13:18:46.0531779Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:18:46.0533433Z triton_convolution2d_9 0.0291 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:46.0537361Z triton_convolution2d_12 0.0304 ms 95.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:46.0539330Z triton_convolution2d_10 0.0319 ms 91.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:46.0541325Z triton_convolution2d_11 0.0324 ms 89.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, 
UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:46.0542530Z triton_convolution2d_6 0.0398 ms 73.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:46.0543658Z triton_convolution2d_7 0.0412 ms 70.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:46.0544798Z triton_convolution2d_8 0.0635 ms 45.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:18:46.0545619Z convolution 0.1619 ms 18.0% 2025-09-07T13:18:46.0546068Z SingleProcess AUTOTUNE benchmarking takes 0.1270 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:18:46.1760040Z Autotune Choices Stats: 2025-09-07T13:18:46.1762130Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.031199999153614044, "best_triton_pos": 1, "best_triton_time": 0.03574400022625923, "best_triton_kernel": "triton_convolution2d_16", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8"} 2025-09-07T13:18:46.1881831Z AUTOTUNE convolution(8x32x147x147, 64x32x3x3) 2025-09-07T13:18:46.1882413Z strides: [691488, 1, 4704, 32], [288, 1, 96, 32] 2025-09-07T13:18:46.1883159Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:18:46.1883768Z convolution 0.0312 ms 100.0% 2025-09-07T13:18:46.1885260Z triton_convolution2d_16 0.0357 ms 87.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, 
num_warps=8 2025-09-07T13:18:46.1887224Z triton_convolution2d_19 0.0378 ms 82.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:46.1889144Z triton_convolution2d_17 0.0397 ms 78.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:46.1891167Z triton_convolution2d_18 0.0422 ms 73.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:46.1892563Z triton_convolution2d_14 0.0543 ms 57.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:46.1893678Z triton_convolution2d_13 0.0823 ms 37.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:46.1894894Z triton_convolution2d_15 0.1182 ms 26.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:18:46.1895940Z SingleProcess AUTOTUNE benchmarking takes 0.1347 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:18:46.3358987Z Autotune Choices Stats: 2025-09-07T13:18:46.3360455Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.05632000043988228, "best_triton_pos": 1, "best_triton_time": 0.06047999858856201, "best_triton_kernel": "triton_convolution2d_45", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, 
BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8"} 2025-09-07T13:18:46.3482251Z AUTOTUNE convolution(8x80x73x73, 192x80x3x3) 2025-09-07T13:18:46.3482952Z strides: [426320, 1, 5840, 80], [720, 1, 240, 80] 2025-09-07T13:18:46.3483455Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:18:46.3483909Z convolution 0.0563 ms 100.0% 2025-09-07T13:18:46.3485583Z triton_convolution2d_45 0.0605 ms 93.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:46.3487643Z triton_convolution2d_44 0.0721 ms 78.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:46.3489600Z triton_convolution2d_43 0.0727 ms 77.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:46.3491607Z triton_convolution2d_42 0.0732 ms 77.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:46.3492733Z triton_convolution2d_39 0.0753 ms 74.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:46.3494196Z triton_convolution2d_40 0.0810 ms 69.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:46.3495565Z triton_convolution2d_41 0.1851 ms 30.4% 
ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:18:46.3496472Z SingleProcess AUTOTUNE benchmarking takes 0.1591 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:18:46.4693043Z Autotune Choices Stats: 2025-09-07T13:18:46.4694519Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.022143999114632607, "best_triton_pos": 1, "best_triton_time": 0.03776000067591667, "best_triton_kernel": "triton_convolution2d_86", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=5, KERNEL_W=5, PADDING_H=2, PADDING_W=2, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T13:18:46.4817027Z AUTOTUNE convolution(8x48x35x35, 64x48x5x5) 2025-09-07T13:18:46.4817707Z strides: [58800, 1, 1680, 48], [1200, 1, 240, 48] 2025-09-07T13:18:46.4818224Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:18:46.4818683Z convolution 0.0221 ms 100.0% 2025-09-07T13:18:46.4819926Z triton_convolution2d_86 0.0378 ms 58.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=5, KERNEL_W=5, PADDING_H=2, PADDING_W=2, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:46.4822130Z triton_convolution2d_87 0.0389 ms 57.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=5, KERNEL_W=5, PADDING_H=2, PADDING_W=2, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:46.4823209Z triton_convolution2d_85 0.0420 ms 52.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, GROUPS=1, KERNEL_H=5, KERNEL_W=5, PADDING_H=2, PADDING_W=2, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:46.4824279Z triton_convolution2d_88 0.0496 ms 44.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=5, 
KERNEL_W=5, PADDING_H=2, PADDING_W=2, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:46.4825471Z triton_convolution2d_82 0.0497 ms 44.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=5, KERNEL_W=5, PADDING_H=2, PADDING_W=2, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:46.4826525Z triton_convolution2d_83 0.0562 ms 39.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=5, KERNEL_W=5, PADDING_H=2, PADDING_W=2, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:46.4827582Z triton_convolution2d_84 0.0834 ms 26.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=5, KERNEL_W=5, PADDING_H=2, PADDING_W=2, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:18:46.4828423Z SingleProcess AUTOTUNE benchmarking takes 0.1290 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:18:46.5834407Z Autotune Choices Stats: 2025-09-07T13:18:46.5958397Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_111", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4", "best_time": 0.017055999487638474, "best_triton_pos": 0} 2025-09-07T13:18:46.5959520Z AUTOTUNE convolution(8x64x35x35, 96x64x3x3) 2025-09-07T13:18:46.5959827Z strides: [78400, 1, 2240, 64], [576, 1, 192, 64] 2025-09-07T13:18:46.5960262Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:18:46.5961124Z triton_convolution2d_111 0.0171 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:46.5961949Z convolution 0.0171 ms 99.8% 2025-09-07T13:18:46.5962627Z triton_convolution2d_112 0.0177 ms 96.6% 
ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:46.5963769Z triton_convolution2d_110 0.0184 ms 92.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:46.5964910Z triton_convolution2d_107 0.0232 ms 73.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:46.5966219Z triton_convolution2d_113 0.0238 ms 71.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:46.5967352Z triton_convolution2d_108 0.0262 ms 65.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:46.5968485Z triton_convolution2d_109 0.0503 ms 33.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:18:46.5969469Z SingleProcess AUTOTUNE benchmarking takes 0.1133 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:18:46.6999862Z Autotune Choices Stats: 2025-09-07T13:18:46.7001314Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.0163199994713068, "best_triton_pos": 1, "best_triton_time": 0.023296000435948372, "best_triton_kernel": "triton_convolution2d_119", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, 
PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8"} 2025-09-07T13:18:46.7126283Z AUTOTUNE convolution(8x96x35x35, 96x96x3x3) 2025-09-07T13:18:46.7126858Z strides: [117600, 1, 3360, 96], [864, 1, 288, 96] 2025-09-07T13:18:46.7127381Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:18:46.7127870Z convolution 0.0163 ms 100.0% 2025-09-07T13:18:46.7129130Z triton_convolution2d_119 0.0233 ms 70.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:46.7131300Z triton_convolution2d_118 0.0243 ms 67.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:46.7132758Z triton_convolution2d_117 0.0288 ms 56.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:46.7133996Z triton_convolution2d_120 0.0298 ms 54.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:46.7135463Z triton_convolution2d_114 0.0347 ms 47.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:46.7136693Z triton_convolution2d_115 0.0371 ms 44.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:46.7138256Z triton_convolution2d_116 0.0607 ms 26.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, 
KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:18:46.7139305Z SingleProcess AUTOTUNE benchmarking takes 0.1163 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:18:46.9524609Z Autotune Choices Stats: 2025-09-07T13:18:46.9526540Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_131", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.008895999751985073, "best_triton_pos": 0} 2025-09-07T13:18:46.9655472Z AUTOTUNE addmm(9800x32, 9800x192, 192x32) 2025-09-07T13:18:46.9655986Z strides: [0, 1], [192, 1], [1, 192] 2025-09-07T13:18:46.9656511Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:18:46.9657653Z triton_mm_131 0.0089 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:18:46.9659203Z triton_mm_128 0.0090 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:18:46.9660729Z triton_mm_137 0.0091 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:18:46.9662286Z triton_mm_123 0.0092 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:18:46.9662873Z bias_addmm 0.0093 ms 95.5% 2025-09-07T13:18:46.9663440Z triton_mm_132 0.0094 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:18:46.9664354Z triton_mm_134 0.0094 ms 
94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:18:46.9665395Z triton_mm_124 0.0095 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:18:46.9666270Z triton_mm_129 0.0095 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:18:46.9667156Z triton_mm_136 0.0095 ms 93.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:18:46.9667940Z SingleProcess AUTOTUNE benchmarking takes 0.2521 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T13:18:47.1607357Z Autotune Choices Stats: 2025-09-07T13:18:47.1609536Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.02396799996495247, "best_triton_pos": 1, "best_triton_time": 0.062272001057863235, "best_triton_kernel": "triton_convolution2d_328", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T13:18:47.1733171Z AUTOTUNE convolution(8x288x35x35, 384x288x3x3) 2025-09-07T13:18:47.1733726Z strides: [352800, 1, 10080, 288], [2592, 1, 864, 288] 2025-09-07T13:18:47.1734240Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:18:47.1734699Z convolution 0.0240 ms 100.0% 2025-09-07T13:18:47.1736603Z triton_convolution2d_328 0.0623 ms 38.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:47.1738839Z 
triton_convolution2d_327 0.0774 ms 31.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:47.1741011Z triton_convolution2d_330 0.0793 ms 30.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:47.1742606Z triton_convolution2d_329 0.0808 ms 29.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:47.1743748Z triton_convolution2d_325 0.1032 ms 23.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:47.1744880Z triton_convolution2d_324 0.1148 ms 20.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:47.1746152Z triton_convolution2d_326 0.2211 ms 10.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:18:47.1747054Z SingleProcess AUTOTUNE benchmarking takes 0.1695 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:18:47.2830718Z Autotune Choices Stats: 2025-09-07T13:18:47.2832223Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.013472000136971474, "best_triton_pos": 1, "best_triton_time": 0.02252800017595291, "best_triton_kernel": "triton_convolution2d_361", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, 
KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8"} 2025-09-07T13:18:47.2949408Z AUTOTUNE convolution(8x96x35x35, 96x96x3x3) 2025-09-07T13:18:47.2949736Z strides: [117600, 1, 3360, 96], [864, 1, 288, 96] 2025-09-07T13:18:47.2950049Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:18:47.2950331Z convolution 0.0135 ms 100.0% 2025-09-07T13:18:47.2951186Z triton_convolution2d_361 0.0225 ms 59.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:47.2952492Z triton_convolution2d_360 0.0232 ms 58.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:47.2953719Z triton_convolution2d_359 0.0276 ms 48.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:47.2955191Z triton_convolution2d_362 0.0299 ms 45.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:47.2956411Z triton_convolution2d_356 0.0337 ms 39.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:47.2957624Z triton_convolution2d_357 0.0391 ms 34.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:47.2959051Z triton_convolution2d_358 0.0793 ms 17.0% ALLOW_TF32=False, BLOCK_K=16, 
BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:18:47.2960148Z SingleProcess AUTOTUNE benchmarking takes 0.1172 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:18:47.4052132Z Autotune Choices Stats: 2025-09-07T13:18:47.4054270Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.01283199992030859, "best_triton_pos": 1, "best_triton_time": 0.024000000208616257, "best_triton_kernel": "triton_convolution2d_405", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T13:18:47.4172203Z AUTOTUNE convolution(8x128x17x17, 128x128x1x7) 2025-09-07T13:18:47.4172729Z strides: [36992, 1, 2176, 128], [896, 1, 896, 128] 2025-09-07T13:18:47.4173233Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:18:47.4173680Z convolution 0.0128 ms 100.0% 2025-09-07T13:18:47.4174883Z triton_convolution2d_405 0.0240 ms 53.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:47.4177109Z triton_convolution2d_406 0.0258 ms 49.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:47.4179311Z triton_convolution2d_404 0.0299 ms 42.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:47.4181495Z triton_convolution2d_407 0.0299 ms 42.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, 
PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:47.4182776Z triton_convolution2d_401 0.0332 ms 38.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:47.4183909Z triton_convolution2d_402 0.0362 ms 35.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:47.4185161Z triton_convolution2d_403 0.0724 ms 17.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:18:47.4186077Z SingleProcess AUTOTUNE benchmarking takes 0.1178 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:18:47.5230677Z Autotune Choices Stats: 2025-09-07T13:18:47.5232584Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.013024000450968742, "best_triton_pos": 1, "best_triton_time": 0.024032000452280045, "best_triton_kernel": "triton_convolution2d_412", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T13:18:47.5350319Z AUTOTUNE convolution(8x128x17x17, 192x128x7x1) 2025-09-07T13:18:47.5350621Z strides: [36992, 1, 2176, 128], [896, 1, 128, 128] 2025-09-07T13:18:47.5350921Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:18:47.5351195Z convolution 0.0130 ms 100.0% 2025-09-07T13:18:47.5351929Z triton_convolution2d_412 0.0240 ms 54.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 
2025-09-07T13:18:47.5353417Z triton_convolution2d_411 0.0275 ms 47.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:47.5354715Z triton_convolution2d_413 0.0294 ms 44.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:47.5356119Z triton_convolution2d_414 0.0298 ms 43.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:47.5357346Z triton_convolution2d_409 0.0358 ms 36.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:47.5358561Z triton_convolution2d_408 0.0363 ms 35.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:47.5359783Z triton_convolution2d_410 0.0712 ms 18.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:18:47.5360765Z SingleProcess AUTOTUNE benchmarking takes 0.1171 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:18:47.6409958Z Autotune Choices Stats: 2025-09-07T13:18:47.6412480Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.013088000006973743, "best_triton_pos": 1, "best_triton_time": 0.024159999564290047, "best_triton_kernel": "triton_convolution2d_438", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, 
BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T13:18:47.6531183Z AUTOTUNE convolution(8x128x17x17, 128x128x7x1) 2025-09-07T13:18:47.6531744Z strides: [36992, 1, 2176, 128], [896, 1, 128, 128] 2025-09-07T13:18:47.6532245Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:18:47.6532691Z convolution 0.0131 ms 100.0% 2025-09-07T13:18:47.6533887Z triton_convolution2d_438 0.0242 ms 54.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:47.6536155Z triton_convolution2d_439 0.0261 ms 50.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:47.6538167Z triton_convolution2d_437 0.0289 ms 45.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:47.6540120Z triton_convolution2d_440 0.0297 ms 44.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:47.6542139Z triton_convolution2d_434 0.0325 ms 40.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:47.6543270Z triton_convolution2d_435 0.0347 ms 37.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:47.6544399Z triton_convolution2d_436 0.0722 ms 18.1% 
ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:18:47.6545622Z SingleProcess AUTOTUNE benchmarking takes 0.1173 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:18:47.7747053Z Autotune Choices Stats: 2025-09-07T13:18:47.7748835Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_459", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4", "best_time": 0.02409599907696247, "best_triton_pos": 0} 2025-09-07T13:18:47.7873570Z AUTOTUNE convolution(8x128x17x17, 192x128x1x7) 2025-09-07T13:18:47.7873899Z strides: [36992, 1, 2176, 128], [896, 1, 896, 128] 2025-09-07T13:18:47.7874213Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:18:47.7875237Z triton_convolution2d_459 0.0241 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:47.7876483Z triton_convolution2d_458 0.0286 ms 84.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:47.7877715Z triton_convolution2d_460 0.0299 ms 80.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:47.7879070Z triton_convolution2d_461 0.0300 ms 80.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:47.7879828Z convolution 
0.0303 ms 79.6% 2025-09-07T13:18:47.7880552Z triton_convolution2d_456 0.0368 ms 65.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:47.7881784Z triton_convolution2d_455 0.0375 ms 64.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:47.7882962Z triton_convolution2d_457 0.0721 ms 33.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:18:47.7883854Z SingleProcess AUTOTUNE benchmarking takes 0.1255 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:18:47.9070466Z Autotune Choices Stats: 2025-09-07T13:18:47.9071925Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.016672000288963318, "best_triton_pos": 1, "best_triton_time": 0.028704000636935234, "best_triton_kernel": "triton_convolution2d_523", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T13:18:47.9282950Z AUTOTUNE convolution(8x160x17x17, 160x160x1x7) 2025-09-07T13:18:47.9283340Z strides: [46240, 1, 2720, 160], [1120, 1, 1120, 160] 2025-09-07T13:18:47.9283679Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:18:47.9283981Z convolution 0.0167 ms 100.0% 2025-09-07T13:18:47.9284773Z triton_convolution2d_523 0.0287 ms 58.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:47.9288905Z triton_convolution2d_525 
0.0344 ms 48.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:47.9290535Z triton_convolution2d_522 0.0351 ms 47.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:47.9292035Z triton_convolution2d_524 0.0354 ms 47.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:47.9293345Z triton_convolution2d_520 0.0449 ms 37.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:47.9294655Z triton_convolution2d_519 0.0463 ms 36.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:47.9296144Z triton_convolution2d_521 0.0891 ms 18.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:18:47.9297173Z SingleProcess AUTOTUNE benchmarking takes 0.1326 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:18:48.0636641Z Autotune Choices Stats: 2025-09-07T13:18:48.0637895Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_532", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8", "best_time": 0.03404799848794937, "best_triton_pos": 
0} 2025-09-07T13:18:48.0766422Z AUTOTUNE convolution(8x160x17x17, 192x160x7x1) 2025-09-07T13:18:48.0766995Z strides: [46240, 1, 2720, 160], [1120, 1, 160, 160] 2025-09-07T13:18:48.0767511Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:18:48.0768812Z triton_convolution2d_532 0.0340 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:48.0770809Z triton_convolution2d_530 0.0350 ms 97.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:48.0772444Z triton_convolution2d_531 0.0351 ms 97.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:48.0773597Z triton_convolution2d_529 0.0431 ms 79.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:48.0774747Z triton_convolution2d_527 0.0439 ms 77.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:48.0776206Z triton_convolution2d_526 0.0477 ms 71.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:48.0776900Z convolution 0.0828 ms 41.1% 2025-09-07T13:18:48.0777584Z triton_convolution2d_528 0.0859 ms 39.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, 
UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:18:48.0778484Z SingleProcess AUTOTUNE benchmarking takes 0.1468 seconds and 0.0004 seconds precompiling for 8 choices 2025-09-07T13:18:48.1889601Z Autotune Choices Stats: 2025-09-07T13:18:48.1892159Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.015552000142633915, "best_triton_pos": 1, "best_triton_time": 0.02828799933195114, "best_triton_kernel": "triton_convolution2d_556", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T13:18:48.2021238Z AUTOTUNE convolution(8x160x17x17, 160x160x7x1) 2025-09-07T13:18:48.2021699Z strides: [46240, 1, 2720, 160], [1120, 1, 160, 160] 2025-09-07T13:18:48.2022215Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:18:48.2034332Z convolution 0.0156 ms 100.0% 2025-09-07T13:18:48.2035305Z triton_convolution2d_556 0.0283 ms 55.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:48.2036583Z triton_convolution2d_555 0.0342 ms 45.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:48.2037842Z triton_convolution2d_557 0.0345 ms 45.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:48.2039060Z triton_convolution2d_558 0.0345 ms 45.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:48.2040416Z 
triton_convolution2d_553 0.0429 ms 36.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:48.2041669Z triton_convolution2d_552 0.0454 ms 34.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:48.2042781Z triton_convolution2d_554 0.0879 ms 17.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:18:48.2043574Z SingleProcess AUTOTUNE benchmarking takes 0.1249 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:18:48.3323534Z Autotune Choices Stats: 2025-09-07T13:18:48.3325812Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_577", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4", "best_time": 0.028672000393271446, "best_triton_pos": 0} 2025-09-07T13:18:48.3460122Z AUTOTUNE convolution(8x160x17x17, 192x160x1x7) 2025-09-07T13:18:48.3460456Z strides: [46240, 1, 2720, 160], [1120, 1, 1120, 160] 2025-09-07T13:18:48.3460781Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:18:48.3461913Z triton_convolution2d_577 0.0287 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:48.3463120Z convolution 0.0314 ms 91.3% 2025-09-07T13:18:48.3464324Z triton_convolution2d_576 0.0359 ms 79.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, 
STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:48.3466589Z triton_convolution2d_578 0.0360 ms 79.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:48.3468952Z triton_convolution2d_579 0.0375 ms 76.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:48.3471056Z triton_convolution2d_574 0.0452 ms 63.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:48.3472787Z triton_convolution2d_573 0.0480 ms 59.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:48.3474014Z triton_convolution2d_575 0.0884 ms 32.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:18:48.3475119Z SingleProcess AUTOTUNE benchmarking takes 0.1343 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:18:48.5077096Z Autotune Choices Stats: 2025-09-07T13:18:48.5078462Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.03311999887228012, "best_triton_pos": 1, "best_triton_time": 0.03440000116825104, "best_triton_kernel": "triton_convolution2d_759", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T13:18:48.5200747Z AUTOTUNE 
convolution(8x192x17x17, 192x192x1x7) 2025-09-07T13:18:48.5201087Z strides: [55488, 1, 3264, 192], [1344, 1, 1344, 192] 2025-09-07T13:18:48.5201693Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:18:48.5202257Z convolution 0.0331 ms 100.0% 2025-09-07T13:18:48.5203511Z triton_convolution2d_759 0.0344 ms 96.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:48.5205753Z triton_convolution2d_758 0.0418 ms 79.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:48.5207701Z triton_convolution2d_760 0.0422 ms 78.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:48.5209620Z triton_convolution2d_761 0.0435 ms 76.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:48.5211642Z triton_convolution2d_756 0.0532 ms 62.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:48.5212889Z triton_convolution2d_755 0.0545 ms 60.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:48.5214014Z triton_convolution2d_757 0.1074 ms 30.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=7, PADDING_H=0, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 
2025-09-07T13:18:48.5214909Z SingleProcess AUTOTUNE benchmarking takes 0.1335 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:18:48.6383916Z Autotune Choices Stats: 2025-09-07T13:18:48.6386692Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.015359999611973763, "best_triton_pos": 1, "best_triton_time": 0.0331839993596077, "best_triton_kernel": "triton_convolution2d_766", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T13:18:48.6509582Z AUTOTUNE convolution(8x192x17x17, 192x192x7x1) 2025-09-07T13:18:48.6509919Z strides: [55488, 1, 3264, 192], [1344, 1, 192, 192] 2025-09-07T13:18:48.6510233Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:18:48.6510521Z convolution 0.0154 ms 100.0% 2025-09-07T13:18:48.6511335Z triton_convolution2d_766 0.0332 ms 46.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:48.6512686Z triton_convolution2d_768 0.0402 ms 38.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:48.6513929Z triton_convolution2d_765 0.0413 ms 37.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:48.6515302Z triton_convolution2d_767 0.0421 ms 36.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:48.6516537Z triton_convolution2d_763 0.0530 ms 29.0% 
ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:48.6517875Z triton_convolution2d_762 0.0538 ms 28.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:48.6519107Z triton_convolution2d_764 0.1042 ms 14.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=7, KERNEL_W=1, PADDING_H=3, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:18:48.6520093Z SingleProcess AUTOTUNE benchmarking takes 0.1300 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:18:48.8010917Z Autotune Choices Stats: 2025-09-07T13:18:48.8013091Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.012927999719977379, "best_triton_pos": 1, "best_triton_time": 0.04371200129389763, "best_triton_kernel": "triton_convolution2d_858", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T13:18:48.8138617Z AUTOTUNE convolution(8x192x17x17, 320x192x3x3) 2025-09-07T13:18:48.8139179Z strides: [55488, 1, 3264, 192], [1728, 1, 576, 192] 2025-09-07T13:18:48.8139678Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:18:48.8140124Z convolution 0.0129 ms 100.0% 2025-09-07T13:18:48.8141332Z triton_convolution2d_858 0.0437 ms 29.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:48.8143010Z triton_convolution2d_857 0.0540 ms 23.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, 
KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:48.8144259Z triton_convolution2d_860 0.0542 ms 23.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:48.8145819Z triton_convolution2d_859 0.0543 ms 23.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:48.8147139Z triton_convolution2d_855 0.0768 ms 16.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:48.8148429Z triton_convolution2d_854 0.0797 ms 16.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:48.8149646Z triton_convolution2d_856 0.1230 ms 10.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:18:48.8150627Z SingleProcess AUTOTUNE benchmarking takes 0.1421 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:18:48.9528366Z Autotune Choices Stats: 2025-09-07T13:18:48.9530550Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.012768000364303589, "best_triton_pos": 1, "best_triton_time": 0.04358400031924248, "best_triton_kernel": "triton_convolution2d_898", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, 
num_warps=4"} 2025-09-07T13:18:48.9657724Z AUTOTUNE convolution(8x192x17x17, 192x192x3x3) 2025-09-07T13:18:48.9658080Z strides: [55488, 1, 3264, 192], [1728, 1, 576, 192] 2025-09-07T13:18:48.9658405Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:18:48.9658694Z convolution 0.0128 ms 100.0% 2025-09-07T13:18:48.9659581Z triton_convolution2d_898 0.0436 ms 29.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:48.9660842Z triton_convolution2d_897 0.0540 ms 23.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:48.9662207Z triton_convolution2d_899 0.0551 ms 23.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:48.9663260Z triton_convolution2d_900 0.0551 ms 23.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:48.9664308Z triton_convolution2d_894 0.0750 ms 17.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:48.9666818Z triton_convolution2d_895 0.0768 ms 16.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:48.9667900Z triton_convolution2d_896 0.1229 ms 10.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, 
STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:18:48.9668757Z SingleProcess AUTOTUNE benchmarking takes 0.1429 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:18:49.2330254Z Autotune Choices Stats: 2025-09-07T13:18:49.2331892Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_905", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.009824000298976898, "best_triton_pos": 0} 2025-09-07T13:18:49.2459090Z AUTOTUNE addmm(512x320, 512x1280, 1280x320) 2025-09-07T13:18:49.2459587Z strides: [0, 1], [1280, 1], [1, 1280] 2025-09-07T13:18:49.2460361Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:18:49.2461930Z triton_mm_905 0.0098 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:18:49.2462904Z bias_addmm 0.0100 ms 97.8% 2025-09-07T13:18:49.2463507Z triton_mm_909 0.0102 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:18:49.2464477Z triton_mm_913 0.0119 ms 82.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:18:49.2465386Z addmm 0.0128 ms 76.9% 2025-09-07T13:18:49.2465959Z triton_mm_904 0.0135 ms 72.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:18:49.2466912Z triton_mm_908 0.0138 ms 71.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:18:49.2467857Z triton_mm_903 0.0141 
ms 69.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:18:49.2468820Z triton_mm_919 0.0143 ms 68.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:18:49.2469878Z triton_mm_912 0.0147 ms 66.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:18:49.2470729Z SingleProcess AUTOTUNE benchmarking takes 0.2792 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T13:18:49.3587772Z Autotune Choices Stats: 2025-09-07T13:18:49.3589973Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.012128000147640705, "best_triton_pos": 1, "best_triton_time": 0.02956799976527691, "best_triton_kernel": "triton_convolution2d_943", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=3, PADDING_H=0, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T13:18:49.3717309Z AUTOTUNE convolution(8x384x8x8, 384x384x1x3) 2025-09-07T13:18:49.3717626Z strides: [24576, 1, 3072, 384], [1152, 1, 1152, 384] 2025-09-07T13:18:49.3717938Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:18:49.3718216Z convolution 0.0121 ms 100.0% 2025-09-07T13:18:49.3718941Z triton_convolution2d_943 0.0296 ms 41.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=3, PADDING_H=0, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:49.3720177Z triton_convolution2d_942 0.0370 ms 32.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=3, PADDING_H=0, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, 
num_warps=8 2025-09-07T13:18:49.3721407Z triton_convolution2d_945 0.0387 ms 31.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=3, PADDING_H=0, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:49.3722636Z triton_convolution2d_944 0.0389 ms 31.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=3, PADDING_H=0, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:49.3723856Z triton_convolution2d_940 0.0515 ms 23.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=3, PADDING_H=0, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:49.3725459Z triton_convolution2d_939 0.0554 ms 21.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=3, PADDING_H=0, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:49.3726750Z triton_convolution2d_941 0.0710 ms 17.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=512, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=3, PADDING_H=0, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:18:49.3727709Z SingleProcess AUTOTUNE benchmarking takes 0.1252 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:18:49.4832622Z Autotune Choices Stats: 2025-09-07T13:18:49.4833978Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.011935999616980553, "best_triton_pos": 1, "best_triton_time": 0.028960000723600388, "best_triton_kernel": "triton_convolution2d_950", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=1, PADDING_H=1, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T13:18:49.4959868Z AUTOTUNE convolution(8x384x8x8, 384x384x3x1) 
2025-09-07T13:18:49.4960182Z strides: [24576, 1, 3072, 384], [1152, 1, 384, 384] 2025-09-07T13:18:49.4960489Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:18:49.4960757Z convolution 0.0119 ms 100.0% 2025-09-07T13:18:49.4961493Z triton_convolution2d_950 0.0290 ms 41.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=1, PADDING_H=1, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:49.4963852Z triton_convolution2d_949 0.0371 ms 32.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=1, PADDING_H=1, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:49.4966105Z triton_convolution2d_952 0.0372 ms 32.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=1, PADDING_H=1, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:49.4968090Z triton_convolution2d_951 0.0385 ms 31.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=1, PADDING_H=1, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:49.4970039Z triton_convolution2d_947 0.0518 ms 23.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=1, PADDING_H=1, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:49.4972095Z triton_convolution2d_946 0.0524 ms 22.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=1, PADDING_H=1, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:49.4973370Z triton_convolution2d_948 0.0681 ms 17.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=512, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=1, PADDING_H=1, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:18:49.4974265Z SingleProcess 
AUTOTUNE benchmarking takes 0.1235 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:18:49.6849255Z Autotune Choices Stats: 2025-09-07T13:18:49.6851440Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.016256000846624374, "best_triton_pos": 1, "best_triton_time": 0.09708800166845322, "best_triton_kernel": "triton_convolution2d_976", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T13:18:49.6977325Z AUTOTUNE convolution(8x448x8x8, 384x448x3x3) 2025-09-07T13:18:49.6978135Z strides: [28672, 1, 3584, 448], [4032, 1, 1344, 448] 2025-09-07T13:18:49.6978644Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:18:49.6979098Z convolution 0.0163 ms 100.0% 2025-09-07T13:18:49.6980428Z triton_convolution2d_976 0.0971 ms 16.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:49.6982676Z triton_convolution2d_975 0.1177 ms 13.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:49.6983829Z triton_convolution2d_977 0.1219 ms 13.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:49.6985283Z triton_convolution2d_978 0.1255 ms 13.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:18:49.6986433Z triton_convolution2d_973 0.1704 ms 9.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, 
BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:49.6987573Z triton_convolution2d_972 0.1807 ms 9.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:18:49.6988801Z triton_convolution2d_974 0.2226 ms 7.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=512, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:18:49.6989702Z SingleProcess AUTOTUNE benchmarking takes 0.2008 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:18:49.9713026Z Autotune Choices Stats: 2025-09-07T13:18:49.9714265Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.009375999681651592, "best_triton_pos": 1, "best_triton_time": 0.009440000168979168, "best_triton_kernel": "triton_mm_997", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T13:18:49.9839085Z AUTOTUNE addmm(512x192, 512x1280, 1280x192) 2025-09-07T13:18:49.9839376Z strides: [0, 1], [1280, 1], [1, 1280] 2025-09-07T13:18:49.9839685Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:18:49.9840020Z bias_addmm 0.0094 ms 100.0% 2025-09-07T13:18:49.9840625Z triton_mm_997 0.0094 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:18:49.9841602Z triton_mm_1001 0.0101 ms 93.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:18:49.9843322Z triton_mm_1005 0.0117 ms 80.3% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:18:49.9844314Z addmm 0.0126 ms 74.4% 2025-09-07T13:18:49.9845515Z triton_mm_996 0.0132 ms 70.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:18:49.9847051Z triton_mm_1000 0.0136 ms 69.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:18:49.9848567Z triton_mm_995 0.0137 ms 68.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:18:49.9850555Z triton_mm_1011 0.0141 ms 66.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:18:49.9852354Z triton_mm_994 0.0144 ms 65.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:18:49.9853235Z SingleProcess AUTOTUNE benchmarking takes 0.2771 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T13:18:50.2568275Z Autotune Choices Stats: 2025-09-07T13:18:50.2570287Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.010847999714314938, "best_triton_pos": 1, "best_triton_time": 0.011615999974310398, "best_triton_kernel": "triton_mm_1016", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T13:18:50.2693501Z AUTOTUNE addmm(512x320, 512x2048, 2048x320) 2025-09-07T13:18:50.2693971Z strides: [0, 1], [2048, 1], [1, 2048] 
2025-09-07T13:18:50.2694496Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:18:50.2695542Z bias_addmm 0.0108 ms 100.0% 2025-09-07T13:18:50.2696544Z triton_mm_1016 0.0116 ms 93.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:18:50.2698099Z triton_mm_1020 0.0127 ms 85.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:18:50.2699986Z triton_mm_1024 0.0143 ms 75.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:18:50.2701002Z addmm 0.0143 ms 75.7% 2025-09-07T13:18:50.2702162Z triton_mm_1015 0.0180 ms 60.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:18:50.2703308Z triton_mm_1030 0.0183 ms 59.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:18:50.2704208Z triton_mm_1014 0.0185 ms 58.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:18:50.2705218Z triton_mm_1013 0.0189 ms 57.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:18:50.2706110Z triton_mm_1019 0.0193 ms 56.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:18:50.2706895Z SingleProcess AUTOTUNE benchmarking takes 0.2845 seconds and 0.0003 seconds precompiling for 21 
choices 2025-09-07T13:18:50.5629589Z Autotune Choices Stats: 2025-09-07T13:18:50.5631590Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.010495999827980995, "best_triton_pos": 1, "best_triton_time": 0.011615999974310398, "best_triton_kernel": "triton_mm_1108", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T13:18:50.5755330Z AUTOTUNE addmm(512x192, 512x2048, 2048x192) 2025-09-07T13:18:50.5755627Z strides: [0, 1], [2048, 1], [1, 2048] 2025-09-07T13:18:50.5755940Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:18:50.5756257Z bias_addmm 0.0105 ms 100.0% 2025-09-07T13:18:50.5756869Z triton_mm_1108 0.0116 ms 90.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:18:50.5758125Z triton_mm_1112 0.0123 ms 85.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:18:50.5758854Z addmm 0.0138 ms 76.3% 2025-09-07T13:18:50.5759444Z triton_mm_1116 0.0140 ms 74.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:18:50.5760425Z triton_mm_1107 0.0177 ms 59.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:18:50.5761389Z triton_mm_1106 0.0182 ms 57.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:18:50.5762367Z triton_mm_1122 0.0183 ms 57.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, 
BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:18:50.5763325Z triton_mm_1105 0.0190 ms 55.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:18:50.5764217Z triton_mm_1111 0.0190 ms 55.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:18:50.5765129Z SingleProcess AUTOTUNE benchmarking takes 0.2845 seconds and 0.0003 seconds precompiling for 21 choices 2025-09-07T13:18:56.9278587Z pass 2025-09-07T13:19:01.2838287Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T13:19:01.2840191Z import pynvml # type: ignore[import] 2025-09-07T13:19:04.2903189Z 2025-09-07T13:19:06.9405966Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:19:06.9406346Z loading model: 0it [00:02, ?it/s] 2025-09-07T13:19:06.9492531Z cuda eval jx_nest_base 2025-09-07T13:19:34.2132281Z Autotune Choices Stats: 2025-09-07T13:19:34.2134376Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.01897599920630455, "best_triton_pos": 1, "best_triton_time": 0.020255999639630318, "best_triton_kernel": "triton_mm_94", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T13:19:34.2266676Z AUTOTUNE mm(25088x128, 128x512) 2025-09-07T13:19:34.2266954Z strides: [128, 1], [1, 128] 2025-09-07T13:19:34.2267230Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:19:34.2267524Z mm 0.0190 ms 100.0% 2025-09-07T13:19:34.2268195Z 
triton_mm_94 0.0203 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:34.2269194Z triton_mm_95 0.0224 ms 84.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:34.2270164Z triton_mm_87 0.0229 ms 82.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:34.2271114Z triton_mm_89 0.0233 ms 81.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:34.2272080Z triton_mm_91 0.0233 ms 81.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:34.2273612Z triton_mm_88 0.0236 ms 80.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:19:34.2274853Z triton_mm_92 0.0237 ms 80.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:19:34.2275996Z triton_mm_90 0.0241 ms 78.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:19:34.2276956Z triton_mm_84 0.0242 ms 78.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:19:34.2277795Z SingleProcess AUTOTUNE benchmarking takes 0.2717 seconds and 0.0005 seconds precompiling for 20 choices 
2025-09-07T13:19:34.8408681Z Autotune Choices Stats: 2025-09-07T13:19:34.8410037Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.01398400031030178, "best_triton_pos": 1, "best_triton_time": 0.014944000169634819, "best_triton_kernel": "triton_mm_319", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T13:19:34.8539429Z AUTOTUNE mm(6272x256, 256x1024) 2025-09-07T13:19:34.8539722Z strides: [256, 1], [1, 256] 2025-09-07T13:19:34.8539993Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:19:34.8540276Z mm 0.0140 ms 100.0% 2025-09-07T13:19:34.8541418Z triton_mm_319 0.0149 ms 93.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:34.8542430Z triton_mm_314 0.0162 ms 86.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:34.8543398Z triton_mm_320 0.0169 ms 82.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:34.8544383Z triton_mm_321 0.0181 ms 77.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:19:34.8549737Z triton_mm_317 0.0183 ms 76.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:19:34.8550702Z triton_mm_313 0.0189 ms 73.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:19:34.8551601Z 
triton_mm_312 0.0193 ms 72.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:34.8552492Z triton_mm_316 0.0194 ms 72.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:34.8553379Z triton_mm_310 0.0196 ms 71.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:19:34.8554155Z SingleProcess AUTOTUNE benchmarking takes 0.2614 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:19:35.4331025Z Autotune Choices Stats: 2025-09-07T13:19:35.4332124Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_545", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.013344000093638897, "best_triton_pos": 0} 2025-09-07T13:19:35.4460379Z AUTOTUNE mm(1568x512, 512x2048) 2025-09-07T13:19:35.4460849Z strides: [512, 1], [1, 512] 2025-09-07T13:19:35.4461261Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:19:35.4462037Z triton_mm_545 0.0133 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:35.4462680Z mm 0.0135 ms 98.6% 2025-09-07T13:19:35.4463278Z triton_mm_539 0.0149 ms 89.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:35.4464253Z triton_mm_544 0.0151 ms 88.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 
2025-09-07T13:19:35.4465715Z triton_mm_537 0.0156 ms 85.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:35.4466691Z triton_mm_546 0.0159 ms 83.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:19:35.4467654Z triton_mm_541 0.0164 ms 81.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:35.4468642Z triton_mm_535 0.0176 ms 75.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:19:35.4469749Z triton_mm_538 0.0176 ms 75.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:19:35.4470722Z triton_mm_542 0.0184 ms 72.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:19:35.4471569Z SingleProcess AUTOTUNE benchmarking takes 0.2621 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:19:38.2597095Z Autotune Choices Stats: 2025-09-07T13:19:38.2598296Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_6", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=4, KERNEL_W=4, PADDING_H=0, PADDING_W=0, STRIDE_H=4, STRIDE_W=4, UNROLL=False, num_stages=2, num_warps=8", "best_time": 0.016992000862956047, "best_triton_pos": 0} 2025-09-07T13:19:38.2726546Z AUTOTUNE convolution(8x3x224x224, 128x3x4x4) 2025-09-07T13:19:38.2727129Z strides: [150528, 50176, 224, 1], [48, 16, 4, 1] 2025-09-07T13:19:38.2727633Z 
dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:19:38.2728904Z triton_convolution2d_6 0.0170 ms 100.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=4, KERNEL_W=4, PADDING_H=0, PADDING_W=0, STRIDE_H=4, STRIDE_W=4, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:19:38.2730886Z triton_convolution2d_1 0.0172 ms 98.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=4, KERNEL_W=4, PADDING_H=0, PADDING_W=0, STRIDE_H=4, STRIDE_W=4, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:19:38.2732837Z triton_convolution2d_3 0.0174 ms 97.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=4, KERNEL_W=4, PADDING_H=0, PADDING_W=0, STRIDE_H=4, STRIDE_W=4, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:19:38.2734781Z triton_convolution2d_0 0.0175 ms 97.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=4, KERNEL_W=4, PADDING_H=0, PADDING_W=0, STRIDE_H=4, STRIDE_W=4, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:19:38.2737191Z triton_convolution2d_5 0.0181 ms 93.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=4, KERNEL_W=4, PADDING_H=0, PADDING_W=0, STRIDE_H=4, STRIDE_W=4, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:19:38.2738461Z triton_convolution2d_4 0.0190 ms 89.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=4, KERNEL_W=4, PADDING_H=0, PADDING_W=0, STRIDE_H=4, STRIDE_W=4, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:19:38.2739280Z convolution 0.0267 ms 63.7% 2025-09-07T13:19:38.2739952Z triton_convolution2d_2 0.0589 ms 28.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=4, KERNEL_W=4, PADDING_H=0, PADDING_W=0, STRIDE_H=4, STRIDE_W=4, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:19:38.2740848Z SingleProcess AUTOTUNE benchmarking takes 0.1183 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:19:38.5165324Z Autotune 
Choices Stats: 2025-09-07T13:19:38.5167347Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.01539199985563755, "best_triton_pos": 1, "best_triton_time": 0.017855999991297722, "best_triton_kernel": "triton_mm_23", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T13:19:38.5293682Z AUTOTUNE mm(25088x128, 128x384) 2025-09-07T13:19:38.5294134Z strides: [128, 1], [1, 128] 2025-09-07T13:19:38.5294564Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:19:38.5295274Z mm 0.0154 ms 100.0% 2025-09-07T13:19:38.5296170Z triton_mm_23 0.0179 ms 86.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:38.5297362Z triton_mm_20 0.0186 ms 82.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:38.5298376Z triton_mm_16 0.0186 ms 82.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:38.5299342Z triton_mm_18 0.0196 ms 78.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:38.5300307Z triton_mm_24 0.0196 ms 78.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:38.5301257Z triton_mm_17 0.0204 ms 75.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:19:38.5302303Z triton_mm_21 0.0211 ms 73.0% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:19:38.5303267Z triton_mm_19 0.0217 ms 70.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:19:38.5304224Z triton_mm_13 0.0218 ms 70.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:19:38.5305236Z SingleProcess AUTOTUNE benchmarking takes 0.2562 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:19:38.7786448Z Autotune Choices Stats: 2025-09-07T13:19:38.7787416Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_70", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.010463999584317207, "best_triton_pos": 0} 2025-09-07T13:19:38.7913669Z AUTOTUNE mm(25088x128, 128x128) 2025-09-07T13:19:38.7914061Z strides: [128, 1], [1, 128] 2025-09-07T13:19:38.7914315Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:19:38.7915383Z triton_mm_70 0.0105 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:38.7916578Z mm 0.0113 ms 92.4% 2025-09-07T13:19:38.7917546Z triton_mm_76 0.0114 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:38.7919167Z triton_mm_72 0.0114 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:38.7920754Z triton_mm_75 0.0114 ms 91.6% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:38.7922347Z triton_mm_65 0.0115 ms 91.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:19:38.7923881Z triton_mm_69 0.0116 ms 90.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:19:38.7925690Z triton_mm_73 0.0116 ms 89.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:19:38.7926835Z triton_mm_71 0.0119 ms 87.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:19:38.7927808Z triton_mm_67 0.0120 ms 87.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:19:38.7928596Z SingleProcess AUTOTUNE benchmarking takes 0.2527 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:19:39.0345314Z Autotune Choices Stats: 2025-09-07T13:19:39.0346818Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_108", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.019551999866962433, "best_triton_pos": 0} 2025-09-07T13:19:39.0472498Z AUTOTUNE mm(25088x512, 512x128) 2025-09-07T13:19:39.0472753Z strides: [512, 1], [1, 512] 2025-09-07T13:19:39.0473009Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:19:39.0473673Z triton_mm_108 0.0196 ms 100.0% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:39.0474665Z triton_mm_114 0.0203 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:39.0475436Z mm 0.0205 ms 95.3% 2025-09-07T13:19:39.0476026Z triton_mm_115 0.0226 ms 86.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:19:39.0476988Z triton_mm_109 0.0230 ms 85.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:19:39.0477890Z triton_mm_104 0.0236 ms 82.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:19:39.0478796Z triton_mm_113 0.0237 ms 82.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:39.0479835Z triton_mm_107 0.0239 ms 81.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:19:39.0480832Z triton_mm_106 0.0241 ms 81.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:39.0481805Z triton_mm_110 0.0244 ms 80.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:39.0482582Z SingleProcess AUTOTUNE benchmarking takes 0.2549 seconds and 0.0002 seconds precompiling for 20 choices 
2025-09-07T13:19:39.2240379Z Autotune Choices Stats: 2025-09-07T13:19:39.2242596Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.03129599988460541, "best_triton_pos": 1, "best_triton_time": 0.06297600269317627, "best_triton_kernel": "triton_convolution2d_228", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8"} 2025-09-07T13:19:39.2368618Z AUTOTUNE convolution(8x128x56x56, 256x128x3x3) 2025-09-07T13:19:39.2369201Z strides: [401408, 1, 7168, 128], [1152, 1, 384, 128] 2025-09-07T13:19:39.2369744Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:19:39.2370214Z convolution 0.0313 ms 100.0% 2025-09-07T13:19:39.2371428Z triton_convolution2d_228 0.0630 ms 49.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:19:39.2373654Z triton_convolution2d_225 0.0638 ms 49.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:19:39.2375974Z triton_convolution2d_229 0.0744 ms 42.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:19:39.2377376Z triton_convolution2d_230 0.0752 ms 41.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:19:39.2378519Z triton_convolution2d_231 0.0804 ms 38.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, 
STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:19:39.2379652Z triton_convolution2d_226 0.1053 ms 29.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:19:39.2380796Z triton_convolution2d_227 0.3460 ms 9.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:19:39.2381774Z SingleProcess AUTOTUNE benchmarking takes 0.1685 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:19:39.4784601Z Autotune Choices Stats: 2025-09-07T13:19:39.4786921Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.012927999719977379, "best_triton_pos": 1, "best_triton_time": 0.013632000423967838, "best_triton_kernel": "triton_mm_248", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T13:19:39.4917080Z AUTOTUNE mm(6272x256, 256x768) 2025-09-07T13:19:39.4917500Z strides: [256, 1], [1, 256] 2025-09-07T13:19:39.4917933Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:19:39.4918621Z mm 0.0129 ms 100.0% 2025-09-07T13:19:39.4919719Z triton_mm_248 0.0136 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:39.4921439Z triton_mm_243 0.0149 ms 86.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:39.4922991Z triton_mm_249 0.0151 ms 85.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=3, num_warps=4 2025-09-07T13:19:39.4924545Z triton_mm_245 0.0152 ms 84.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:39.4926485Z triton_mm_241 0.0154 ms 83.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:39.4927389Z triton_mm_242 0.0160 ms 80.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:19:39.4928281Z triton_mm_246 0.0162 ms 80.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:19:39.4929162Z triton_mm_239 0.0172 ms 75.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:19:39.4930127Z triton_mm_250 0.0177 ms 72.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:19:39.4930919Z SingleProcess AUTOTUNE benchmarking takes 0.2539 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:19:39.7404788Z Autotune Choices Stats: 2025-09-07T13:19:39.7406093Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_291", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.009440000168979168, "best_triton_pos": 0} 2025-09-07T13:19:39.7538952Z AUTOTUNE mm(6272x256, 256x256) 2025-09-07T13:19:39.7539394Z strides: [256, 1], [1, 256] 2025-09-07T13:19:39.7539821Z dtypes: torch.bfloat16, torch.bfloat16 
2025-09-07T13:19:39.7540860Z triton_mm_291 0.0094 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:19:39.7541977Z mm 0.0095 ms 99.0% 2025-09-07T13:19:39.7542901Z triton_mm_295 0.0097 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:39.7544450Z triton_mm_294 0.0101 ms 93.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:19:39.7546486Z triton_mm_298 0.0102 ms 92.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:19:39.7547526Z triton_mm_302 0.0102 ms 92.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:19:39.7548495Z triton_mm_301 0.0103 ms 91.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:39.7549462Z triton_mm_300 0.0105 ms 89.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:39.7550636Z triton_mm_297 0.0106 ms 89.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:39.7551664Z triton_mm_293 0.0106 ms 89.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:39.7552502Z SingleProcess AUTOTUNE benchmarking 
takes 0.2531 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:19:39.9956167Z Autotune Choices Stats: 2025-09-07T13:19:39.9957360Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.014175999909639359, "best_triton_pos": 1, "best_triton_time": 0.015359999611973763, "best_triton_kernel": "triton_mm_340", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T13:19:40.0092904Z AUTOTUNE mm(6272x1024, 1024x256) 2025-09-07T13:19:40.0093183Z strides: [1024, 1], [1, 1024] 2025-09-07T13:19:40.0093457Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:19:40.0093741Z mm 0.0142 ms 100.0% 2025-09-07T13:19:40.0094365Z triton_mm_340 0.0154 ms 92.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:19:40.0095495Z triton_mm_333 0.0162 ms 87.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:40.0096598Z triton_mm_329 0.0170 ms 83.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:19:40.0097452Z triton_mm_339 0.0175 ms 81.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:40.0098293Z triton_mm_334 0.0181 ms 78.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:19:40.0099120Z triton_mm_332 0.0189 ms 74.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:19:40.0099948Z triton_mm_336 0.0192 ms 73.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:19:40.0100775Z triton_mm_330 0.0197 ms 71.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:19:40.0101691Z triton_mm_331 0.0216 ms 65.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:40.0102429Z SingleProcess AUTOTUNE benchmarking takes 0.2544 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:19:40.1865654Z Autotune Choices Stats: 2025-09-07T13:19:40.1867539Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.033215999603271484, "best_triton_pos": 1, "best_triton_time": 0.05958399921655655, "best_triton_kernel": "triton_convolution2d_454", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T13:19:40.1992389Z AUTOTUNE convolution(8x256x28x28, 512x256x3x3) 2025-09-07T13:19:40.1992705Z strides: [200704, 1, 7168, 256], [2304, 1, 768, 256] 2025-09-07T13:19:40.1993177Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:19:40.1993463Z convolution 0.0332 ms 100.0% 2025-09-07T13:19:40.1994269Z triton_convolution2d_454 0.0596 ms 55.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:19:40.1995736Z triton_convolution2d_453 0.0700 ms 47.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, 
KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:19:40.1996964Z triton_convolution2d_455 0.0750 ms 44.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:19:40.1998123Z triton_convolution2d_456 0.0761 ms 43.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:19:40.1999252Z triton_convolution2d_451 0.1011 ms 32.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:19:40.2000385Z triton_convolution2d_450 0.1046 ms 31.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:19:40.2001510Z triton_convolution2d_452 0.3700 ms 9.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:19:40.2002486Z SingleProcess AUTOTUNE benchmarking takes 0.1677 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:19:40.4413456Z Autotune Choices Stats: 2025-09-07T13:19:40.4415799Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.010975999757647514, "best_triton_pos": 1, "best_triton_time": 0.01235199999064207, "best_triton_kernel": "triton_mm_468", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T13:19:40.4544187Z AUTOTUNE mm(1568x512, 
512x1536) 2025-09-07T13:19:40.4544445Z strides: [512, 1], [1, 512] 2025-09-07T13:19:40.4544713Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:19:40.4545141Z mm 0.0110 ms 100.0% 2025-09-07T13:19:40.4545769Z triton_mm_468 0.0124 ms 88.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:40.4546849Z triton_mm_474 0.0130 ms 84.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:40.4547820Z triton_mm_470 0.0140 ms 78.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:40.4548782Z triton_mm_466 0.0141 ms 78.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:40.4549744Z triton_mm_467 0.0141 ms 77.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:19:40.4550712Z triton_mm_473 0.0145 ms 75.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:40.4551679Z triton_mm_471 0.0145 ms 75.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:19:40.4552828Z triton_mm_464 0.0150 ms 73.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:19:40.4553859Z triton_mm_475 0.0152 ms 72.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:19:40.4554703Z SingleProcess AUTOTUNE benchmarking takes 0.2542 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:19:40.7029401Z Autotune Choices Stats: 2025-09-07T13:19:40.7030606Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.008927999995648861, "best_triton_pos": 1, "best_triton_time": 0.008991999551653862, "best_triton_kernel": "triton_mm_521", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4"} 2025-09-07T13:19:40.7159303Z AUTOTUNE mm(1568x512, 512x512) 2025-09-07T13:19:40.7159564Z strides: [512, 1], [1, 512] 2025-09-07T13:19:40.7159832Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:19:40.7160131Z mm 0.0089 ms 100.0% 2025-09-07T13:19:40.7160740Z triton_mm_521 0.0090 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:19:40.7161735Z triton_mm_520 0.0099 ms 90.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:40.7163026Z triton_mm_516 0.0100 ms 89.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:19:40.7164000Z triton_mm_527 0.0103 ms 86.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:19:40.7165271Z triton_mm_523 0.0104 ms 85.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:19:40.7166300Z triton_mm_519 
0.0105 ms 85.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:19:40.7167268Z triton_mm_526 0.0106 ms 84.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:40.7168165Z triton_mm_517 0.0108 ms 82.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:19:40.7169051Z triton_mm_510 0.0121 ms 74.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:19:40.7169832Z SingleProcess AUTOTUNE benchmarking takes 0.2521 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:19:40.9671294Z Autotune Choices Stats: 2025-09-07T13:19:40.9672503Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.014175999909639359, "best_triton_pos": 1, "best_triton_time": 0.014336000196635723, "best_triton_kernel": "triton_mm_559", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4"} 2025-09-07T13:19:40.9801880Z AUTOTUNE mm(1568x2048, 2048x512) 2025-09-07T13:19:40.9802318Z strides: [2048, 1], [1, 2048] 2025-09-07T13:19:40.9802757Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:19:40.9803206Z mm 0.0142 ms 100.0% 2025-09-07T13:19:40.9804427Z triton_mm_559 0.0143 ms 98.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:19:40.9807366Z triton_mm_565 0.0181 ms 78.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:19:40.9808347Z triton_mm_555 0.0192 ms 74.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:19:40.9809240Z triton_mm_558 0.0203 ms 70.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:40.9810131Z triton_mm_554 0.0206 ms 68.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:19:40.9811028Z triton_mm_564 0.0223 ms 63.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:40.9811923Z triton_mm_557 0.0239 ms 59.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:19:40.9812826Z triton_mm_561 0.0242 ms 58.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:19:40.9813716Z triton_mm_551 0.0266 ms 53.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:19:40.9814576Z SingleProcess AUTOTUNE benchmarking takes 0.2633 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:19:41.6365762Z Autotune Choices Stats: 2025-09-07T13:19:41.6366803Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_2641", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", 
"best_time": 0.00774399982765317, "best_triton_pos": 0} 2025-09-07T13:19:41.6503105Z AUTOTUNE addmm(8x1000, 8x512, 512x1000) 2025-09-07T13:19:41.6503402Z strides: [0, 1], [512, 1], [1, 512] 2025-09-07T13:19:41.6503735Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:19:41.6504468Z triton_mm_2641 0.0077 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:19:41.6505769Z triton_mm_2645 0.0081 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:19:41.6506479Z bias_addmm 0.0085 ms 91.3% 2025-09-07T13:19:41.6507178Z triton_mm_2653 0.0087 ms 89.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:19:41.6508152Z triton_mm_2640 0.0089 ms 87.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:19:41.6509119Z triton_mm_2639 0.0091 ms 85.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:19:41.6510080Z triton_mm_2644 0.0092 ms 84.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:41.6511056Z triton_mm_2649 0.0094 ms 82.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:19:41.6512320Z triton_mm_2638 0.0096 ms 80.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, 
num_warps=2 2025-09-07T13:19:41.6513362Z triton_mm_2648 0.0098 ms 79.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:19:41.6514233Z SingleProcess AUTOTUNE benchmarking takes 0.2599 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T13:19:49.8645161Z pass 2025-09-07T13:19:54.6319970Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T13:19:54.6321459Z import pynvml # type: ignore[import] 2025-09-07T13:19:57.7187479Z 2025-09-07T13:19:58.4517710Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:19:58.4518046Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:19:58.4556925Z cuda eval lcnet_050 2025-09-07T13:20:09.9215340Z Autotune Choices Stats: 2025-09-07T13:20:09.9216959Z {"num_choices": 13, "num_triton_choices": 11, "best_kernel": "triton_mm_8", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007872000336647034, "best_triton_pos": 0} 2025-09-07T13:20:09.9358336Z AUTOTUNE addmm(100352x16, 100352x8, 8x16) 2025-09-07T13:20:09.9358669Z strides: [0, 1], [8, 1], [1, 8] 2025-09-07T13:20:09.9358977Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:20:09.9359986Z triton_mm_8 0.0079 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:20:09.9361001Z triton_mm_11 0.0079 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 
2025-09-07T13:20:09.9361985Z triton_mm_13 0.0079 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:20:09.9362960Z triton_mm_14 0.0079 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T13:20:09.9363925Z triton_mm_10 0.0081 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:20:09.9364895Z triton_mm_15 0.0081 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:20:09.9366096Z triton_mm_9 0.0081 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:20:09.9366985Z triton_mm_12 0.0082 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:20:09.9367874Z triton_mm_5 0.0083 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T13:20:09.9368785Z triton_mm_6 0.0083 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T13:20:09.9369566Z SingleProcess AUTOTUNE benchmarking takes 0.2010 seconds and 0.0003 seconds precompiling for 13 choices 2025-09-07T13:20:10.4593458Z Autotune Choices Stats: 2025-09-07T13:20:10.4594719Z {"num_choices": 16, "num_triton_choices": 14, "best_kernel": "triton_mm_17", "best_kernel_desc": 
"ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.0066559999249875546, "best_triton_pos": 0} 2025-09-07T13:20:10.4736041Z AUTOTUNE addmm(25088x32, 25088x16, 16x32) 2025-09-07T13:20:10.4736365Z strides: [0, 1], [16, 1], [1, 16] 2025-09-07T13:20:10.4736642Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:20:10.4737236Z triton_mm_17 0.0067 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:20:10.4738061Z triton_mm_20 0.0067 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:20:10.4738886Z triton_mm_23 0.0067 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:20:10.4739701Z triton_mm_18 0.0067 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:20:10.4740484Z triton_mm_24 0.0068 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:20:10.4741262Z triton_mm_25 0.0068 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:20:10.4742309Z triton_mm_21 0.0068 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:20:10.4743101Z triton_mm_22 0.0068 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, 
BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:20:10.4743896Z triton_mm_19 0.0068 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:20:10.4744681Z triton_mm_27 0.0068 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:20:10.4745526Z SingleProcess AUTOTUNE benchmarking takes 0.2320 seconds and 0.0002 seconds precompiling for 16 choices 2025-09-07T13:20:10.9774128Z Autotune Choices Stats: 2025-09-07T13:20:10.9775584Z {"num_choices": 17, "num_triton_choices": 15, "best_kernel": "triton_mm_33", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8", "best_time": 0.0074880002066493034, "best_triton_pos": 0} 2025-09-07T13:20:10.9904022Z AUTOTUNE addmm(25088x32, 25088x32, 32x32) 2025-09-07T13:20:10.9904358Z strides: [0, 1], [32, 1], [1, 32] 2025-09-07T13:20:10.9904722Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:20:10.9905578Z triton_mm_33 0.0075 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:20:10.9906507Z triton_mm_31 0.0076 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:20:10.9907405Z triton_mm_32 0.0077 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:20:10.9908529Z triton_mm_39 0.0077 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, 
BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:20:10.9909525Z triton_mm_34 0.0079 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:20:10.9910522Z triton_mm_44 0.0079 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:20:10.9911409Z triton_mm_43 0.0080 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T13:20:10.9912293Z triton_mm_41 0.0081 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:20:10.9913173Z triton_mm_37 0.0084 ms 89.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:20:10.9914042Z triton_mm_40 0.0084 ms 89.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:20:10.9914813Z SingleProcess AUTOTUNE benchmarking takes 0.2313 seconds and 0.0003 seconds precompiling for 17 choices 2025-09-07T13:20:11.2424214Z Autotune Choices Stats: 2025-09-07T13:20:11.2426098Z {"num_choices": 18, "num_triton_choices": 16, "best_kernel": "triton_mm_55", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8", "best_time": 0.00687999976798892, "best_triton_pos": 0} 2025-09-07T13:20:11.2559499Z AUTOTUNE addmm(6272x64, 6272x32, 32x64) 2025-09-07T13:20:11.2559824Z strides: [0, 1], [32, 1], [1, 32] 
2025-09-07T13:20:11.2560133Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:20:11.2560829Z triton_mm_55 0.0069 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:20:11.2561833Z triton_mm_47 0.0069 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:20:11.2562796Z triton_mm_56 0.0069 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:20:11.2563739Z triton_mm_48 0.0071 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:20:11.2564673Z triton_mm_53 0.0071 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:20:11.2565741Z triton_mm_54 0.0071 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:20:11.2566614Z triton_mm_57 0.0071 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:20:11.2567485Z triton_mm_51 0.0072 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:20:11.2568347Z triton_mm_46 0.0072 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:20:11.2569422Z triton_mm_45 0.0073 
ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T13:20:11.2570338Z SingleProcess AUTOTUNE benchmarking takes 0.2418 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T13:20:11.7943588Z Autotune Choices Stats: 2025-09-07T13:20:11.7944681Z {"num_choices": 20, "num_triton_choices": 18, "best_kernel": "triton_mm_62", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.006912000011652708, "best_triton_pos": 0} 2025-09-07T13:20:11.8083515Z AUTOTUNE addmm(6272x64, 6272x64, 64x64) 2025-09-07T13:20:11.8083824Z strides: [0, 1], [64, 1], [1, 64] 2025-09-07T13:20:11.8084191Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:20:11.8084923Z triton_mm_62 0.0069 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:20:11.8086089Z triton_mm_69 0.0069 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:20:11.8087045Z triton_mm_72 0.0070 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:20:11.8088051Z triton_mm_78 0.0070 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:20:11.8089274Z triton_mm_73 0.0071 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:20:11.8090248Z triton_mm_70 0.0072 ms 96.0% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:20:11.8091188Z triton_mm_65 0.0072 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:20:11.8092121Z triton_mm_71 0.0072 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:20:11.8093050Z triton_mm_63 0.0073 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:20:11.8093997Z triton_mm_77 0.0073 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:20:11.8094863Z SingleProcess AUTOTUNE benchmarking takes 0.2694 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T13:20:12.1002849Z Autotune Choices Stats: 2025-09-07T13:20:12.1003891Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_80", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.006560000125318766, "best_triton_pos": 0} 2025-09-07T13:20:12.1140767Z AUTOTUNE addmm(1568x128, 1568x64, 64x128) 2025-09-07T13:20:12.1141087Z strides: [0, 1], [64, 1], [1, 64] 2025-09-07T13:20:12.1141485Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:20:12.1142207Z triton_mm_80 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:20:12.1143170Z triton_mm_86 0.0066 ms 100.0% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:20:12.1144516Z triton_mm_81 0.0068 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:20:12.1145999Z triton_mm_82 0.0068 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:20:12.1146972Z triton_mm_87 0.0069 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:20:12.1147933Z triton_mm_83 0.0069 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:20:12.1148880Z triton_mm_91 0.0070 ms 93.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:20:12.1149830Z triton_mm_90 0.0071 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:20:12.1150776Z triton_mm_85 0.0073 ms 89.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:20:12.1151722Z triton_mm_92 0.0074 ms 89.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:20:12.1152562Z SingleProcess AUTOTUNE benchmarking takes 0.2821 seconds and 0.0003 seconds precompiling for 21 choices 2025-09-07T13:20:12.6660875Z Autotune Choices Stats: 
2025-09-07T13:20:12.6662033Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_100", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8", "best_time": 0.006976000033318996, "best_triton_pos": 0} 2025-09-07T13:20:12.6793133Z AUTOTUNE addmm(1568x128, 1568x128, 128x128) 2025-09-07T13:20:12.6793442Z strides: [0, 1], [128, 1], [1, 128] 2025-09-07T13:20:12.6793754Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:20:12.6794432Z triton_mm_100 0.0070 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:20:12.6795844Z triton_mm_101 0.0071 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:20:12.6796917Z triton_mm_105 0.0075 ms 93.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:20:12.6797890Z triton_mm_99 0.0075 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:20:12.6798869Z triton_mm_106 0.0077 ms 90.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:20:12.6799882Z triton_mm_109 0.0078 ms 89.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:20:12.6800856Z triton_mm_110 0.0079 ms 88.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=4, num_warps=4 2025-09-07T13:20:12.6801814Z triton_mm_107 0.0080 ms 87.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:20:12.6803067Z triton_mm_112 0.0080 ms 87.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:20:12.6804129Z triton_mm_108 0.0081 ms 86.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:20:12.6805134Z SingleProcess AUTOTUNE benchmarking takes 0.2823 seconds and 0.0003 seconds precompiling for 21 choices 2025-09-07T13:20:13.0702936Z Autotune Choices Stats: 2025-09-07T13:20:13.0703967Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_219", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8", "best_time": 0.007104000076651573, "best_triton_pos": 0} 2025-09-07T13:20:13.0834376Z AUTOTUNE addmm(392x256, 392x128, 128x256) 2025-09-07T13:20:13.0834687Z strides: [0, 1], [128, 1], [1, 128] 2025-09-07T13:20:13.0835137Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:20:13.0835994Z triton_mm_219 0.0071 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:20:13.0837085Z triton_mm_220 0.0071 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:20:13.0838074Z triton_mm_224 0.0073 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=3, num_warps=8 2025-09-07T13:20:13.0839252Z triton_mm_218 0.0073 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:20:13.0840253Z triton_mm_225 0.0076 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:20:13.0840878Z bias_addmm 0.0077 ms 92.1% 2025-09-07T13:20:13.0841474Z triton_mm_226 0.0078 ms 90.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:20:13.0842446Z triton_mm_227 0.0080 ms 89.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:20:13.0843419Z triton_mm_221 0.0081 ms 87.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:20:13.0844380Z triton_mm_223 0.0085 ms 83.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:20:13.0845413Z SingleProcess AUTOTUNE benchmarking takes 0.3051 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T13:20:13.6152586Z Autotune Choices Stats: 2025-09-07T13:20:13.6153593Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_293", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007391999941319227, "best_triton_pos": 0} 2025-09-07T13:20:13.6287479Z AUTOTUNE addmm(8x1280, 8x256, 256x1280) 2025-09-07T13:20:13.6287841Z strides: [0, 1], [256, 1], [1, 256] 
2025-09-07T13:20:13.6288187Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:20:13.6288936Z triton_mm_293 0.0074 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:20:13.6290409Z triton_mm_288 0.0075 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:20:13.6291502Z triton_mm_289 0.0075 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:20:13.6292454Z triton_mm_287 0.0076 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:20:13.6293066Z bias_addmm 0.0077 ms 95.5% 2025-09-07T13:20:13.6293684Z triton_mm_297 0.0077 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:20:13.6294638Z triton_mm_292 0.0078 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:20:13.6295937Z triton_mm_286 0.0079 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T13:20:13.6296833Z triton_mm_299 0.0080 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:20:13.6297715Z triton_mm_301 0.0080 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=5, num_warps=8 2025-09-07T13:20:13.6298618Z SingleProcess AUTOTUNE benchmarking takes 0.2613 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T13:20:14.1049234Z Autotune Choices Stats: 2025-09-07T13:20:14.1050314Z {"num_choices": 15, "num_triton_choices": 13, "best_kernel": "triton_mm_248", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.006752000190317631, "best_triton_pos": 0} 2025-09-07T13:20:14.1188577Z AUTOTUNE addmm(8x64, 8x256, 256x64) 2025-09-07T13:20:14.1188854Z strides: [0, 1], [256, 1], [1, 256] 2025-09-07T13:20:14.1189187Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:20:14.1189880Z triton_mm_248 0.0068 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:20:14.1190862Z triton_mm_240 0.0068 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:20:14.1191824Z triton_mm_247 0.0069 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:20:14.1192788Z triton_mm_244 0.0070 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:20:14.1193743Z triton_mm_239 0.0071 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:20:14.1194690Z triton_mm_238 0.0072 ms 93.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=5, num_warps=4 2025-09-07T13:20:14.1196017Z triton_mm_243 0.0072 ms 93.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:20:14.1196952Z triton_mm_237 0.0073 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T13:20:14.1197840Z bias_addmm 0.0075 ms 90.2% 2025-09-07T13:20:14.1198534Z triton_mm_246 0.0076 ms 88.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:20:14.1199422Z SingleProcess AUTOTUNE benchmarking takes 0.2120 seconds and 0.0003 seconds precompiling for 15 choices 2025-09-07T13:20:14.5486478Z Autotune Choices Stats: 2025-09-07T13:20:14.5487516Z {"num_choices": 13, "num_triton_choices": 11, "best_kernel": "triton_mm_195", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.006399999838322401, "best_triton_pos": 0} 2025-09-07T13:20:14.5628305Z AUTOTUNE addmm(8x32, 8x128, 128x32) 2025-09-07T13:20:14.5628585Z strides: [0, 1], [128, 1], [1, 128] 2025-09-07T13:20:14.5628895Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:20:14.5629610Z triton_mm_195 0.0064 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:20:14.5630591Z triton_mm_199 0.0064 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=2 2025-09-07T13:20:14.5631554Z triton_mm_203 0.0066 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:20:14.5632714Z triton_mm_200 0.0067 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=2 2025-09-07T13:20:14.5633680Z triton_mm_201 0.0067 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=2 2025-09-07T13:20:14.5634639Z triton_mm_194 0.0073 ms 88.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T13:20:14.5635929Z triton_mm_202 0.0073 ms 87.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=2 2025-09-07T13:20:14.5636882Z triton_mm_196 0.0073 ms 87.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:20:14.5637454Z bias_addmm 0.0075 ms 85.8% 2025-09-07T13:20:14.5637997Z triton_mm_198 0.0078 ms 81.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T13:20:14.5638774Z SingleProcess AUTOTUNE benchmarking takes 0.1886 seconds and 0.0002 seconds precompiling for 13 choices 2025-09-07T13:20:15.1859778Z Autotune Choices Stats: 2025-09-07T13:20:15.1860996Z {"num_choices": 6, "num_triton_choices": 5, "best_kernel": "triton_convolution2d_4", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8", "best_time": 0.011807999573647976, "best_triton_pos": 0} 2025-09-07T13:20:15.2000028Z AUTOTUNE 
convolution(8x3x224x224, 8x3x3x3) 2025-09-07T13:20:15.2000357Z strides: [150528, 1, 672, 3], [27, 1, 9, 3] 2025-09-07T13:20:15.2010042Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:20:15.2010800Z triton_convolution2d_4 0.0118 ms 100.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:20:15.2012294Z triton_convolution2d_3 0.0123 ms 95.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:20:15.2013553Z triton_convolution2d_0 0.0124 ms 95.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:20:15.2014687Z triton_convolution2d_1 0.0134 ms 88.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:20:15.2015984Z triton_convolution2d_2 0.0175 ms 67.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:20:15.2016693Z convolution 0.0235 ms 50.3% 2025-09-07T13:20:15.2017129Z SingleProcess AUTOTUNE benchmarking takes 0.0941 seconds and 0.0002 seconds precompiling for 6 choices 2025-09-07T13:20:15.3983299Z Autotune Choices Stats: 2025-09-07T13:20:15.3984352Z {"num_choices": 15, "num_triton_choices": 13, "best_kernel": "triton_mm_215", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8", "best_time": 0.005919999908655882, "best_triton_pos": 0} 
2025-09-07T13:20:15.4109975Z AUTOTUNE addmm(8x128, 8x32, 32x128) 2025-09-07T13:20:15.4110255Z strides: [0, 1], [32, 1], [1, 32] 2025-09-07T13:20:15.4110564Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:20:15.4111631Z triton_mm_215 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T13:20:15.4112648Z triton_mm_206 0.0060 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:20:15.4113606Z triton_mm_209 0.0060 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:20:15.4114573Z triton_mm_214 0.0060 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:20:15.4115658Z triton_mm_205 0.0060 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T13:20:15.4116610Z triton_mm_207 0.0060 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:20:15.4117533Z triton_mm_210 0.0060 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:20:15.4118420Z triton_mm_212 0.0060 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:20:15.4119312Z triton_mm_216 0.0062 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, 
BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:20:15.4120193Z triton_mm_208 0.0064 ms 92.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:20:15.4120982Z SingleProcess AUTOTUNE benchmarking takes 0.2105 seconds and 0.0002 seconds precompiling for 15 choices 2025-09-07T13:20:15.6513454Z Autotune Choices Stats: 2025-09-07T13:20:15.6514580Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_256", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.00598399993032217, "best_triton_pos": 0} 2025-09-07T13:20:15.6643281Z AUTOTUNE addmm(8x256, 8x64, 64x256) 2025-09-07T13:20:15.6643584Z strides: [0, 1], [64, 1], [1, 64] 2025-09-07T13:20:15.6643908Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:20:15.6644648Z triton_mm_256 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:20:15.6645810Z triton_mm_263 0.0060 ms 98.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:20:15.6647054Z triton_mm_257 0.0061 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:20:15.6648576Z triton_mm_252 0.0064 ms 93.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:20:15.6650097Z triton_mm_262 0.0064 ms 93.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, 
BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:20:15.6651613Z triton_mm_251 0.0065 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:20:15.6653324Z triton_mm_250 0.0065 ms 91.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T13:20:15.6654874Z triton_mm_253 0.0066 ms 91.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:20:15.6656725Z triton_mm_265 0.0066 ms 91.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:20:15.6657557Z triton_mm_261 0.0067 ms 89.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:20:15.6658299Z SingleProcess AUTOTUNE benchmarking takes 0.2527 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T13:20:15.9302285Z Autotune Choices Stats: 2025-09-07T13:20:15.9303270Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_270", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007552000228315592, "best_triton_pos": 0} 2025-09-07T13:20:15.9438750Z AUTOTUNE addmm(392x256, 392x256, 256x256) 2025-09-07T13:20:15.9439246Z strides: [0, 1], [256, 1], [1, 256] 2025-09-07T13:20:15.9439754Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:20:15.9440888Z triton_mm_270 0.0076 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, 
BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:20:15.9442452Z triton_mm_269 0.0078 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:20:15.9443984Z triton_mm_267 0.0078 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:20:15.9446278Z triton_mm_268 0.0078 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:20:15.9447200Z bias_addmm 0.0079 ms 95.5% 2025-09-07T13:20:15.9447837Z triton_mm_274 0.0079 ms 95.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:20:15.9448806Z triton_mm_273 0.0082 ms 92.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:20:15.9449701Z triton_mm_278 0.0084 ms 90.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:20:15.9450596Z triton_mm_277 0.0085 ms 88.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:20:15.9451486Z triton_mm_276 0.0088 ms 86.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:20:15.9452264Z SingleProcess AUTOTUNE benchmarking takes 0.2789 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T13:20:16.1879647Z Autotune Choices Stats: 
2025-09-07T13:20:16.1881226Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_306", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.00940799992531538, "best_triton_pos": 0} 2025-09-07T13:20:16.2021916Z AUTOTUNE addmm(8x1000, 8x1280, 1280x1000) 2025-09-07T13:20:16.2022661Z strides: [0, 1], [1280, 1], [1, 1280] 2025-09-07T13:20:16.2023208Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:20:16.2024334Z triton_mm_306 0.0094 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:20:16.2025835Z bias_addmm 0.0099 ms 95.1% 2025-09-07T13:20:16.2027014Z triton_mm_310 0.0099 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:20:16.2028032Z triton_mm_314 0.0112 ms 84.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:20:16.2029018Z triton_mm_318 0.0122 ms 77.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:20:16.2029972Z triton_mm_305 0.0132 ms 71.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:20:16.2030578Z addmm 0.0135 ms 69.8% 2025-09-07T13:20:16.2031142Z triton_mm_304 0.0139 ms 67.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:20:16.2032088Z triton_mm_309 0.0141 ms 66.7% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:20:16.2033030Z triton_mm_303 0.0146 ms 64.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T13:20:16.2033870Z SingleProcess AUTOTUNE benchmarking takes 0.2577 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T13:20:18.1571527Z pass 2025-09-07T13:20:21.9892152Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T13:20:21.9895528Z import pynvml # type: ignore[import] 2025-09-07T13:20:25.1363703Z 2025-09-07T13:20:26.1258751Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:20:26.1259282Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:20:26.1340790Z cuda eval levit_128 2025-09-07T13:20:49.1812862Z Autotune Choices Stats: 2025-09-07T13:20:49.1814204Z {"num_choices": 6, "num_triton_choices": 5, "best_kernel": "triton_convolution2d_1", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4", "best_time": 0.013856000266969204, "best_triton_pos": 0} 2025-09-07T13:20:49.1993100Z AUTOTUNE convolution(8x3x224x224, 16x3x3x3) 2025-09-07T13:20:49.1993488Z strides: [150528, 50176, 224, 1], [27, 9, 3, 1] 2025-09-07T13:20:49.1993806Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:20:49.1994602Z triton_convolution2d_1 0.0139 ms 100.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, 
num_warps=4 2025-09-07T13:20:49.1996427Z triton_convolution2d_4 0.0143 ms 96.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:20:49.1997783Z triton_convolution2d_3 0.0158 ms 87.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:20:49.1999387Z triton_convolution2d_2 0.0171 ms 81.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:20:49.2000537Z triton_convolution2d_0 0.0171 ms 80.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:20:49.2001251Z convolution 0.0202 ms 68.6% 2025-09-07T13:20:49.2001692Z SingleProcess AUTOTUNE benchmarking takes 0.1006 seconds and 0.0002 seconds precompiling for 6 choices 2025-09-07T13:20:49.6409910Z Autotune Choices Stats: 2025-09-07T13:20:49.6411104Z {"num_choices": 7, "num_triton_choices": 6, "best_kernel": "triton_convolution2d_9", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8", "best_time": 0.010847999714314938, "best_triton_pos": 0} 2025-09-07T13:20:49.6548997Z AUTOTUNE convolution(8x16x112x112, 32x16x3x3) 2025-09-07T13:20:49.6549346Z strides: [200704, 12544, 112, 1], [144, 9, 3, 1] 2025-09-07T13:20:49.6549650Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:20:49.6550444Z triton_convolution2d_9 0.0108 ms 100.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, 
PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:20:49.6551683Z triton_convolution2d_5 0.0109 ms 99.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:20:49.6552931Z triton_convolution2d_8 0.0115 ms 94.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:20:49.6554171Z triton_convolution2d_10 0.0123 ms 88.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:20:49.6556057Z triton_convolution2d_6 0.0127 ms 85.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:20:49.6556904Z convolution 0.0261 ms 41.5% 2025-09-07T13:20:49.6557576Z triton_convolution2d_7 0.0374 ms 29.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:20:49.6558469Z SingleProcess AUTOTUNE benchmarking takes 0.1069 seconds and 0.0002 seconds precompiling for 7 choices 2025-09-07T13:20:50.4882519Z Autotune Choices Stats: 2025-09-07T13:20:50.4883711Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_16", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8", "best_time": 0.011231999844312668, "best_triton_pos": 0} 2025-09-07T13:20:50.5028103Z AUTOTUNE 
convolution(8x32x56x56, 64x32x3x3) 2025-09-07T13:20:50.5028446Z strides: [100352, 3136, 56, 1], [288, 9, 3, 1] 2025-09-07T13:20:50.5028735Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:20:50.5029521Z triton_convolution2d_16 0.0112 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:20:50.5030979Z triton_convolution2d_15 0.0119 ms 94.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:20:50.5032219Z triton_convolution2d_14 0.0131 ms 85.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:20:50.5033450Z triton_convolution2d_11 0.0140 ms 80.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:20:50.5034659Z triton_convolution2d_17 0.0164 ms 68.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:20:50.5036075Z triton_convolution2d_12 0.0195 ms 57.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:20:50.5036812Z convolution 0.0204 ms 54.9% 2025-09-07T13:20:50.5037504Z triton_convolution2d_13 0.0527 ms 21.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:20:50.5038405Z 
SingleProcess AUTOTUNE benchmarking takes 0.5480 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:20:51.0532060Z Autotune Choices Stats: 2025-09-07T13:20:51.0533060Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_107", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.007007999811321497, "best_triton_pos": 0} 2025-09-07T13:20:51.0674264Z AUTOTUNE mm(1568x128, 128x256) 2025-09-07T13:20:51.0674560Z strides: [128, 1], [1, 128] 2025-09-07T13:20:51.0674842Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:20:51.0675655Z triton_mm_107 0.0070 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:20:51.0677628Z triton_mm_103 0.0070 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:20:51.0678791Z triton_mm_97 0.0074 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:20:51.0679794Z triton_mm_108 0.0074 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:20:51.0680779Z triton_mm_104 0.0075 ms 93.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:20:51.0681748Z triton_mm_106 0.0075 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:20:51.0682715Z triton_mm_110 0.0075 ms 93.2% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:20:51.0683695Z triton_mm_105 0.0076 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:20:51.0684655Z triton_mm_98 0.0077 ms 91.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:20:51.0685901Z triton_mm_109 0.0077 ms 91.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:20:51.0686739Z SingleProcess AUTOTUNE benchmarking takes 0.2493 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:20:51.6550165Z Autotune Choices Stats: 2025-09-07T13:20:51.6551219Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_80", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8", "best_time": 0.00684799998998642, "best_triton_pos": 0} 2025-09-07T13:20:51.6686927Z AUTOTUNE mm(1568x128, 128x128) 2025-09-07T13:20:51.6687217Z strides: [128, 1], [1, 128] 2025-09-07T13:20:51.6687501Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:20:51.6688178Z triton_mm_80 0.0068 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:20:51.6689163Z triton_mm_84 0.0069 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:20:51.6690135Z triton_mm_79 0.0069 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, 
BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:20:51.6691094Z triton_mm_78 0.0071 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:20:51.6692060Z triton_mm_88 0.0072 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:20:51.6693014Z triton_mm_89 0.0074 ms 93.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:20:51.6693992Z triton_mm_87 0.0074 ms 92.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:20:51.6695745Z triton_mm_91 0.0074 ms 92.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:20:51.6696879Z triton_mm_85 0.0075 ms 91.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:20:51.6697766Z triton_mm_86 0.0075 ms 91.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:20:51.6698540Z SingleProcess AUTOTUNE benchmarking takes 0.2537 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:20:52.9837162Z Autotune Choices Stats: 2025-09-07T13:20:52.9838394Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_119", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007135999854654074, "best_triton_pos": 0} 2025-09-07T13:20:52.9976628Z AUTOTUNE mm(1568x256, 256x128) 2025-09-07T13:20:52.9977239Z strides: [256, 1], [1, 256] 2025-09-07T13:20:52.9977568Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:20:52.9978287Z triton_mm_119 0.0071 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:20:52.9979307Z triton_mm_123 0.0073 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:20:52.9980799Z triton_mm_122 0.0076 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:20:52.9981909Z triton_mm_117 0.0078 ms 91.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:20:52.9982860Z triton_mm_118 0.0078 ms 91.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:20:52.9983479Z mm 0.0078 ms 91.0% 2025-09-07T13:20:52.9984056Z triton_mm_126 0.0079 ms 90.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:20:52.9985376Z triton_mm_127 0.0079 ms 89.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:20:52.9986353Z triton_mm_116 0.0080 ms 88.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, 
num_warps=4 2025-09-07T13:20:52.9987438Z triton_mm_129 0.0084 ms 85.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:20:52.9988288Z SingleProcess AUTOTUNE benchmarking takes 0.2514 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:20:53.6770303Z Autotune Choices Stats: 2025-09-07T13:20:53.6771365Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_552", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.006816000211983919, "best_triton_pos": 0} 2025-09-07T13:20:53.6916184Z AUTOTUNE mm(392x256, 256x512) 2025-09-07T13:20:53.6916507Z strides: [256, 1], [1, 256] 2025-09-07T13:20:53.6916777Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:20:53.6918007Z triton_mm_552 0.0068 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:20:53.6919222Z triton_mm_551 0.0071 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:20:53.6920355Z triton_mm_556 0.0073 ms 93.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:20:53.6921332Z triton_mm_555 0.0074 ms 91.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:20:53.6922309Z triton_mm_550 0.0076 ms 89.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:20:53.6922921Z 
mm 0.0077 ms 88.0% 2025-09-07T13:20:53.6923505Z triton_mm_559 0.0078 ms 86.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:20:53.6924488Z triton_mm_560 0.0078 ms 86.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:20:53.6925800Z triton_mm_549 0.0079 ms 86.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:20:53.6926929Z triton_mm_562 0.0082 ms 83.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:20:53.6927784Z SingleProcess AUTOTUNE benchmarking takes 0.2454 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:20:54.3126172Z Autotune Choices Stats: 2025-09-07T13:20:54.3127360Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_533", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007424000184983015, "best_triton_pos": 0} 2025-09-07T13:20:54.3269240Z AUTOTUNE mm(392x512, 512x256) 2025-09-07T13:20:54.3269606Z strides: [512, 1], [1, 512] 2025-09-07T13:20:54.3269839Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:20:54.3270411Z triton_mm_533 0.0074 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:20:54.3271253Z triton_mm_537 0.0078 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 
2025-09-07T13:20:54.3272060Z triton_mm_541 0.0084 ms 87.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:20:54.3272882Z triton_mm_532 0.0087 ms 85.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:20:54.3273675Z triton_mm_531 0.0088 ms 84.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:20:54.3274453Z triton_mm_536 0.0089 ms 83.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:20:54.3275377Z triton_mm_530 0.0091 ms 81.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:20:54.3276471Z triton_mm_540 0.0091 ms 81.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:20:54.3277408Z triton_mm_543 0.0097 ms 76.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:20:54.3278315Z triton_mm_539 0.0097 ms 76.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:20:54.3279020Z SingleProcess AUTOTUNE benchmarking takes 0.2466 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:20:55.1481844Z Autotune Choices Stats: 2025-09-07T13:20:55.1482977Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_636", "best_kernel_desc": 
"ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007135999854654074, "best_triton_pos": 0} 2025-09-07T13:20:55.1628297Z AUTOTUNE mm(392x256, 256x256) 2025-09-07T13:20:55.1628606Z strides: [256, 1], [1, 256] 2025-09-07T13:20:55.1628876Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:20:55.1629543Z triton_mm_636 0.0071 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:20:55.1630173Z mm 0.0074 ms 96.1% 2025-09-07T13:20:55.1630735Z triton_mm_640 0.0074 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:20:55.1632092Z triton_mm_633 0.0075 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:20:55.1633064Z triton_mm_639 0.0076 ms 94.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:20:55.1634012Z triton_mm_635 0.0076 ms 94.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:20:55.1635367Z triton_mm_634 0.0076 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:20:55.1636317Z triton_mm_643 0.0078 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:20:55.1637276Z triton_mm_644 0.0080 ms 88.8% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:20:55.1638230Z triton_mm_642 0.0082 ms 87.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:20:55.1638999Z SingleProcess AUTOTUNE benchmarking takes 0.2512 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:20:55.5914154Z Autotune Choices Stats: 2025-09-07T13:20:55.5915327Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_1078", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007424000184983015, "best_triton_pos": 0} 2025-09-07T13:20:55.6056516Z AUTOTUNE mm(128x384, 384x768) 2025-09-07T13:20:55.6056808Z strides: [384, 1], [1, 384] 2025-09-07T13:20:55.6057112Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:20:55.6057822Z triton_mm_1078 0.0074 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:20:55.6059349Z triton_mm_1082 0.0077 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:20:55.6060158Z mm 0.0079 ms 93.9% 2025-09-07T13:20:55.6060741Z triton_mm_1077 0.0084 ms 88.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:20:55.6061826Z triton_mm_1081 0.0085 ms 87.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:20:55.6062801Z triton_mm_1086 0.0085 ms 87.2% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:20:55.6063787Z triton_mm_1075 0.0087 ms 85.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:20:55.6064763Z triton_mm_1076 0.0088 ms 84.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:20:55.6065907Z triton_mm_1085 0.0088 ms 84.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:20:55.6066887Z triton_mm_1088 0.0093 ms 80.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:20:55.6067967Z SingleProcess AUTOTUNE benchmarking takes 0.2460 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:20:56.2765494Z Autotune Choices Stats: 2025-09-07T13:20:56.2766613Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_1059", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.008576000109314919, "best_triton_pos": 0} 2025-09-07T13:20:56.2916503Z AUTOTUNE mm(128x1024, 1024x384) 2025-09-07T13:20:56.2917013Z strides: [1024, 1], [1, 1024] 2025-09-07T13:20:56.2917304Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:20:56.2918133Z triton_mm_1059 0.0086 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:20:56.2919339Z triton_mm_1063 0.0089 ms 96.4% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:20:56.2920005Z mm 0.0089 ms 96.1% 2025-09-07T13:20:56.2920635Z triton_mm_1067 0.0103 ms 83.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:20:56.2921701Z triton_mm_1058 0.0114 ms 75.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:20:56.2922746Z triton_mm_1062 0.0118 ms 72.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:20:56.2923770Z triton_mm_1057 0.0119 ms 72.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:20:56.2924816Z triton_mm_1073 0.0123 ms 70.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:20:56.2926521Z triton_mm_1056 0.0124 ms 69.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:20:56.2927773Z triton_mm_1066 0.0124 ms 69.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:20:56.2928733Z SingleProcess AUTOTUNE benchmarking takes 0.2499 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:20:56.8070790Z Autotune Choices Stats: 2025-09-07T13:20:56.8071839Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_1097", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, 
BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.008063999935984612, "best_triton_pos": 0} 2025-09-07T13:20:56.8216872Z AUTOTUNE mm(128x768, 768x384) 2025-09-07T13:20:56.8217436Z strides: [768, 1], [1, 768] 2025-09-07T13:20:56.8217736Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:20:56.8218407Z triton_mm_1097 0.0081 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:20:56.8219373Z triton_mm_1101 0.0085 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:20:56.8219941Z mm 0.0086 ms 93.3% 2025-09-07T13:20:56.8220468Z triton_mm_1105 0.0094 ms 85.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:20:56.8221878Z triton_mm_1096 0.0100 ms 80.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:20:56.8222786Z triton_mm_1100 0.0104 ms 77.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:20:56.8223667Z triton_mm_1095 0.0105 ms 76.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:20:56.8224553Z triton_mm_1104 0.0108 ms 74.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:20:56.8225648Z triton_mm_1094 0.0109 ms 73.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, 
BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:20:56.8226557Z triton_mm_1111 0.0112 ms 72.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:20:56.8227344Z SingleProcess AUTOTUNE benchmarking takes 0.2518 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:20:57.3321537Z Autotune Choices Stats: 2025-09-07T13:20:57.3322635Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_1145", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007296000141650438, "best_triton_pos": 0} 2025-09-07T13:20:57.3462319Z AUTOTUNE mm(128x384, 384x384) 2025-09-07T13:20:57.3462734Z strides: [384, 1], [1, 384] 2025-09-07T13:20:57.3463013Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:20:57.3463732Z triton_mm_1145 0.0073 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:20:57.3464751Z triton_mm_1149 0.0076 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:20:57.3466319Z mm 0.0077 ms 94.6% 2025-09-07T13:20:57.3467064Z triton_mm_1144 0.0082 ms 89.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:20:57.3468319Z triton_mm_1153 0.0082 ms 89.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:20:57.3469319Z triton_mm_1148 0.0083 ms 88.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, 
BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:20:57.3470289Z triton_mm_1152 0.0085 ms 86.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:20:57.3471270Z triton_mm_1142 0.0086 ms 85.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:20:57.3472243Z triton_mm_1143 0.0086 ms 84.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:20:57.3473211Z triton_mm_1155 0.0088 ms 82.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:20:57.3474065Z SingleProcess AUTOTUNE benchmarking takes 0.2497 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:20:57.7232775Z Autotune Choices Stats: 2025-09-07T13:20:57.7234106Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_1477", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.00723200011998415, "best_triton_pos": 0} 2025-09-07T13:20:57.7378083Z AUTOTUNE mm(8x384, 384x1000) 2025-09-07T13:20:57.7378441Z strides: [384, 1], [1, 384] 2025-09-07T13:20:57.7378753Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:20:57.7379447Z triton_mm_1477 0.0072 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:20:57.7380495Z triton_mm_1481 0.0075 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, 
BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:20:57.7381117Z mm 0.0077 ms 93.4% 2025-09-07T13:20:57.7381808Z triton_mm_1476 0.0080 ms 90.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:20:57.7382777Z triton_mm_1480 0.0081 ms 89.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:20:57.7383758Z triton_mm_1489 0.0081 ms 89.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:20:57.7384717Z triton_mm_1485 0.0082 ms 88.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:20:57.7385870Z triton_mm_1475 0.0083 ms 87.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:20:57.7386848Z triton_mm_1474 0.0085 ms 85.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T13:20:57.7388239Z triton_mm_1484 0.0086 ms 83.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:20:57.7389286Z SingleProcess AUTOTUNE benchmarking takes 0.2273 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T13:20:58.6986514Z Autotune Choices Stats: 2025-09-07T13:20:58.6987384Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_487", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.006624000146985054, "best_triton_pos": 0} 2025-09-07T13:20:58.7136434Z AUTOTUNE mm(392x128, 128x128) 2025-09-07T13:20:58.7136756Z strides: [128, 1], [1, 128] 2025-09-07T13:20:58.7137052Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:20:58.7137744Z triton_mm_487 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:20:58.7138801Z triton_mm_482 0.0067 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:20:58.7139677Z triton_mm_483 0.0067 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:20:58.7140511Z triton_mm_481 0.0068 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:20:58.7141785Z triton_mm_491 0.0068 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:20:58.7142335Z mm 0.0071 ms 92.8% 2025-09-07T13:20:58.7142840Z triton_mm_490 0.0072 ms 92.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:20:58.7143676Z triton_mm_493 0.0073 ms 91.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:20:58.7144516Z triton_mm_494 0.0073 ms 91.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:20:58.7145491Z triton_mm_488 0.0073 ms 90.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:20:58.7146230Z SingleProcess AUTOTUNE benchmarking takes 0.2573 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T13:20:59.1673246Z Autotune Choices Stats: 2025-09-07T13:20:59.1674314Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_1021", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.006688000168651342, "best_triton_pos": 0} 2025-09-07T13:20:59.1819786Z AUTOTUNE mm(128x256, 256x256) 2025-09-07T13:20:59.1820101Z strides: [256, 1], [1, 256] 2025-09-07T13:20:59.1820374Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:20:59.1821082Z triton_mm_1021 0.0067 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:20:59.1822233Z triton_mm_1025 0.0071 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:20:59.1822859Z mm 0.0072 ms 92.5% 2025-09-07T13:20:59.1823795Z triton_mm_1020 0.0073 ms 91.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:20:59.1825922Z triton_mm_1018 0.0074 ms 90.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:20:59.1827079Z triton_mm_1024 0.0075 ms 89.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:20:59.1828053Z triton_mm_1028 0.0075 ms 89.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:20:59.1829147Z triton_mm_1019 0.0075 ms 88.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:20:59.1830124Z triton_mm_1029 0.0076 ms 88.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:20:59.1831093Z triton_mm_1027 0.0080 ms 83.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:20:59.1831935Z SingleProcess AUTOTUNE benchmarking takes 0.2470 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T13:21:00.1325505Z Autotune Choices Stats: 2025-09-07T13:21:00.1327184Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_22", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4", "best_time": 0.01775999926030636, "best_triton_pos": 0} 2025-09-07T13:21:00.1466990Z AUTOTUNE convolution(8x64x28x28, 128x64x3x3) 2025-09-07T13:21:00.1467356Z strides: [50176, 784, 28, 1], [576, 9, 3, 1] 2025-09-07T13:21:00.1467659Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:21:00.1468489Z triton_convolution2d_22 0.0178 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:21:00.1469409Z convolution 0.0201 ms 88.5% 
2025-09-07T13:21:00.1470133Z triton_convolution2d_23 0.0225 ms 78.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:21:00.1471372Z triton_convolution2d_21 0.0264 ms 67.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:21:00.1472596Z triton_convolution2d_24 0.0273 ms 65.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:21:00.1473814Z triton_convolution2d_18 0.0284 ms 62.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:21:00.1475205Z triton_convolution2d_19 0.0347 ms 51.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:21:00.1476433Z triton_convolution2d_20 0.0748 ms 23.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:21:00.1477620Z SingleProcess AUTOTUNE benchmarking takes 0.1182 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:21:00.3494646Z Autotune Choices Stats: 2025-09-07T13:21:00.3496060Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_bmm_53", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8", "best_time": 0.007296000141650438, 
"best_triton_pos": 0} 2025-09-07T13:21:00.3635488Z AUTOTUNE bmm(32x196x16, 32x16x196) 2025-09-07T13:21:00.3635795Z strides: [3136, 16, 1], [3136, 196, 1] 2025-09-07T13:21:00.3636081Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:21:00.3636757Z triton_bmm_53 0.0073 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:21:00.3637758Z triton_bmm_54 0.0073 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:21:00.3638735Z triton_bmm_50 0.0073 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:21:00.3639701Z triton_bmm_49 0.0074 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:21:00.3640616Z triton_bmm_52 0.0074 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:21:00.3641753Z triton_bmm_59 0.0074 ms 98.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:21:00.3642647Z triton_bmm_44 0.0075 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T13:21:00.3643531Z triton_bmm_48 0.0075 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:21:00.3644408Z triton_bmm_51 0.0075 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, 
BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:21:00.3645497Z triton_bmm_45 0.0075 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:21:00.3646278Z SingleProcess AUTOTUNE benchmarking takes 0.2120 seconds and 0.0002 seconds precompiling for 17 choices 2025-09-07T13:21:00.5724505Z Autotune Choices Stats: 2025-09-07T13:21:00.5725836Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_bmm_62", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.008287999778985977, "best_triton_pos": 0} 2025-09-07T13:21:00.5866459Z AUTOTUNE bmm(32x196x196, 32x196x32) 2025-09-07T13:21:00.5866754Z strides: [38464, 196, 1], [6272, 32, 1] 2025-09-07T13:21:00.5867046Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:21:00.5867713Z triton_bmm_62 0.0083 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:21:00.5868757Z triton_bmm_63 0.0083 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:21:00.5869888Z triton_bmm_71 0.0083 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:21:00.5871211Z triton_bmm_70 0.0084 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:21:00.5872301Z triton_bmm_64 0.0085 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, 
BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:21:00.5873263Z triton_bmm_69 0.0085 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:21:00.5874225Z triton_bmm_67 0.0086 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:21:00.5875353Z triton_bmm_61 0.0088 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:21:00.5876323Z triton_bmm_68 0.0089 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:21:00.5877295Z triton_bmm_72 0.0094 ms 88.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:21:00.5878147Z SingleProcess AUTOTUNE benchmarking takes 0.2225 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T13:21:00.8561847Z Autotune Choices Stats: 2025-09-07T13:21:00.8563204Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_472", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.0077760000713169575, "best_triton_pos": 0} 2025-09-07T13:21:00.8707259Z AUTOTUNE mm(1568x128, 128x640) 2025-09-07T13:21:00.8707535Z strides: [128, 1], [1, 128] 2025-09-07T13:21:00.8707808Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:21:00.8708507Z triton_mm_472 0.0078 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:21:00.8709682Z triton_mm_471 0.0079 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:21:00.8710635Z triton_mm_468 0.0079 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:21:00.8711599Z triton_mm_475 0.0079 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:21:00.8712572Z triton_mm_470 0.0080 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:21:00.8713538Z triton_mm_473 0.0081 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:21:00.8714508Z triton_mm_474 0.0082 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:21:00.8715449Z mm 0.0083 ms 93.8% 2025-09-07T13:21:00.8716012Z triton_mm_469 0.0084 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:21:00.8716994Z triton_mm_479 0.0084 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:21:00.8718003Z SingleProcess AUTOTUNE benchmarking takes 0.2453 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:21:01.0575241Z Autotune Choices Stats: 
2025-09-07T13:21:01.0576430Z {"num_choices": 16, "num_triton_choices": 15, "best_kernel": "triton_bmm_510", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.006752000190317631, "best_triton_pos": 0} 2025-09-07T13:21:01.0722115Z AUTOTUNE bmm(64x49x16, 64x16x196) 2025-09-07T13:21:01.0722424Z strides: [784, 16, 1], [3136, 196, 1] 2025-09-07T13:21:01.0722723Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:21:01.0723430Z triton_bmm_510 0.0068 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:21:01.0724458Z triton_bmm_507 0.0068 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:21:01.0725686Z triton_bmm_508 0.0068 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:21:01.0726686Z triton_bmm_504 0.0068 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:21:01.0727660Z triton_bmm_506 0.0068 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:21:01.0728896Z triton_bmm_503 0.0069 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:21:01.0729923Z triton_bmm_509 0.0069 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, 
num_warps=4 2025-09-07T13:21:01.0730819Z triton_bmm_501 0.0071 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:21:01.0731712Z triton_bmm_505 0.0071 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:21:01.0732628Z triton_bmm_499 0.0071 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T13:21:01.0733414Z SingleProcess AUTOTUNE benchmarking takes 0.2009 seconds and 0.0002 seconds precompiling for 16 choices 2025-09-07T13:21:01.2588871Z Autotune Choices Stats: 2025-09-07T13:21:01.2589972Z {"num_choices": 16, "num_triton_choices": 15, "best_kernel": "triton_bmm_526", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4", "best_time": 0.008320000022649765, "best_triton_pos": 0} 2025-09-07T13:21:01.2729602Z AUTOTUNE bmm(64x49x196, 64x196x64) 2025-09-07T13:21:01.2730017Z strides: [9664, 196, 1], [12544, 64, 1] 2025-09-07T13:21:01.2730307Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:21:01.2730964Z triton_bmm_526 0.0083 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:21:01.2731993Z triton_bmm_517 0.0084 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:21:01.2733452Z triton_bmm_516 0.0084 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 
2025-09-07T13:21:01.2734625Z triton_bmm_525 0.0084 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:21:01.2736226Z triton_bmm_518 0.0084 ms 98.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:21:01.2737205Z triton_bmm_528 0.0084 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:21:01.2738180Z triton_bmm_521 0.0085 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:21:01.2739194Z triton_bmm_522 0.0085 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:21:01.2740074Z triton_bmm_524 0.0086 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:21:01.2740921Z triton_bmm_515 0.0089 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:21:01.2741756Z SingleProcess AUTOTUNE benchmarking takes 0.2002 seconds and 0.0002 seconds precompiling for 16 choices 2025-09-07T13:21:01.4430496Z Autotune Choices Stats: 2025-09-07T13:21:01.4431888Z {"num_choices": 14, "num_triton_choices": 13, "best_kernel": "triton_bmm_606", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.0063680000603199005, "best_triton_pos": 0} 
2025-09-07T13:21:01.4571583Z AUTOTUNE bmm(64x49x16, 64x16x49) 2025-09-07T13:21:01.4571920Z strides: [784, 16, 1], [784, 49, 1] 2025-09-07T13:21:01.4572222Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:21:01.4572895Z triton_bmm_606 0.0064 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:21:01.4573887Z triton_bmm_607 0.0065 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:21:01.4574853Z triton_bmm_608 0.0065 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:21:01.4576221Z triton_bmm_609 0.0065 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:21:01.4577169Z triton_bmm_605 0.0067 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T13:21:01.4578153Z triton_bmm_615 0.0067 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:21:01.4579121Z triton_bmm_616 0.0067 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T13:21:01.4579995Z triton_bmm_610 0.0068 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:21:01.4581085Z triton_bmm_611 0.0068 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, 
BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:21:01.4582164Z triton_bmm_612 0.0068 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:21:01.4582979Z SingleProcess AUTOTUNE benchmarking takes 0.1792 seconds and 0.0002 seconds precompiling for 14 choices 2025-09-07T13:21:01.6320849Z Autotune Choices Stats: 2025-09-07T13:21:01.6321887Z {"num_choices": 15, "num_triton_choices": 14, "best_kernel": "triton_bmm_620", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.006719999946653843, "best_triton_pos": 0} 2025-09-07T13:21:01.6461906Z AUTOTUNE bmm(64x49x49, 64x49x32) 2025-09-07T13:21:01.6462247Z strides: [2432, 49, 1], [1600, 32, 1] 2025-09-07T13:21:01.6462548Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:21:01.6463222Z triton_bmm_620 0.0067 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:21:01.6464264Z triton_bmm_625 0.0070 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:21:01.6465649Z triton_bmm_631 0.0070 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:21:01.6466990Z triton_bmm_619 0.0071 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:21:01.6467981Z triton_bmm_621 0.0071 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, 
EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:21:01.6468979Z triton_bmm_627 0.0072 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:21:01.6470124Z triton_bmm_629 0.0072 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:21:01.6471085Z triton_bmm_622 0.0072 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:21:01.6472054Z triton_bmm_628 0.0072 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:21:01.6473021Z triton_bmm_624 0.0073 ms 92.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:21:01.6473871Z SingleProcess AUTOTUNE benchmarking takes 0.1885 seconds and 0.0002 seconds precompiling for 15 choices 2025-09-07T13:21:01.9170173Z Autotune Choices Stats: 2025-09-07T13:21:01.9171440Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.007872000336647034, "best_triton_pos": 1, "best_triton_time": 0.008063999935984612, "best_triton_kernel": "triton_mm_1010", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4"} 2025-09-07T13:21:01.9312618Z AUTOTUNE mm(392x256, 256x1280) 2025-09-07T13:21:01.9312894Z strides: [256, 1], [1, 256] 2025-09-07T13:21:01.9313157Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:21:01.9313808Z mm 0.0079 ms 100.0% 2025-09-07T13:21:01.9314403Z 
triton_mm_1010 0.0081 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:21:01.9317205Z triton_mm_1005 0.0082 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:21:01.9318318Z triton_mm_1009 0.0082 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:21:01.9319287Z triton_mm_1012 0.0083 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:21:01.9320253Z triton_mm_1008 0.0084 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:21:01.9321150Z triton_mm_999 0.0087 ms 90.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:21:01.9322046Z triton_mm_1000 0.0088 ms 89.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:21:01.9322958Z triton_mm_1016 0.0089 ms 88.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:21:01.9323862Z triton_mm_1015 0.0090 ms 87.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:21:01.9324745Z SingleProcess AUTOTUNE benchmarking takes 0.2459 seconds and 0.0002 seconds precompiling for 20 choices 
2025-09-07T13:21:02.0302195Z Autotune Choices Stats: 2025-09-07T13:21:02.0303198Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_bmm_1038", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.006304000038653612, "best_triton_pos": 0} 2025-09-07T13:21:02.0441591Z AUTOTUNE bmm(128x16x16, 128x16x49) 2025-09-07T13:21:02.0441922Z strides: [256, 16, 1], [784, 49, 1] 2025-09-07T13:21:02.0442237Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:21:02.0442956Z triton_bmm_1038 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:21:02.0444011Z triton_bmm_1040 0.0064 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:21:02.0446439Z triton_bmm_1041 0.0064 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:21:02.0447513Z triton_bmm_1042 0.0064 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:21:02.0448512Z triton_bmm_1037 0.0067 ms 93.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T13:21:02.0449537Z triton_bmm_1039 0.0067 ms 93.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:21:02.0450396Z triton_bmm_1036 0.0068 ms 92.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T13:21:02.0451254Z bmm 0.0072 ms 87.2% 2025-09-07T13:21:02.0451649Z SingleProcess AUTOTUNE benchmarking takes 0.1117 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:21:02.1970000Z Autotune Choices Stats: 2025-09-07T13:21:02.1971386Z {"num_choices": 13, "num_triton_choices": 12, "best_kernel": "triton_bmm_1045", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.006783999968320131, "best_triton_pos": 0} 2025-09-07T13:21:02.2110433Z AUTOTUNE bmm(128x16x49, 128x49x64) 2025-09-07T13:21:02.2110827Z strides: [784, 49, 1], [3136, 64, 1] 2025-09-07T13:21:02.2111102Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:21:02.2111776Z triton_bmm_1045 0.0068 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:21:02.2112780Z triton_bmm_1052 0.0068 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:21:02.2113760Z triton_bmm_1046 0.0068 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:21:02.2114738Z triton_bmm_1053 0.0068 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:21:02.2116060Z triton_bmm_1050 0.0071 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:21:02.2117275Z triton_bmm_1051 0.0071 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, 
BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:21:02.2118266Z triton_bmm_1054 0.0071 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:21:02.2119249Z triton_bmm_1047 0.0072 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:21:02.2120224Z triton_bmm_1044 0.0072 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T13:21:02.2121149Z triton_bmm_1049 0.0072 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:21:02.2121953Z SingleProcess AUTOTUNE benchmarking takes 0.1665 seconds and 0.0002 seconds precompiling for 13 choices 2025-09-07T13:21:02.2922955Z Autotune Choices Stats: 2025-09-07T13:21:02.2923916Z {"num_choices": 6, "num_triton_choices": 5, "best_kernel": "triton_bmm_1132", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=1", "best_time": 0.005791999865323305, "best_triton_pos": 0} 2025-09-07T13:21:02.3067156Z AUTOTUNE bmm(96x16x16, 96x16x16) 2025-09-07T13:21:02.3067530Z strides: [256, 16, 1], [256, 16, 1] 2025-09-07T13:21:02.3067817Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:21:02.3068483Z triton_bmm_1132 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=1 2025-09-07T13:21:02.3069583Z triton_bmm_1133 0.0059 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, 
BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=1 2025-09-07T13:21:02.3070820Z triton_bmm_1135 0.0059 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=1 2025-09-07T13:21:02.3071907Z triton_bmm_1134 0.0060 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=1 2025-09-07T13:21:02.3072973Z triton_bmm_1131 0.0060 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=1 2025-09-07T13:21:02.3073586Z bmm 0.0068 ms 84.6% 2025-09-07T13:21:02.3074025Z SingleProcess AUTOTUNE benchmarking takes 0.0908 seconds and 0.0002 seconds precompiling for 6 choices 2025-09-07T13:21:02.3835796Z Autotune Choices Stats: 2025-09-07T13:21:02.3836778Z {"num_choices": 6, "num_triton_choices": 5, "best_kernel": "triton_bmm_1138", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.006016000173985958, "best_triton_pos": 0} 2025-09-07T13:21:02.3980027Z AUTOTUNE bmm(96x16x16, 96x16x32) 2025-09-07T13:21:02.3980353Z strides: [256, 16, 1], [512, 32, 1] 2025-09-07T13:21:02.3980658Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:21:02.3981345Z triton_bmm_1138 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:21:02.3982446Z triton_bmm_1140 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=2 2025-09-07T13:21:02.3983701Z triton_bmm_1137 0.0060 ms 99.5% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T13:21:02.3984703Z triton_bmm_1139 0.0060 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=2 2025-09-07T13:21:02.3986032Z triton_bmm_1136 0.0061 ms 98.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T13:21:02.3986648Z bmm 0.0070 ms 86.2% 2025-09-07T13:21:02.3987106Z SingleProcess AUTOTUNE benchmarking takes 0.0908 seconds and 0.0002 seconds precompiling for 6 choices 2025-09-07T13:21:26.7434379Z pass 2025-09-07T13:21:31.4412013Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T13:21:31.4413780Z import pynvml # type: ignore[import] 2025-09-07T13:21:34.4973299Z 2025-09-07T13:21:35.8230323Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:21:35.8230673Z loading model: 0it [00:01, ?it/s] 2025-09-07T13:21:35.8275951Z cuda eval mixer_b16_224 2025-09-07T13:21:47.0455472Z Autotune Choices Stats: 2025-09-07T13:21:47.0456641Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.019328000023961067, "best_triton_pos": 1, "best_triton_time": 0.021568000316619873, "best_triton_kernel": "triton_mm_61", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T13:21:47.0608423Z AUTOTUNE mm(1568x768, 768x3072) 2025-09-07T13:21:47.0608732Z strides: [768, 1], [1, 768] 2025-09-07T13:21:47.0609003Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:21:47.0609294Z mm 0.0193 ms 100.0% 2025-09-07T13:21:47.0609912Z triton_mm_61 0.0216 ms 89.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:21:47.0611366Z triton_mm_62 0.0220 ms 87.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:21:47.0612581Z triton_mm_56 0.0238 ms 81.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:21:47.0613488Z triton_mm_63 0.0250 ms 77.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:21:47.0614376Z triton_mm_55 0.0271 ms 71.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=4, num_warps=8 2025-09-07T13:21:47.0615427Z triton_mm_54 0.0283 ms 68.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:21:47.0616317Z triton_mm_57 0.0286 ms 67.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:21:47.0617209Z triton_mm_58 0.0299 ms 64.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:21:47.0618086Z triton_mm_52 0.0299 ms 64.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:21:47.0618864Z SingleProcess AUTOTUNE benchmarking takes 0.2903 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T13:21:47.8636568Z Autotune Choices Stats: 2025-09-07T13:21:47.8637580Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_16", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.013183999806642532, "best_triton_pos": 0} 2025-09-07T13:21:47.8783633Z AUTOTUNE mm(6144x196, 196x384) 2025-09-07T13:21:47.8783928Z strides: [196, 1], [1, 196] 2025-09-07T13:21:47.8784191Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:21:47.8784851Z triton_mm_16 0.0132 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:21:47.8786341Z triton_mm_13 0.0137 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 
2025-09-07T13:21:47.8787337Z triton_mm_22 0.0139 ms 95.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T13:21:47.8788326Z triton_mm_23 0.0139 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:21:47.8789305Z triton_mm_20 0.0141 ms 93.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:21:47.8790254Z triton_mm_17 0.0145 ms 91.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:21:47.8791222Z triton_mm_18 0.0150 ms 88.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:21:47.8792218Z triton_mm_21 0.0151 ms 87.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:21:47.8793639Z triton_mm_24 0.0151 ms 87.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:21:47.8794718Z triton_mm_12 0.0157 ms 84.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:21:47.8795709Z SingleProcess AUTOTUNE benchmarking takes 0.2499 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:21:51.3075845Z Autotune Choices Stats: 2025-09-07T13:21:51.3077311Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.1292479932308197, 
"best_triton_pos": 1, "best_triton_time": 0.1327359974384308, "best_triton_kernel": "triton_convolution2d_6", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=8"} 2025-09-07T13:21:51.3224238Z AUTOTUNE convolution(8x3x224x224, 768x3x16x16) 2025-09-07T13:21:51.3224585Z strides: [150528, 50176, 224, 1], [768, 256, 16, 1] 2025-09-07T13:21:51.3225112Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:21:51.3225416Z convolution 0.1292 ms 100.0% 2025-09-07T13:21:51.3226180Z triton_convolution2d_6 0.1327 ms 97.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:21:51.3227797Z triton_convolution2d_3 0.1448 ms 89.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:21:51.3229075Z triton_convolution2d_1 0.1468 ms 88.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:21:51.3230369Z triton_convolution2d_4 0.1764 ms 73.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:21:51.3231613Z triton_convolution2d_5 0.1954 ms 66.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:21:51.3232846Z triton_convolution2d_0 0.2171 ms 59.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, 
KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:21:51.3234275Z triton_convolution2d_2 0.4088 ms 31.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:21:51.3235384Z SingleProcess AUTOTUNE benchmarking takes 0.2318 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T13:21:51.5626147Z Autotune Choices Stats: 2025-09-07T13:21:51.5627706Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_44", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8", "best_time": 0.011168000288307667, "best_triton_pos": 0} 2025-09-07T13:21:51.5770628Z AUTOTUNE mm(6144x384, 384x196) 2025-09-07T13:21:51.5770895Z strides: [384, 1], [1, 384] 2025-09-07T13:21:51.5771164Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:21:51.5771859Z triton_mm_44 0.0112 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:21:51.5773039Z triton_mm_37 0.0114 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:21:51.5774149Z triton_mm_33 0.0115 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:21:51.5775451Z triton_mm_43 0.0118 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:21:51.5776453Z triton_mm_36 0.0120 ms 93.3% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:21:51.5777411Z triton_mm_40 0.0124 ms 89.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:21:51.5778371Z triton_mm_35 0.0129 ms 86.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:21:51.5779334Z triton_mm_39 0.0131 ms 85.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:21:51.5780292Z triton_mm_38 0.0133 ms 84.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:21:51.5781347Z triton_mm_42 0.0133 ms 83.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:21:51.5782295Z SingleProcess AUTOTUNE benchmarking takes 0.2537 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:21:51.8560032Z Autotune Choices Stats: 2025-09-07T13:21:51.8561239Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.018848000094294548, "best_triton_pos": 1, "best_triton_time": 0.02393599972128868, "best_triton_kernel": "triton_mm_82", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T13:21:51.8706033Z AUTOTUNE mm(1568x3072, 3072x768) 2025-09-07T13:21:51.8706462Z strides: [3072, 1], [1, 3072] 2025-09-07T13:21:51.8706906Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:21:51.8707368Z mm 0.0188 ms 
100.0% 2025-09-07T13:21:51.8708328Z triton_mm_82 0.0239 ms 78.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:21:51.8709920Z triton_mm_76 0.0287 ms 65.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:21:51.8711471Z triton_mm_75 0.0294 ms 64.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:21:51.8712990Z triton_mm_71 0.0297 ms 63.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:21:51.8714509Z triton_mm_81 0.0312 ms 60.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:21:51.8715922Z triton_mm_72 0.0322 ms 58.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:21:51.8716874Z triton_mm_74 0.0352 ms 53.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:21:51.8718061Z triton_mm_78 0.0356 ms 53.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:21:51.8719082Z triton_mm_68 0.0436 ms 43.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:21:51.8719915Z SingleProcess AUTOTUNE benchmarking takes 0.2925 seconds and 0.0003 seconds precompiling 
for 20 choices 2025-09-07T13:21:52.2120841Z Autotune Choices Stats: 2025-09-07T13:21:52.2121798Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_923", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.00800000037997961, "best_triton_pos": 0} 2025-09-07T13:21:52.2269613Z AUTOTUNE addmm(8x1000, 8x768, 768x1000) 2025-09-07T13:21:52.2270125Z strides: [0, 1], [768, 1], [1, 768] 2025-09-07T13:21:52.2270648Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:21:52.2271776Z triton_mm_923 0.0080 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:21:52.2273386Z triton_mm_927 0.0088 ms 90.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:21:52.2274465Z bias_addmm 0.0092 ms 86.8% 2025-09-07T13:21:52.2275606Z triton_mm_931 0.0098 ms 82.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:21:52.2276584Z triton_mm_935 0.0101 ms 79.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:21:52.2277554Z triton_mm_922 0.0105 ms 76.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:21:52.2278507Z triton_mm_921 0.0107 ms 74.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:21:52.2279455Z triton_mm_926 0.0108 ms 74.0% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:21:52.2280398Z triton_mm_920 0.0113 ms 70.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T13:21:52.2281348Z triton_mm_933 0.0117 ms 68.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:21:52.2282189Z SingleProcess AUTOTUNE benchmarking takes 0.2549 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T13:21:56.0883303Z pass 2025-09-07T13:21:59.7065543Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T13:21:59.7066849Z import pynvml # type: ignore[import] 2025-09-07T13:22:02.8063882Z 2025-09-07T13:22:04.0167362Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:22:04.0167751Z loading model: 0it [00:01, ?it/s] 2025-09-07T13:22:04.0272808Z cuda eval mixnet_l 2025-09-07T13:22:33.8306804Z Autotune Choices Stats: 2025-09-07T13:22:33.8308900Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_170", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.011839999817311764, "best_triton_pos": 0} 2025-09-07T13:22:33.8469628Z AUTOTUNE addmm(25088x240, 25088x40, 40x240) 2025-09-07T13:22:33.8469953Z strides: [0, 1], [40, 1], [1, 40] 2025-09-07T13:22:33.8470263Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:22:33.8470976Z triton_mm_170 0.0118 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:33.8471993Z triton_mm_171 0.0134 ms 88.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:33.8472974Z triton_mm_161 0.0135 ms 87.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:33.8473958Z triton_mm_165 0.0138 ms 85.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:22:33.8474926Z triton_mm_164 0.0139 ms 85.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:33.8476869Z triton_mm_160 0.0146 ms 81.1% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:22:33.8478016Z triton_mm_169 0.0148 ms 80.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:33.8478942Z triton_mm_167 0.0152 ms 77.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:22:33.8479848Z triton_mm_166 0.0154 ms 76.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:33.8480762Z triton_mm_163 0.0156 ms 75.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:22:33.8481554Z SingleProcess AUTOTUNE benchmarking takes 0.2952 seconds and 0.0003 seconds precompiling for 21 choices 2025-09-07T13:22:34.4429824Z Autotune Choices Stats: 2025-09-07T13:22:34.4431276Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_497", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.008799999952316284, "best_triton_pos": 0} 2025-09-07T13:22:34.4674756Z AUTOTUNE addmm(6272x336, 6272x56, 56x336) 2025-09-07T13:22:34.4675313Z strides: [0, 1], [56, 1], [1, 56] 2025-09-07T13:22:34.4675764Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:22:34.4676720Z triton_mm_497 0.0088 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:22:34.4678091Z triton_mm_502 0.0088 ms 
100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:22:34.4679440Z triton_mm_498 0.0089 ms 98.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:34.4681266Z triton_mm_501 0.0089 ms 98.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:34.4682842Z triton_mm_494 0.0093 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:34.4684313Z triton_mm_503 0.0101 ms 86.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:34.4685317Z bias_addmm 0.0103 ms 85.4% 2025-09-07T13:22:34.4686147Z triton_mm_491 0.0104 ms 84.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:22:34.4687490Z triton_mm_504 0.0104 ms 84.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:22:34.4688848Z triton_mm_499 0.0104 ms 84.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:34.4690025Z SingleProcess AUTOTUNE benchmarking takes 0.3077 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T13:22:35.0271768Z Autotune Choices Stats: 2025-09-07T13:22:35.0272834Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_1252", "best_kernel_desc": 
"ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.009247999638319016, "best_triton_pos": 0} 2025-09-07T13:22:35.0424785Z AUTOTUNE addmm(1568x960, 1568x160, 160x960) 2025-09-07T13:22:35.0425612Z strides: [0, 1], [160, 1], [1, 160] 2025-09-07T13:22:35.0426022Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:22:35.0426810Z triton_mm_1252 0.0092 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:35.0427863Z triton_mm_1253 0.0093 ms 99.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:22:35.0428852Z triton_mm_1257 0.0094 ms 98.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:22:35.0429833Z triton_mm_1254 0.0095 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:35.0430818Z triton_mm_1256 0.0095 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:35.0431445Z bias_addmm 0.0098 ms 94.8% 2025-09-07T13:22:35.0432048Z triton_mm_1250 0.0098 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:22:35.0433036Z triton_mm_1261 0.0098 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:35.0434029Z 
triton_mm_1259 0.0100 ms 92.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:35.0435160Z triton_mm_1260 0.0100 ms 92.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:35.0436035Z SingleProcess AUTOTUNE benchmarking takes 0.2888 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T13:22:35.5951478Z Autotune Choices Stats: 2025-09-07T13:22:35.5952897Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_857", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4", "best_time": 0.00825599953532219, "best_triton_pos": 0} 2025-09-07T13:22:35.6103092Z AUTOTUNE addmm(1568x624, 1568x104, 104x624) 2025-09-07T13:22:35.6103465Z strides: [0, 1], [104, 1], [1, 104] 2025-09-07T13:22:35.6103797Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:22:35.6104554Z triton_mm_857 0.0083 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:22:35.6106015Z triton_mm_852 0.0083 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:22:35.6107055Z triton_mm_853 0.0084 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:35.6108023Z triton_mm_854 0.0085 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 
2025-09-07T13:22:35.6108982Z triton_mm_856 0.0087 ms 94.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:35.6109594Z bias_addmm 0.0088 ms 94.2% 2025-09-07T13:22:35.6110318Z triton_mm_855 0.0090 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:22:35.6111288Z triton_mm_859 0.0090 ms 91.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:22:35.6112287Z triton_mm_849 0.0090 ms 91.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:35.6113282Z triton_mm_858 0.0091 ms 90.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:35.6114134Z SingleProcess AUTOTUNE benchmarking takes 0.2896 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T13:22:36.1551377Z Autotune Choices Stats: 2025-09-07T13:22:36.1552682Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.009440000168979168, "best_triton_pos": 1, "best_triton_time": 0.009983999654650688, "best_triton_kernel": "triton_mm_1326", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T13:22:36.1701058Z AUTOTUNE addmm(392x1584, 392x264, 264x1584) 2025-09-07T13:22:36.1701389Z strides: [0, 1], [264, 1], [1, 264] 2025-09-07T13:22:36.1701811Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:22:36.1702146Z bias_addmm 0.0094 ms 100.0% 
2025-09-07T13:22:36.1702784Z triton_mm_1326 0.0100 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:36.1703794Z triton_mm_1329 0.0100 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:22:36.1704792Z triton_mm_1322 0.0101 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:22:36.1706786Z triton_mm_1325 0.0102 ms 92.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:22:36.1707893Z triton_mm_1327 0.0105 ms 89.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:22:36.1708897Z triton_mm_1324 0.0106 ms 89.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:36.1709877Z triton_mm_1333 0.0109 ms 86.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:36.1710865Z triton_mm_1328 0.0112 ms 84.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:36.1711860Z triton_mm_1332 0.0115 ms 82.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:36.1712730Z SingleProcess AUTOTUNE benchmarking takes 0.2899 seconds and 0.0002 
seconds precompiling for 21 choices 2025-09-07T13:22:36.7324677Z Autotune Choices Stats: 2025-09-07T13:22:36.7326005Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_1338", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.010495999827980995, "best_triton_pos": 0} 2025-09-07T13:22:36.7479652Z AUTOTUNE addmm(8x132, 8x1584, 1584x132) 2025-09-07T13:22:36.7479977Z strides: [0, 1], [1584, 1], [1, 1584] 2025-09-07T13:22:36.7480323Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:22:36.7481052Z triton_mm_1338 0.0105 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:22:36.7482067Z triton_mm_1342 0.0113 ms 92.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:36.7483069Z triton_mm_1346 0.0130 ms 80.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:22:36.7484047Z triton_mm_1337 0.0141 ms 74.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:22:36.7485234Z triton_mm_1350 0.0142 ms 74.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:36.7486322Z triton_mm_1336 0.0149 ms 70.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:36.7487309Z triton_mm_1341 0.0155 ms 67.8% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:36.7488213Z triton_mm_1335 0.0167 ms 62.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T13:22:36.7489122Z triton_mm_1345 0.0173 ms 60.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:36.7490021Z triton_mm_1348 0.0174 ms 60.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:22:36.7490961Z SingleProcess AUTOTUNE benchmarking takes 0.2659 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T13:22:37.2908548Z Autotune Choices Stats: 2025-09-07T13:22:37.2909696Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_955", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.007327999919652939, "best_triton_pos": 0} 2025-09-07T13:22:37.3060357Z AUTOTUNE addmm(8x80, 8x480, 480x80) 2025-09-07T13:22:37.3060687Z strides: [0, 1], [480, 1], [1, 480] 2025-09-07T13:22:37.3061026Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:22:37.3061897Z triton_mm_955 0.0073 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:22:37.3062945Z triton_mm_959 0.0076 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:37.3063584Z bias_addmm 0.0080 ms 91.2% 
2025-09-07T13:22:37.3064191Z triton_mm_954 0.0082 ms 89.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:22:37.3065493Z triton_mm_963 0.0084 ms 87.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:22:37.3066769Z triton_mm_953 0.0084 ms 87.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:37.3067778Z triton_mm_952 0.0086 ms 85.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T13:22:37.3068766Z triton_mm_967 0.0089 ms 82.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:37.3069730Z triton_mm_958 0.0090 ms 81.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:37.3070684Z triton_mm_965 0.0091 ms 80.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:22:37.3071524Z SingleProcess AUTOTUNE benchmarking takes 0.2631 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T13:22:37.8413774Z Autotune Choices Stats: 2025-09-07T13:22:37.8415471Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "bias_addmm", "best_time": 0.008671999908983707, "best_triton_pos": 1, "best_triton_time": 0.009056000038981438, "best_triton_kernel": "triton_mm_1266", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, 
EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2"} 2025-09-07T13:22:37.8564603Z AUTOTUNE addmm(8x80, 8x960, 960x80) 2025-09-07T13:22:37.8564905Z strides: [0, 1], [960, 1], [1, 960] 2025-09-07T13:22:37.8565370Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:22:37.8565700Z bias_addmm 0.0087 ms 100.0% 2025-09-07T13:22:37.8566336Z triton_mm_1266 0.0091 ms 95.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:22:37.8567346Z triton_mm_1270 0.0092 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:37.8568594Z triton_mm_1274 0.0099 ms 87.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:22:37.8569744Z triton_mm_1278 0.0106 ms 81.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:37.8570854Z triton_mm_1265 0.0108 ms 80.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:22:37.8571805Z triton_mm_1269 0.0109 ms 79.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:37.8572419Z addmm 0.0110 ms 78.8% 2025-09-07T13:22:37.8572999Z triton_mm_1264 0.0111 ms 78.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:37.8573973Z triton_mm_1273 0.0118 ms 73.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, 
BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:37.8574845Z SingleProcess AUTOTUNE benchmarking takes 0.2563 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T13:22:38.0723370Z Autotune Choices Stats: 2025-09-07T13:22:38.0724367Z {"num_choices": 15, "num_triton_choices": 13, "best_kernel": "triton_mm_868", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.007615999784320593, "best_triton_pos": 0} 2025-09-07T13:22:38.0868413Z AUTOTUNE addmm(8x52, 8x624, 624x52) 2025-09-07T13:22:38.0868724Z strides: [0, 1], [624, 1], [1, 624] 2025-09-07T13:22:38.0869038Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:22:38.0869730Z triton_mm_868 0.0076 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:22:38.0870717Z triton_mm_872 0.0079 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:38.0871681Z triton_mm_876 0.0082 ms 93.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:38.0872638Z triton_mm_875 0.0082 ms 92.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:22:38.0873592Z triton_mm_867 0.0090 ms 84.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:22:38.0874550Z triton_mm_866 0.0092 ms 82.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, 
BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:38.0875668Z triton_mm_871 0.0095 ms 79.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:38.0876609Z triton_mm_874 0.0101 ms 75.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:22:38.0877566Z triton_mm_865 0.0101 ms 75.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T13:22:38.0878524Z triton_mm_873 0.0122 ms 62.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:38.0879499Z SingleProcess AUTOTUNE benchmarking takes 0.2098 seconds and 0.0002 seconds precompiling for 15 choices 2025-09-07T13:22:38.5168463Z Autotune Choices Stats: 2025-09-07T13:22:38.5169621Z {"num_choices": 13, "num_triton_choices": 11, "best_kernel": "triton_mm_251", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.006976000033318996, "best_triton_pos": 0} 2025-09-07T13:22:38.5316320Z AUTOTUNE addmm(8x28, 8x336, 336x28) 2025-09-07T13:22:38.5316602Z strides: [0, 1], [336, 1], [1, 336] 2025-09-07T13:22:38.5316988Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:22:38.5317841Z triton_mm_251 0.0070 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:22:38.5318846Z triton_mm_258 0.0071 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, 
BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:22:38.5319808Z triton_mm_257 0.0071 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=2 2025-09-07T13:22:38.5320765Z triton_mm_250 0.0075 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:22:38.5321729Z triton_mm_254 0.0077 ms 90.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=2 2025-09-07T13:22:38.5322820Z triton_mm_256 0.0078 ms 89.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=2 2025-09-07T13:22:38.5323788Z triton_mm_249 0.0083 ms 84.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T13:22:38.5324754Z triton_mm_255 0.0086 ms 81.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=2 2025-09-07T13:22:38.5325886Z bias_addmm 0.0113 ms 61.9% 2025-09-07T13:22:38.5326518Z triton_mm_253 0.0116 ms 59.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T13:22:38.5327428Z SingleProcess AUTOTUNE benchmarking takes 0.1895 seconds and 0.0002 seconds precompiling for 13 choices 2025-09-07T13:22:39.0557344Z Autotune Choices Stats: 2025-09-07T13:22:39.0558540Z {"num_choices": 13, "num_triton_choices": 11, "best_kernel": "triton_mm_591", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, 
EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.008832000195980072, "best_triton_pos": 0} 2025-09-07T13:22:39.0817225Z AUTOTUNE addmm(8x26, 8x624, 624x26) 2025-09-07T13:22:39.0817562Z strides: [0, 1], [624, 1], [1, 624] 2025-09-07T13:22:39.0817882Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:22:39.0818623Z triton_mm_591 0.0088 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:22:39.0819638Z triton_mm_590 0.0098 ms 89.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T13:22:39.0820296Z bias_addmm 0.0124 ms 71.3% 2025-09-07T13:22:39.0820545Z addmm 0.0152 ms 58.0% 2025-09-07T13:22:39.0821366Z triton_mm_589 0.0244 ms 36.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T13:22:39.0822627Z triton_mm_598 0.0280 ms 31.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=2 2025-09-07T13:22:39.0823716Z triton_mm_594 0.0311 ms 28.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T13:22:39.0824719Z triton_mm_599 0.0316 ms 28.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:22:39.0826228Z triton_mm_595 0.0321 ms 27.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=2 2025-09-07T13:22:39.0827382Z triton_mm_592 0.0321 ms 27.5% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:22:39.0828248Z SingleProcess AUTOTUNE benchmarking takes 0.2602 seconds and 0.0002 seconds precompiling for 13 choices 2025-09-07T13:22:39.5794199Z Autotune Choices Stats: 2025-09-07T13:22:39.5795610Z {"num_choices": 13, "num_triton_choices": 11, "best_kernel": "triton_mm_182", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.006560000125318766, "best_triton_pos": 0} 2025-09-07T13:22:39.5950013Z AUTOTUNE addmm(8x20, 8x240, 240x20) 2025-09-07T13:22:39.5950297Z strides: [0, 1], [240, 1], [1, 240] 2025-09-07T13:22:39.5950918Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:22:39.5951632Z triton_mm_182 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:22:39.5952636Z triton_mm_181 0.0068 ms 95.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=2 2025-09-07T13:22:39.5953605Z triton_mm_174 0.0070 ms 93.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:22:39.5954561Z triton_mm_175 0.0070 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:22:39.5955703Z triton_mm_178 0.0072 ms 91.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=2 2025-09-07T13:22:39.5956678Z triton_mm_173 0.0073 ms 89.9% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T13:22:39.5957645Z triton_mm_180 0.0073 ms 89.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=2 2025-09-07T13:22:39.5958567Z triton_mm_179 0.0083 ms 79.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=2 2025-09-07T13:22:39.5959130Z bias_addmm 0.0095 ms 69.0% 2025-09-07T13:22:39.5959673Z triton_mm_177 0.0097 ms 67.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T13:22:39.5960469Z SingleProcess AUTOTUNE benchmarking takes 0.1930 seconds and 0.0002 seconds precompiling for 13 choices 2025-09-07T13:22:40.0361475Z Autotune Choices Stats: 2025-09-07T13:22:40.0362930Z {"num_choices": 13, "num_triton_choices": 11, "best_kernel": "triton_mm_512", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=1", "best_time": 0.006943999789655209, "best_triton_pos": 0} 2025-09-07T13:22:40.0514201Z AUTOTUNE addmm(8x14, 8x336, 336x14) 2025-09-07T13:22:40.0514490Z strides: [0, 1], [336, 1], [1, 336] 2025-09-07T13:22:40.0514791Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:22:40.0515786Z triton_mm_512 0.0069 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=1 2025-09-07T13:22:40.0516786Z triton_mm_518 0.0070 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=1 2025-09-07T13:22:40.0517780Z 
triton_mm_519 0.0072 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=1 2025-09-07T13:22:40.0518707Z triton_mm_511 0.0076 ms 91.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=1 2025-09-07T13:22:40.0519591Z triton_mm_515 0.0077 ms 90.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=1 2025-09-07T13:22:40.0520462Z triton_mm_517 0.0077 ms 90.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=1 2025-09-07T13:22:40.0521516Z triton_mm_510 0.0080 ms 87.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=1 2025-09-07T13:22:40.0522425Z triton_mm_516 0.0086 ms 81.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=1 2025-09-07T13:22:40.0523316Z triton_mm_514 0.0107 ms 64.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=1 2025-09-07T13:22:40.0523901Z bias_addmm 0.0114 ms 61.0% 2025-09-07T13:22:40.0524343Z SingleProcess AUTOTUNE benchmarking takes 0.1908 seconds and 0.0002 seconds precompiling for 13 choices 2025-09-07T13:22:40.8802508Z Autotune Choices Stats: 2025-09-07T13:22:40.8803739Z {"num_choices": 7, "num_triton_choices": 6, "best_kernel": "triton_convolution2d_4", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8", 
"best_time": 0.023711999878287315, "best_triton_pos": 0} 2025-09-07T13:22:40.8961282Z AUTOTUNE convolution(8x3x224x224, 32x3x3x3) 2025-09-07T13:22:40.8961630Z strides: [150528, 1, 672, 3], [27, 1, 9, 3] 2025-09-07T13:22:40.8961933Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:22:40.8962728Z triton_convolution2d_4 0.0237 ms 100.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:22:40.8963946Z triton_convolution2d_0 0.0271 ms 87.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:22:40.8964704Z convolution 0.0273 ms 86.9% 2025-09-07T13:22:40.8965639Z triton_convolution2d_2 0.0276 ms 85.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T13:22:40.8967397Z triton_convolution2d_3 0.0289 ms 82.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:22:40.8968743Z triton_convolution2d_5 0.0342 ms 69.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T13:22:40.8969875Z triton_convolution2d_1 0.0419 ms 56.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T13:22:40.8970771Z SingleProcess AUTOTUNE benchmarking takes 0.1104 seconds and 0.0002 seconds precompiling for 7 choices 2025-09-07T13:22:41.1184642Z Autotune Choices Stats: 
2025-09-07T13:22:41.1185821Z {"num_choices": 17, "num_triton_choices": 15, "best_kernel": "triton_mm_18", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8", "best_time": 0.01027199998497963, "best_triton_pos": 0} 2025-09-07T13:22:41.1336459Z AUTOTUNE addmm(100352x32, 100352x32, 32x32) 2025-09-07T13:22:41.1336776Z strides: [0, 1], [32, 1], [1, 32] 2025-09-07T13:22:41.1337096Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:22:41.1337837Z triton_mm_18 0.0103 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:22:41.1339016Z triton_mm_19 0.0104 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T13:22:41.1339984Z triton_mm_20 0.0104 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:41.1340944Z triton_mm_8 0.0105 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:41.1341975Z triton_mm_7 0.0106 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:22:41.1342902Z triton_mm_9 0.0108 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:41.1343842Z triton_mm_15 0.0108 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, 
num_warps=8 2025-09-07T13:22:41.1344782Z triton_mm_13 0.0109 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:22:41.1346025Z triton_mm_17 0.0109 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:41.1346969Z triton_mm_10 0.0112 ms 91.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:41.1347907Z SingleProcess AUTOTUNE benchmarking takes 0.2368 seconds and 0.0003 seconds precompiling for 17 choices 2025-09-07T13:22:41.3338430Z Autotune Choices Stats: 2025-09-07T13:22:41.3339415Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_35", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.013248000293970108, "best_triton_pos": 0} 2025-09-07T13:22:41.3495723Z AUTOTUNE mm(100352x16, 16x96) 2025-09-07T13:22:41.3495988Z strides: [16, 1], [1, 16] 2025-09-07T13:22:41.3496255Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:22:41.3497101Z triton_mm_35 0.0132 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:41.3498203Z triton_mm_21 0.0137 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T13:22:41.3499024Z triton_mm_30 0.0137 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:22:41.3499853Z 
triton_mm_28 0.0138 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:41.3500663Z triton_mm_26 0.0140 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:22:41.3501563Z triton_mm_31 0.0140 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:22:41.3502084Z mm 0.0141 ms 94.1% 2025-09-07T13:22:41.3502568Z triton_mm_33 0.0141 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:22:41.3503483Z triton_mm_29 0.0142 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:41.3504314Z triton_mm_25 0.0143 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:41.3505225Z SingleProcess AUTOTUNE benchmarking takes 0.2154 seconds and 0.0002 seconds precompiling for 17 choices 2025-09-07T13:22:41.5665934Z Autotune Choices Stats: 2025-09-07T13:22:41.5666892Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_63", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.009184000082314014, "best_triton_pos": 0} 2025-09-07T13:22:41.5822307Z AUTOTUNE mm(25088x96, 96x20) 2025-09-07T13:22:41.5822588Z strides: [96, 1], [1, 96] 2025-09-07T13:22:41.5822855Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:22:41.5823515Z 
triton_mm_63 0.0092 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:41.5824490Z triton_mm_61 0.0092 ms 99.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:41.5825943Z triton_mm_62 0.0092 ms 99.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:22:41.5826945Z triton_mm_56 0.0092 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:41.5828094Z triton_mm_60 0.0093 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:22:41.5829124Z triton_mm_65 0.0093 ms 98.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:41.5830209Z triton_mm_59 0.0094 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:22:41.5831260Z triton_mm_64 0.0094 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:22:41.5832308Z triton_mm_57 0.0095 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:41.5833275Z triton_mm_66 0.0095 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:22:41.5834119Z SingleProcess AUTOTUNE benchmarking takes 0.2276 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T13:22:41.7857624Z Autotune Choices Stats: 2025-09-07T13:22:41.7858641Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_93", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.0074880002066493034, "best_triton_pos": 0} 2025-09-07T13:22:41.8012379Z AUTOTUNE mm(25088x20, 20x60) 2025-09-07T13:22:41.8012641Z strides: [40, 1], [1, 20] 2025-09-07T13:22:41.8012911Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:22:41.8013573Z triton_mm_93 0.0075 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:22:41.8014764Z triton_mm_95 0.0075 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:41.8016095Z triton_mm_89 0.0075 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:41.8017062Z triton_mm_94 0.0075 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:22:41.8018070Z triton_mm_96 0.0076 ms 98.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:41.8018929Z triton_mm_97 0.0076 ms 98.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=4, num_warps=8 2025-09-07T13:22:41.8019755Z triton_mm_98 0.0076 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:22:41.8020584Z triton_mm_92 0.0078 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:22:41.8021419Z triton_mm_102 0.0078 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:41.8022332Z triton_mm_100 0.0079 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:22:41.8023062Z SingleProcess AUTOTUNE benchmarking takes 0.2140 seconds and 0.0002 seconds precompiling for 17 choices 2025-09-07T13:22:42.0000401Z Autotune Choices Stats: 2025-09-07T13:22:42.0001371Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_113", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8", "best_time": 0.0074880002066493034, "best_triton_pos": 0} 2025-09-07T13:22:42.0157188Z AUTOTUNE mm(25088x20, 20x60) 2025-09-07T13:22:42.0157464Z strides: [40, 1], [1, 20] 2025-09-07T13:22:42.0157897Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:22:42.0158862Z triton_mm_113 0.0075 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:22:42.0159851Z triton_mm_110 0.0075 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 
2025-09-07T13:22:42.0160838Z triton_mm_114 0.0075 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:22:42.0161796Z triton_mm_111 0.0076 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:42.0162760Z triton_mm_105 0.0076 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:42.0163714Z triton_mm_112 0.0076 ms 98.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:42.0164661Z triton_mm_109 0.0077 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:22:42.0166242Z triton_mm_106 0.0078 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:42.0167257Z triton_mm_118 0.0078 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:42.0168269Z triton_mm_107 0.0079 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:42.0168999Z SingleProcess AUTOTUNE benchmarking takes 0.2140 seconds and 0.0002 seconds precompiling for 17 choices 2025-09-07T13:22:42.2266207Z Autotune Choices Stats: 2025-09-07T13:22:42.2267188Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_126", "best_kernel_desc": 
"ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.007903999648988247, "best_triton_pos": 0} 2025-09-07T13:22:42.2424273Z AUTOTUNE mm(25088x60, 60x20) 2025-09-07T13:22:42.2424555Z strides: [60, 1], [1, 60] 2025-09-07T13:22:42.2424815Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:22:42.2425749Z triton_mm_126 0.0079 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:22:42.2426727Z triton_mm_129 0.0081 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:42.2427795Z triton_mm_120 0.0081 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:22:42.2428885Z triton_mm_127 0.0081 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:42.2429845Z triton_mm_130 0.0081 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:22:42.2431054Z triton_mm_123 0.0082 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:42.2432096Z triton_mm_128 0.0083 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:22:42.2433066Z triton_mm_135 0.0083 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, 
EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:42.2434030Z triton_mm_122 0.0083 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:42.2435111Z triton_mm_125 0.0084 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:22:42.2435956Z SingleProcess AUTOTUNE benchmarking takes 0.2261 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T13:22:42.4427897Z Autotune Choices Stats: 2025-09-07T13:22:42.4428964Z {"num_choices": 15, "num_triton_choices": 13, "best_kernel": "triton_mm_185", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.005888000130653381, "best_triton_pos": 0} 2025-09-07T13:22:42.4584248Z AUTOTUNE addmm(8x240, 8x20, 20x240) 2025-09-07T13:22:42.4584536Z strides: [0, 1], [20, 1], [1, 20] 2025-09-07T13:22:42.4584856Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:22:42.4585892Z triton_mm_185 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:42.4586934Z triton_mm_188 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:22:42.4588022Z triton_mm_189 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:42.4589045Z triton_mm_193 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, 
EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:22:42.4590015Z triton_mm_195 0.0060 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:42.4590989Z triton_mm_191 0.0060 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:22:42.4591960Z triton_mm_194 0.0061 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T13:22:42.4592931Z triton_mm_184 0.0061 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T13:22:42.4593892Z triton_mm_183 0.0063 ms 93.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T13:22:42.4594844Z triton_mm_186 0.0063 ms 93.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:22:42.4595812Z SingleProcess AUTOTUNE benchmarking takes 0.2109 seconds and 0.0002 seconds precompiling for 15 choices 2025-09-07T13:22:42.7125964Z Autotune Choices Stats: 2025-09-07T13:22:42.7127151Z {"num_choices": 20, "num_triton_choices": 18, "best_kernel": "triton_mm_207", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.008927999995648861, "best_triton_pos": 0} 2025-09-07T13:22:42.7288335Z AUTOTUNE addmm(6272x56, 6272x240, 240x56) 2025-09-07T13:22:42.7288648Z strides: [0, 1], [240, 1], [1, 240] 
2025-09-07T13:22:42.7289000Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:22:42.7289722Z triton_mm_207 0.0089 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:42.7290708Z triton_mm_204 0.0090 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:42.7291676Z triton_mm_208 0.0090 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:22:42.7292664Z triton_mm_203 0.0090 ms 98.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:22:42.7293630Z triton_mm_206 0.0092 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:22:42.7294238Z bias_addmm 0.0093 ms 95.9% 2025-09-07T13:22:42.7295179Z triton_mm_197 0.0095 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:22:42.7296213Z triton_mm_213 0.0095 ms 93.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:42.7297191Z triton_mm_199 0.0096 ms 93.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:42.7298185Z triton_mm_198 0.0096 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=5, num_warps=8 2025-09-07T13:22:42.7298912Z SingleProcess AUTOTUNE benchmarking takes 0.2698 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:22:42.9395790Z Autotune Choices Stats: 2025-09-07T13:22:42.9396759Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_221", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.007199999876320362, "best_triton_pos": 0} 2025-09-07T13:22:42.9554146Z AUTOTUNE mm(6272x28, 28x168) 2025-09-07T13:22:42.9554394Z strides: [56, 1], [1, 28] 2025-09-07T13:22:42.9554636Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:22:42.9555419Z triton_mm_221 0.0072 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:22:42.9556404Z triton_mm_222 0.0073 ms 98.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:42.9557369Z triton_mm_220 0.0074 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:22:42.9558333Z triton_mm_223 0.0074 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:42.9559449Z triton_mm_225 0.0074 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:22:42.9560444Z triton_mm_217 0.0075 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 
2025-09-07T13:22:42.9561431Z triton_mm_218 0.0075 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:42.9562317Z triton_mm_224 0.0075 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:22:42.9563206Z triton_mm_216 0.0075 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:42.9564109Z triton_mm_226 0.0075 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:42.9564896Z SingleProcess AUTOTUNE benchmarking takes 0.2261 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T13:22:43.1641669Z Autotune Choices Stats: 2025-09-07T13:22:43.1642647Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_233", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8", "best_time": 0.007135999854654074, "best_triton_pos": 0} 2025-09-07T13:22:43.1801345Z AUTOTUNE mm(6272x28, 28x168) 2025-09-07T13:22:43.1801760Z strides: [56, 1], [1, 28] 2025-09-07T13:22:43.1802052Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:22:43.1802714Z triton_mm_233 0.0071 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:43.1803708Z triton_mm_238 0.0073 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:22:43.1804673Z 
triton_mm_237 0.0073 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:22:43.1805941Z triton_mm_239 0.0073 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:43.1806894Z triton_mm_240 0.0074 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:43.1807873Z triton_mm_241 0.0074 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:22:43.1808897Z triton_mm_244 0.0074 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:22:43.1809781Z triton_mm_235 0.0075 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:43.1810657Z triton_mm_234 0.0075 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:43.1811537Z triton_mm_242 0.0075 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:22:43.1812426Z SingleProcess AUTOTUNE benchmarking takes 0.2241 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T13:22:43.3818019Z Autotune Choices Stats: 2025-09-07T13:22:43.3819386Z {"num_choices": 15, "num_triton_choices": 13, "best_kernel": "triton_mm_261", "best_kernel_desc": "ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.005791999865323305, "best_triton_pos": 0} 2025-09-07T13:22:43.3977918Z AUTOTUNE addmm(8x336, 8x28, 28x336) 2025-09-07T13:22:43.3978236Z strides: [0, 1], [28, 1], [1, 28] 2025-09-07T13:22:43.3978566Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:22:43.3979318Z triton_mm_261 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:43.3980360Z triton_mm_264 0.0059 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:22:43.3981368Z triton_mm_269 0.0060 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:22:43.3982435Z triton_mm_270 0.0060 ms 95.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T13:22:43.3983464Z triton_mm_265 0.0061 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:43.3984637Z triton_mm_267 0.0061 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:22:43.3985950Z triton_mm_271 0.0061 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:43.3986960Z triton_mm_262 0.0062 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, 
BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:22:43.3987965Z triton_mm_263 0.0062 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:22:43.3988972Z triton_mm_259 0.0064 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T13:22:43.3989854Z SingleProcess AUTOTUNE benchmarking takes 0.2170 seconds and 0.0002 seconds precompiling for 15 choices 2025-09-07T13:22:43.6109142Z Autotune Choices Stats: 2025-09-07T13:22:43.6110124Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_275", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8", "best_time": 0.008063999935984612, "best_triton_pos": 0} 2025-09-07T13:22:43.6264207Z AUTOTUNE mm(6272x168, 168x28) 2025-09-07T13:22:43.6264469Z strides: [168, 1], [1, 168] 2025-09-07T13:22:43.6264771Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:22:43.6265635Z triton_mm_275 0.0081 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:43.6266639Z triton_mm_282 0.0081 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:43.6267619Z triton_mm_288 0.0081 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:43.6269096Z triton_mm_279 0.0081 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:22:43.6270149Z triton_mm_281 0.0081 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:22:43.6271102Z triton_mm_287 0.0081 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:43.6272064Z triton_mm_276 0.0083 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:43.6273022Z triton_mm_283 0.0083 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:22:43.6273982Z triton_mm_285 0.0084 ms 95.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:22:43.6275102Z triton_mm_280 0.0085 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:43.6275954Z SingleProcess AUTOTUNE benchmarking takes 0.2281 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T13:22:43.8641504Z Autotune Choices Stats: 2025-09-07T13:22:43.8642781Z {"num_choices": 14, "num_triton_choices": 12, "best_kernel": "triton_mm_521", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2", "best_time": 0.005824000108987093, "best_triton_pos": 0} 2025-09-07T13:22:43.8796264Z AUTOTUNE addmm(8x336, 8x14, 14x336) 2025-09-07T13:22:43.8796530Z strides: [0, 1], [14, 1], [1, 14] 2025-09-07T13:22:43.8796858Z 
dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:22:43.8797572Z triton_mm_521 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T13:22:43.8798659Z triton_mm_520 0.0059 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T13:22:43.8799759Z triton_mm_522 0.0059 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:43.8800710Z triton_mm_523 0.0059 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:22:43.8801666Z triton_mm_524 0.0059 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:22:43.8802612Z triton_mm_525 0.0059 ms 98.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:43.8803561Z triton_mm_529 0.0059 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:22:43.8804517Z triton_mm_527 0.0060 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:22:43.8805819Z triton_mm_530 0.0060 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T13:22:43.8807069Z triton_mm_531 0.0061 ms 94.8% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:43.8808050Z SingleProcess AUTOTUNE benchmarking takes 0.2016 seconds and 0.0002 seconds precompiling for 14 choices 2025-09-07T13:22:44.1513886Z Autotune Choices Stats: 2025-09-07T13:22:44.1514853Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_536", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007935999892652035, "best_triton_pos": 0} 2025-09-07T13:22:44.1679044Z AUTOTUNE addmm(1568x104, 1568x336, 336x104) 2025-09-07T13:22:44.1679332Z strides: [0, 1], [336, 1], [1, 336] 2025-09-07T13:22:44.1679645Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:22:44.1680357Z triton_mm_536 0.0079 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:44.1680990Z bias_addmm 0.0084 ms 94.7% 2025-09-07T13:22:44.1681586Z triton_mm_540 0.0085 ms 93.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:44.1682570Z triton_mm_535 0.0086 ms 92.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:44.1683692Z triton_mm_534 0.0086 ms 92.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:44.1684689Z triton_mm_539 0.0090 ms 88.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, 
num_warps=8 2025-09-07T13:22:44.1685810Z triton_mm_533 0.0092 ms 86.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:22:44.1686785Z triton_mm_544 0.0092 ms 85.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:22:44.1687760Z triton_mm_543 0.0097 ms 81.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:44.1688726Z triton_mm_542 0.0099 ms 80.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:22:44.1689559Z SingleProcess AUTOTUNE benchmarking takes 0.2877 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T13:22:44.4047947Z Autotune Choices Stats: 2025-09-07T13:22:44.4048935Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_552", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.006912000011652708, "best_triton_pos": 0} 2025-09-07T13:22:44.4210680Z AUTOTUNE mm(1568x52, 52x312) 2025-09-07T13:22:44.4210966Z strides: [104, 1], [1, 52] 2025-09-07T13:22:44.4211247Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:22:44.4211899Z triton_mm_552 0.0069 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:22:44.4212880Z triton_mm_555 0.0070 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 
2025-09-07T13:22:44.4214144Z triton_mm_558 0.0070 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:22:44.4215391Z triton_mm_559 0.0070 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:44.4216355Z triton_mm_562 0.0070 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:44.4217343Z triton_mm_553 0.0071 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:44.4218337Z triton_mm_560 0.0071 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:44.4219227Z triton_mm_563 0.0071 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:22:44.4220060Z triton_mm_557 0.0072 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:22:44.4220891Z triton_mm_561 0.0073 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:22:44.4221795Z SingleProcess AUTOTUNE benchmarking takes 0.2526 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:22:44.6582073Z Autotune Choices Stats: 2025-09-07T13:22:44.6583085Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_574", "best_kernel_desc": 
"ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.00684799998998642, "best_triton_pos": 0} 2025-09-07T13:22:44.6747450Z AUTOTUNE mm(1568x52, 52x312) 2025-09-07T13:22:44.6747711Z strides: [104, 1], [1, 52] 2025-09-07T13:22:44.6748062Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:22:44.6748837Z triton_mm_574 0.0068 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:44.6749805Z triton_mm_571 0.0069 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:22:44.6750762Z triton_mm_578 0.0069 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:44.6751747Z triton_mm_581 0.0070 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:44.6752720Z triton_mm_577 0.0070 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:22:44.6753676Z triton_mm_579 0.0070 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:44.6754632Z triton_mm_582 0.0070 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:22:44.6755965Z triton_mm_572 0.0071 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, 
EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:44.6757201Z triton_mm_580 0.0072 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:22:44.6758248Z triton_mm_576 0.0072 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:22:44.6759093Z SingleProcess AUTOTUNE benchmarking takes 0.2531 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:22:44.8720312Z Autotune Choices Stats: 2025-09-07T13:22:44.8721292Z {"num_choices": 15, "num_triton_choices": 13, "best_kernel": "triton_mm_605", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.005919999908655882, "best_triton_pos": 0} 2025-09-07T13:22:44.8881755Z AUTOTUNE addmm(8x624, 8x26, 26x624) 2025-09-07T13:22:44.8882046Z strides: [0, 1], [26, 1], [1, 26] 2025-09-07T13:22:44.8882366Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:22:44.8883079Z triton_mm_605 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:22:44.8884053Z triton_mm_601 0.0060 ms 98.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T13:22:44.8885414Z triton_mm_603 0.0060 ms 98.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:22:44.8886559Z triton_mm_606 0.0060 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, 
EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:44.8887536Z triton_mm_602 0.0061 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:44.8888533Z triton_mm_610 0.0061 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:22:44.8889503Z triton_mm_612 0.0061 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:44.8890393Z triton_mm_611 0.0061 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T13:22:44.8891275Z triton_mm_600 0.0062 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T13:22:44.8892164Z triton_mm_608 0.0062 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:22:44.8892938Z SingleProcess AUTOTUNE benchmarking takes 0.2128 seconds and 0.0002 seconds precompiling for 15 choices 2025-09-07T13:22:45.1134439Z Autotune Choices Stats: 2025-09-07T13:22:45.1135728Z {"num_choices": 19, "num_triton_choices": 18, "best_kernel": "triton_mm_617", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007712000049650669, "best_triton_pos": 0} 2025-09-07T13:22:45.1301255Z AUTOTUNE mm(1568x312, 312x52) 2025-09-07T13:22:45.1301609Z strides: [312, 1], [1, 312] 2025-09-07T13:22:45.1301883Z 
dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:22:45.1302750Z triton_mm_617 0.0077 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:45.1303866Z triton_mm_616 0.0081 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:45.1304933Z triton_mm_625 0.0082 ms 94.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:22:45.1306104Z triton_mm_621 0.0082 ms 93.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:45.1307085Z triton_mm_615 0.0084 ms 92.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:45.1308077Z triton_mm_624 0.0084 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:45.1309175Z triton_mm_620 0.0084 ms 91.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:22:45.1310137Z triton_mm_623 0.0086 ms 89.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:22:45.1311112Z triton_mm_614 0.0089 ms 87.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:22:45.1312176Z triton_mm_630 0.0090 ms 86.1% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:45.1313025Z SingleProcess AUTOTUNE benchmarking takes 0.2415 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T13:22:45.4277066Z Autotune Choices Stats: 2025-09-07T13:22:45.4278034Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_878", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2", "best_time": 0.006496000103652477, "best_triton_pos": 0} 2025-09-07T13:22:45.4446803Z AUTOTUNE addmm(8x624, 8x52, 52x624) 2025-09-07T13:22:45.4447096Z strides: [0, 1], [52, 1], [1, 52] 2025-09-07T13:22:45.4447431Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:22:45.4448145Z triton_mm_878 0.0065 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T13:22:45.4449147Z triton_mm_880 0.0065 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:22:45.4450125Z triton_mm_890 0.0065 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:45.4451093Z triton_mm_879 0.0066 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:45.4452054Z triton_mm_884 0.0066 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:45.4453022Z triton_mm_883 0.0066 ms 98.1% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:22:45.4454150Z triton_mm_885 0.0066 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:45.4455533Z triton_mm_881 0.0067 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:22:45.4456582Z triton_mm_891 0.0067 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:22:45.4457563Z triton_mm_893 0.0067 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:45.4458424Z SingleProcess AUTOTUNE benchmarking takes 0.2623 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T13:22:45.7137470Z Autotune Choices Stats: 2025-09-07T13:22:45.7138478Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_898", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.008799999952316284, "best_triton_pos": 0} 2025-09-07T13:22:45.7304169Z AUTOTUNE addmm(1568x160, 1568x624, 624x160) 2025-09-07T13:22:45.7304517Z strides: [0, 1], [624, 1], [1, 624] 2025-09-07T13:22:45.7304862Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:22:45.7305758Z triton_mm_898 0.0088 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:45.7306414Z bias_addmm 0.0091 ms 96.5% 
2025-09-07T13:22:45.7307183Z triton_mm_902 0.0093 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:45.7308177Z triton_mm_897 0.0103 ms 85.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:45.7309294Z triton_mm_901 0.0106 ms 82.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:22:45.7310264Z triton_mm_906 0.0108 ms 81.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:22:45.7311234Z triton_mm_895 0.0113 ms 78.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:22:45.7312200Z triton_mm_905 0.0115 ms 76.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:45.7313167Z triton_mm_904 0.0120 ms 73.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:22:45.7313778Z addmm 0.0121 ms 72.9% 2025-09-07T13:22:45.7314221Z SingleProcess AUTOTUNE benchmarking takes 0.2851 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T13:22:45.9703424Z Autotune Choices Stats: 2025-09-07T13:22:45.9704393Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_920", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 
0.007007999811321497, "best_triton_pos": 0} 2025-09-07T13:22:45.9869152Z AUTOTUNE mm(1568x80, 80x240) 2025-09-07T13:22:45.9869444Z strides: [160, 1], [1, 80] 2025-09-07T13:22:45.9869867Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:22:45.9870504Z triton_mm_920 0.0070 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:22:45.9871584Z triton_mm_923 0.0071 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:22:45.9872281Z mm 0.0072 ms 97.3% 2025-09-07T13:22:45.9872850Z triton_mm_927 0.0072 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:22:45.9873818Z triton_mm_914 0.0072 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:22:45.9874782Z triton_mm_916 0.0072 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:45.9875893Z triton_mm_924 0.0072 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:45.9876853Z triton_mm_915 0.0073 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:45.9877816Z triton_mm_922 0.0074 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:45.9878922Z triton_mm_925 
0.0074 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:22:45.9879773Z SingleProcess AUTOTUNE benchmarking takes 0.2559 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T13:22:46.2219701Z Autotune Choices Stats: 2025-09-07T13:22:46.2220694Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_939", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.006976000033318996, "best_triton_pos": 0} 2025-09-07T13:22:46.2385282Z AUTOTUNE mm(1568x80, 80x240) 2025-09-07T13:22:46.2385582Z strides: [160, 1], [1, 80] 2025-09-07T13:22:46.2385849Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:22:46.2386515Z triton_mm_939 0.0070 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:22:46.2387506Z triton_mm_942 0.0071 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:22:46.2388474Z triton_mm_941 0.0071 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:46.2389639Z triton_mm_943 0.0072 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:46.2390597Z triton_mm_946 0.0072 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:22:46.2391204Z mm 0.0072 ms 96.5% 2025-09-07T13:22:46.2391767Z 
triton_mm_945 0.0072 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:46.2392724Z triton_mm_935 0.0073 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:46.2393983Z triton_mm_933 0.0073 ms 95.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:22:46.2395187Z triton_mm_934 0.0074 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:46.2396038Z SingleProcess AUTOTUNE benchmarking takes 0.2510 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:22:46.5099271Z Autotune Choices Stats: 2025-09-07T13:22:46.5100283Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_982", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4", "best_time": 0.0066559999249875546, "best_triton_pos": 0} 2025-09-07T13:22:46.5266260Z AUTOTUNE addmm(8x480, 8x80, 80x480) 2025-09-07T13:22:46.5266550Z strides: [0, 1], [80, 1], [1, 80] 2025-09-07T13:22:46.5266868Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:22:46.5267595Z triton_mm_982 0.0067 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:22:46.5268604Z triton_mm_981 0.0067 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:46.5269761Z 
triton_mm_975 0.0068 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:46.5270923Z triton_mm_978 0.0068 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:22:46.5271914Z triton_mm_977 0.0068 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:46.5272887Z triton_mm_979 0.0070 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:46.5273862Z triton_mm_974 0.0071 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:22:46.5274821Z triton_mm_984 0.0072 ms 92.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:46.5276060Z triton_mm_976 0.0072 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:46.5276694Z bias_addmm 0.0073 ms 91.2% 2025-09-07T13:22:46.5277163Z SingleProcess AUTOTUNE benchmarking takes 0.2875 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T13:22:46.7675881Z Autotune Choices Stats: 2025-09-07T13:22:46.7676840Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_989", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.0072639998979866505, 
"best_triton_pos": 0} 2025-09-07T13:22:46.7839264Z AUTOTUNE mm(1568x240, 240x80) 2025-09-07T13:22:46.7839595Z strides: [240, 1], [1, 240] 2025-09-07T13:22:46.7839925Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:22:46.7840658Z triton_mm_989 0.0073 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:46.7841976Z triton_mm_993 0.0075 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:46.7843080Z triton_mm_988 0.0076 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:46.7844063Z triton_mm_992 0.0076 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:22:46.7844682Z mm 0.0077 ms 94.2% 2025-09-07T13:22:46.7845446Z triton_mm_987 0.0077 ms 93.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:46.7846421Z triton_mm_986 0.0078 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:22:46.7847397Z triton_mm_996 0.0080 ms 91.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:46.7848391Z triton_mm_997 0.0082 ms 89.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:22:46.7849469Z triton_mm_995 0.0083 ms 87.6% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:22:46.7850331Z SingleProcess AUTOTUNE benchmarking takes 0.2567 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:22:47.0812809Z Autotune Choices Stats: 2025-09-07T13:22:47.0813814Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_1282", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.006591999903321266, "best_triton_pos": 0} 2025-09-07T13:22:47.0973660Z AUTOTUNE addmm(8x960, 8x80, 80x960) 2025-09-07T13:22:47.0973945Z strides: [0, 1], [80, 1], [1, 80] 2025-09-07T13:22:47.0974268Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:22:47.0975137Z triton_mm_1282 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:22:47.0976166Z triton_mm_1293 0.0067 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:22:47.0977137Z triton_mm_1281 0.0067 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:47.0978111Z triton_mm_1292 0.0067 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:47.0979101Z triton_mm_1286 0.0068 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:47.0980037Z triton_mm_1288 0.0070 ms 93.6% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:47.0980871Z triton_mm_1289 0.0071 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:22:47.0981781Z triton_mm_1290 0.0071 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:47.0982845Z triton_mm_1295 0.0072 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:47.0983760Z triton_mm_1285 0.0073 ms 90.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:22:47.0984490Z SingleProcess AUTOTUNE benchmarking takes 0.2602 seconds and 0.0003 seconds precompiling for 19 choices 2025-09-07T13:22:47.3688998Z Autotune Choices Stats: 2025-09-07T13:22:47.3690266Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.008991999551653862, "best_triton_pos": 1, "best_triton_time": 0.009119999594986439, "best_triton_kernel": "triton_mm_1300", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T13:22:47.3857935Z AUTOTUNE addmm(392x264, 392x960, 960x264) 2025-09-07T13:22:47.3858238Z strides: [0, 1], [960, 1], [1, 960] 2025-09-07T13:22:47.3858569Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:22:47.3858911Z bias_addmm 0.0090 ms 100.0% 2025-09-07T13:22:47.3859564Z triton_mm_1300 0.0091 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, 
BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:47.3860557Z triton_mm_1304 0.0096 ms 93.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:47.3861766Z triton_mm_1308 0.0111 ms 80.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:22:47.3862769Z triton_mm_1299 0.0114 ms 78.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:47.3863734Z triton_mm_1298 0.0117 ms 76.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:47.3864690Z triton_mm_1303 0.0119 ms 75.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:22:47.3865648Z addmm 0.0126 ms 71.3% 2025-09-07T13:22:47.3866248Z triton_mm_1307 0.0126 ms 71.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:47.3867217Z triton_mm_1314 0.0127 ms 70.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:47.3868077Z SingleProcess AUTOTUNE benchmarking takes 0.2878 seconds and 0.0003 seconds precompiling for 21 choices 2025-09-07T13:22:47.6282517Z Autotune Choices Stats: 2025-09-07T13:22:47.6283539Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_1354", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, 
BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.006943999789655209, "best_triton_pos": 0} 2025-09-07T13:22:47.6450097Z AUTOTUNE addmm(8x1584, 8x132, 132x1584) 2025-09-07T13:22:47.6450399Z strides: [0, 1], [132, 1], [1, 132] 2025-09-07T13:22:47.6450723Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:22:47.6451454Z triton_mm_1354 0.0069 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:22:47.6452803Z triton_mm_1353 0.0071 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:47.6453908Z triton_mm_1352 0.0073 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T13:22:47.6454899Z triton_mm_1355 0.0073 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:22:47.6456063Z triton_mm_1358 0.0073 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:47.6457040Z triton_mm_1359 0.0073 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:47.6458045Z triton_mm_1364 0.0074 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:47.6459032Z triton_mm_1365 0.0074 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:22:47.6460011Z triton_mm_1361 0.0077 ms 90.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:22:47.6460975Z triton_mm_1362 0.0077 ms 89.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:47.6461785Z SingleProcess AUTOTUNE benchmarking takes 0.2575 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T13:22:47.8850842Z Autotune Choices Stats: 2025-09-07T13:22:47.8851832Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_1372", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.009312000125646591, "best_triton_pos": 0} 2025-09-07T13:22:47.9018794Z AUTOTUNE mm(392x792, 792x132) 2025-09-07T13:22:47.9019064Z strides: [792, 1], [1, 792] 2025-09-07T13:22:47.9019336Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:22:47.9020033Z triton_mm_1372 0.0093 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:47.9021037Z triton_mm_1376 0.0100 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:47.9022087Z triton_mm_1371 0.0104 ms 89.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:47.9023063Z triton_mm_1370 0.0107 ms 87.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=5, num_warps=8 2025-09-07T13:22:47.9024029Z triton_mm_1375 0.0109 ms 85.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:22:47.9025458Z triton_mm_1369 0.0116 ms 79.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:22:47.9026483Z triton_mm_1380 0.0118 ms 78.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:22:47.9027779Z triton_mm_1379 0.0121 ms 77.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:47.9028854Z triton_mm_1378 0.0125 ms 74.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:22:47.9029983Z triton_mm_1382 0.0131 ms 71.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:22:47.9030833Z SingleProcess AUTOTUNE benchmarking takes 0.2564 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:22:48.2032637Z Autotune Choices Stats: 2025-09-07T13:22:48.2033888Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.009279999881982803, "best_triton_pos": 1, "best_triton_time": 0.009664000011980534, "best_triton_kernel": "triton_mm_1599", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T13:22:48.2203522Z AUTOTUNE addmm(392x1536, 392x264, 264x1536) 
2025-09-07T13:22:48.2203844Z strides: [0, 1], [264, 1], [1, 264] 2025-09-07T13:22:48.2204159Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:22:48.2204525Z bias_addmm 0.0093 ms 100.0% 2025-09-07T13:22:48.2205315Z triton_mm_1599 0.0097 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:48.2206485Z triton_mm_1602 0.0099 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:22:48.2207475Z triton_mm_1595 0.0100 ms 92.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:22:48.2208443Z triton_mm_1598 0.0100 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:22:48.2209467Z triton_mm_1600 0.0102 ms 91.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:22:48.2210458Z triton_mm_1597 0.0107 ms 86.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:48.2211356Z triton_mm_1601 0.0110 ms 84.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:48.2212264Z triton_mm_1605 0.0110 ms 84.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:48.2213191Z triton_mm_1606 0.0111 ms 83.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, 
BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:48.2213982Z SingleProcess AUTOTUNE benchmarking takes 0.2830 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T13:22:48.4665223Z Autotune Choices Stats: 2025-09-07T13:22:48.4666211Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_1611", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.009664000011980534, "best_triton_pos": 0} 2025-09-07T13:22:48.4872098Z AUTOTUNE addmm(8x1000, 8x1536, 1536x1000) 2025-09-07T13:22:48.4872380Z strides: [0, 1], [1536, 1], [1, 1536] 2025-09-07T13:22:48.4872696Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:22:48.4873534Z triton_mm_1611 0.0097 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:22:48.4874293Z bias_addmm 0.0101 ms 95.6% 2025-09-07T13:22:48.4874889Z triton_mm_1615 0.0102 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:48.4876024Z triton_mm_1619 0.0120 ms 80.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:22:48.4876992Z triton_mm_1623 0.0132 ms 73.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:22:48.4877603Z addmm 0.0137 ms 70.4% 2025-09-07T13:22:48.4878186Z triton_mm_1610 0.0145 ms 66.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=5, num_warps=2 2025-09-07T13:22:48.4879134Z triton_mm_1609 0.0152 ms 63.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:22:48.4880091Z triton_mm_1614 0.0154 ms 62.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:22:48.4881121Z triton_mm_1608 0.0157 ms 61.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T13:22:48.4881911Z SingleProcess AUTOTUNE benchmarking takes 0.2658 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T13:22:56.2444852Z pass 2025-09-07T13:23:00.4444417Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T13:23:00.4445904Z import pynvml # type: ignore[import] 2025-09-07T13:23:03.4622756Z 2025-09-07T13:23:04.4513375Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:23:04.4513952Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:23:04.4580920Z cuda eval mnasnet_100 2025-09-07T13:23:21.0576725Z Autotune Choices Stats: 2025-09-07T13:23:21.0579300Z {"num_choices": 17, "num_triton_choices": 15, "best_kernel": "triton_mm_23", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.01071999967098236, "best_triton_pos": 0} 2025-09-07T13:23:21.0752022Z AUTOTUNE addmm(100352x48, 100352x16, 16x48) 2025-09-07T13:23:21.0753645Z strides: [0, 1], [16, 1], [1, 16] 2025-09-07T13:23:21.0754133Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:23:21.0755186Z triton_mm_23 0.0107 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:23:21.0756365Z triton_mm_28 0.0109 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:23:21.0757507Z triton_mm_25 0.0110 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:23:21.0758635Z triton_mm_26 0.0110 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:23:21.0760295Z triton_mm_30 0.0114 ms 94.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:23:21.0761556Z triton_mm_31 0.0114 ms 93.8% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T13:23:21.0762699Z triton_mm_32 0.0115 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:23:21.0763837Z triton_mm_29 0.0117 ms 91.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:23:21.0765139Z triton_mm_27 0.0119 ms 90.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:23:21.0766277Z triton_mm_18 0.0122 ms 87.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T13:23:21.0767279Z SingleProcess AUTOTUNE benchmarking takes 0.2755 seconds and 0.0004 seconds precompiling for 17 choices 2025-09-07T13:23:21.6281992Z Autotune Choices Stats: 2025-09-07T13:23:21.6283593Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_63", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8", "best_time": 0.00848000030964613, "best_triton_pos": 0} 2025-09-07T13:23:21.6451937Z AUTOTUNE addmm(25088x72, 25088x24, 24x72) 2025-09-07T13:23:21.6452271Z strides: [0, 1], [24, 1], [1, 24] 2025-09-07T13:23:21.6452618Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:23:21.6453374Z triton_mm_63 0.0085 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:23:21.6454395Z triton_mm_54 0.0085 ms 99.6% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:23:21.6455607Z triton_mm_60 0.0086 ms 98.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:23:21.6456595Z triton_mm_58 0.0087 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:23:21.6457555Z triton_mm_61 0.0087 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:23:21.6458514Z triton_mm_59 0.0087 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:23:21.6459524Z triton_mm_62 0.0087 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:23:21.6460405Z triton_mm_56 0.0088 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:23:21.6461295Z triton_mm_66 0.0088 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:23:21.6462297Z triton_mm_65 0.0088 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:23:21.6463458Z SingleProcess AUTOTUNE benchmarking takes 0.2719 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T13:23:22.2252203Z Autotune Choices Stats: 
2025-09-07T13:23:22.2253239Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_234", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.0081599997356534, "best_triton_pos": 0} 2025-09-07T13:23:22.2421559Z AUTOTUNE addmm(6272x240, 6272x40, 40x240) 2025-09-07T13:23:22.2421873Z strides: [0, 1], [40, 1], [1, 40] 2025-09-07T13:23:22.2422187Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:23:22.2422958Z triton_mm_234 0.0082 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:23:22.2423970Z triton_mm_235 0.0083 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:23:22.2425324Z triton_mm_239 0.0083 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:23:22.2426337Z triton_mm_245 0.0083 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:23:22.2427537Z triton_mm_238 0.0086 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:23:22.2428522Z triton_mm_244 0.0087 ms 94.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:23:22.2429705Z triton_mm_231 0.0087 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=5, num_warps=4 2025-09-07T13:23:22.2430325Z bias_addmm 0.0091 ms 89.8% 2025-09-07T13:23:22.2430923Z triton_mm_243 0.0092 ms 88.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:23:22.2431900Z triton_mm_242 0.0093 ms 87.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T13:23:22.2432749Z SingleProcess AUTOTUNE benchmarking takes 0.2947 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T13:23:22.7819245Z Autotune Choices Stats: 2025-09-07T13:23:22.7820302Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_387", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007807999849319458, "best_triton_pos": 0} 2025-09-07T13:23:22.7979305Z AUTOTUNE addmm(1568x576, 1568x96, 96x576) 2025-09-07T13:23:22.7979615Z strides: [0, 1], [96, 1], [1, 96] 2025-09-07T13:23:22.7979943Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:23:22.7980676Z triton_mm_387 0.0078 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:23:22.7981762Z triton_mm_390 0.0079 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:23:22.7982744Z triton_mm_393 0.0079 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:23:22.7984095Z triton_mm_389 0.0079 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, 
BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:23:22.7985489Z triton_mm_392 0.0080 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:23:22.7986459Z triton_mm_386 0.0081 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:23:22.7987432Z triton_mm_391 0.0081 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:23:22.7988399Z triton_mm_388 0.0081 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:23:22.7989482Z triton_mm_385 0.0083 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:23:22.7990161Z bias_addmm 0.0084 ms 93.5% 2025-09-07T13:23:22.7990628Z SingleProcess AUTOTUNE benchmarking takes 0.2886 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T13:23:23.3590325Z Autotune Choices Stats: 2025-09-07T13:23:23.3591616Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_165", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4", "best_time": 0.0072639998979866505, "best_triton_pos": 0} 2025-09-07T13:23:23.3753076Z AUTOTUNE addmm(6272x120, 6272x40, 40x120) 2025-09-07T13:23:23.3753378Z strides: [0, 1], [40, 1], [1, 40] 2025-09-07T13:23:23.3753682Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:23:23.3754397Z triton_mm_165 0.0073 ms 100.0% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:23:23.3755725Z triton_mm_160 0.0073 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:23:23.3756690Z triton_mm_161 0.0074 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:23:23.3757657Z triton_mm_164 0.0076 ms 95.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:23:23.3758624Z triton_mm_157 0.0077 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:23:23.3759596Z triton_mm_166 0.0080 ms 90.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:23:23.3760562Z triton_mm_154 0.0080 ms 90.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:23:23.3761453Z triton_mm_153 0.0081 ms 89.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T13:23:23.3762353Z triton_mm_171 0.0082 ms 89.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:23:23.3763449Z triton_mm_159 0.0084 ms 87.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:23:23.3764361Z SingleProcess AUTOTUNE benchmarking takes 0.2898 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T13:23:23.9437084Z Autotune Choices Stats: 2025-09-07T13:23:24.2024042Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_276", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.007648000027984381, "best_triton_pos": 0} 2025-09-07T13:23:24.2025980Z AUTOTUNE addmm(1568x480, 1568x80, 80x480) 2025-09-07T13:23:24.2026403Z strides: [0, 1], [80, 1], [1, 80] 2025-09-07T13:23:24.2026896Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:23:24.2027936Z triton_mm_276 0.0076 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:23:24.2029456Z triton_mm_273 0.0077 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:23:24.2030926Z triton_mm_275 0.0077 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:23:24.2032380Z triton_mm_274 0.0080 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:23:24.2034238Z triton_mm_278 0.0080 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:23:24.2035973Z triton_mm_277 0.0081 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:23:24.2037476Z triton_mm_279 0.0081 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:23:24.2038947Z triton_mm_272 0.0081 ms 94.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:23:24.2040404Z triton_mm_271 0.0081 ms 94.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:23:24.2041852Z triton_mm_266 0.0084 ms 90.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:23:24.2043121Z SingleProcess AUTOTUNE benchmarking takes 0.5395 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T13:23:24.6064713Z Autotune Choices Stats: 2025-09-07T13:23:24.6066368Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_76", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8", "best_time": 0.009344000369310379, "best_triton_pos": 0} 2025-09-07T13:23:24.6232845Z AUTOTUNE addmm(25088x24, 25088x72, 72x24) 2025-09-07T13:23:24.6233149Z strides: [0, 1], [72, 1], [1, 72] 2025-09-07T13:23:24.6233462Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:23:24.6234164Z triton_mm_76 0.0093 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:23:24.6235328Z triton_mm_78 0.0095 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:23:24.6236908Z triton_mm_80 0.0096 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:23:24.6237977Z triton_mm_70 0.0096 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:23:24.6238928Z triton_mm_71 0.0096 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:23:24.6239881Z triton_mm_68 0.0099 ms 94.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:23:24.6240833Z triton_mm_72 0.0100 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:23:24.6241778Z triton_mm_81 0.0100 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T13:23:24.6242393Z bias_addmm 0.0101 ms 92.7% 2025-09-07T13:23:24.6242983Z triton_mm_67 0.0101 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T13:23:24.6243827Z SingleProcess AUTOTUNE benchmarking takes 0.2688 seconds and 0.0003 seconds precompiling for 19 choices 2025-09-07T13:23:25.2544353Z Autotune Choices Stats: 2025-09-07T13:23:25.2546088Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_463", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007968000136315823, "best_triton_pos": 0} 2025-09-07T13:23:25.2718501Z AUTOTUNE addmm(392x1152, 392x192, 192x1152) 2025-09-07T13:23:25.2718790Z strides: [0, 1], [192, 1], [1, 192] 2025-09-07T13:23:25.2719114Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:23:25.2719830Z triton_mm_463 0.0080 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:23:25.2721092Z triton_mm_462 0.0081 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:23:25.2722054Z triton_mm_458 0.0082 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:23:25.2722671Z bias_addmm 0.0083 ms 96.1% 2025-09-07T13:23:25.2723263Z triton_mm_469 0.0083 ms 95.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:23:25.2724223Z triton_mm_466 0.0085 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:23:25.2725368Z triton_mm_456 0.0087 ms 91.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:23:25.2726345Z triton_mm_465 0.0087 ms 91.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:23:25.2727326Z triton_mm_467 0.0087 ms 91.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, 
EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:23:25.2728599Z triton_mm_457 0.0088 ms 90.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:23:25.2729565Z SingleProcess AUTOTUNE benchmarking takes 0.2879 seconds and 0.0003 seconds precompiling for 21 choices 2025-09-07T13:23:25.8593015Z Autotune Choices Stats: 2025-09-07T13:23:25.8594039Z {"num_choices": 20, "num_triton_choices": 18, "best_kernel": "triton_mm_180", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007872000336647034, "best_triton_pos": 0} 2025-09-07T13:23:25.8757536Z AUTOTUNE addmm(6272x40, 6272x120, 120x40) 2025-09-07T13:23:25.8757829Z strides: [0, 1], [120, 1], [1, 120] 2025-09-07T13:23:25.8758139Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:23:25.8758847Z triton_mm_180 0.0079 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:23:25.8759839Z triton_mm_184 0.0080 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:23:25.8761030Z triton_mm_173 0.0082 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:23:25.8762078Z triton_mm_183 0.0085 ms 92.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:23:25.8763476Z triton_mm_188 0.0085 ms 92.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, 
BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:23:25.8764142Z bias_addmm 0.0086 ms 91.8% 2025-09-07T13:23:25.8764741Z triton_mm_175 0.0086 ms 91.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:23:25.8765909Z triton_mm_179 0.0086 ms 91.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:23:25.8766872Z triton_mm_189 0.0087 ms 90.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:23:25.8767849Z triton_mm_185 0.0088 ms 89.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:23:25.8768709Z SingleProcess AUTOTUNE benchmarking takes 0.2796 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:23:26.5129094Z Autotune Choices Stats: 2025-09-07T13:23:26.5130463Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.008927999995648861, "best_triton_pos": 1, "best_triton_time": 0.008927999995648861, "best_triton_kernel": "triton_mm_402", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T13:23:26.5293346Z AUTOTUNE addmm(1568x96, 1568x576, 576x96) 2025-09-07T13:23:26.5293671Z strides: [0, 1], [576, 1], [1, 576] 2025-09-07T13:23:26.5294005Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:23:26.5294346Z bias_addmm 0.0089 ms 100.0% 2025-09-07T13:23:26.5295356Z triton_mm_402 0.0089 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, 
BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:23:26.5296785Z triton_mm_406 0.0094 ms 95.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:23:26.5297955Z triton_mm_401 0.0100 ms 89.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:23:26.5299060Z triton_mm_400 0.0102 ms 87.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:23:26.5300042Z triton_mm_405 0.0102 ms 87.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:23:26.5301059Z triton_mm_410 0.0103 ms 86.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:23:26.5302055Z triton_mm_409 0.0108 ms 83.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:23:26.5302955Z triton_mm_399 0.0110 ms 80.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:23:26.5303855Z triton_mm_416 0.0115 ms 77.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:23:26.5304648Z SingleProcess AUTOTUNE benchmarking takes 0.2900 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T13:23:27.0520780Z Autotune Choices Stats: 2025-09-07T13:23:27.0522477Z {"num_choices": 21, 
"num_triton_choices": 19, "best_kernel": "triton_mm_288", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.008576000109314919, "best_triton_pos": 0} 2025-09-07T13:23:27.0685818Z AUTOTUNE addmm(1568x80, 1568x480, 480x80) 2025-09-07T13:23:27.0686145Z strides: [0, 1], [480, 1], [1, 480] 2025-09-07T13:23:27.0686496Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:23:27.0687213Z triton_mm_288 0.0086 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:23:27.0688203Z triton_mm_292 0.0090 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:23:27.0688842Z bias_addmm 0.0093 ms 92.1% 2025-09-07T13:23:27.0689435Z triton_mm_287 0.0095 ms 90.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:23:27.0690426Z triton_mm_286 0.0097 ms 88.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:23:27.0691443Z triton_mm_291 0.0100 ms 85.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:23:27.0692338Z triton_mm_296 0.0100 ms 85.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:23:27.0693229Z triton_mm_285 0.0102 ms 84.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:23:27.0694124Z triton_mm_295 0.0106 ms 80.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:23:27.0695641Z triton_mm_294 0.0109 ms 78.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:23:27.0696542Z SingleProcess AUTOTUNE benchmarking takes 0.2875 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T13:23:27.7063933Z Autotune Choices Stats: 2025-09-07T13:23:27.7065670Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.00940799992531538, "best_triton_pos": 1, "best_triton_time": 0.00940799992531538, "best_triton_kernel": "triton_mm_478", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T13:23:27.7236171Z AUTOTUNE addmm(392x192, 392x1152, 1152x192) 2025-09-07T13:23:27.7236469Z strides: [0, 1], [1152, 1], [1, 1152] 2025-09-07T13:23:27.7236777Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:23:27.7239414Z bias_addmm 0.0094 ms 100.0% 2025-09-07T13:23:27.7240070Z triton_mm_478 0.0094 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:23:27.7241059Z triton_mm_482 0.0100 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:23:27.7242020Z triton_mm_486 0.0112 ms 84.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 
2025-09-07T13:23:27.7242629Z addmm 0.0124 ms 76.2% 2025-09-07T13:23:27.7243448Z triton_mm_477 0.0126 ms 74.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:23:27.7244438Z triton_mm_481 0.0127 ms 73.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:23:27.7245559Z triton_mm_476 0.0131 ms 71.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:23:27.7246515Z triton_mm_492 0.0136 ms 69.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:23:27.7247475Z triton_mm_475 0.0137 ms 68.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:23:27.7248312Z SingleProcess AUTOTUNE benchmarking takes 0.2925 seconds and 0.0003 seconds precompiling for 21 choices 2025-09-07T13:23:28.6749885Z Autotune Choices Stats: 2025-09-07T13:23:28.6751130Z {"num_choices": 14, "num_triton_choices": 12, "best_kernel": "triton_mm_11", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.009184000082314014, "best_triton_pos": 0} 2025-09-07T13:23:28.6916499Z AUTOTUNE addmm(100352x16, 100352x32, 32x16) 2025-09-07T13:23:28.6916797Z strides: [0, 1], [32, 1], [1, 32] 2025-09-07T13:23:28.6917106Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:23:28.6917795Z triton_mm_11 0.0092 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:23:28.6918768Z triton_mm_9 0.0094 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:23:28.6920018Z triton_mm_12 0.0094 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:23:28.6921129Z triton_mm_13 0.0095 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:23:28.6922176Z triton_mm_16 0.0096 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T13:23:28.6923085Z triton_mm_17 0.0096 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:23:28.6923982Z triton_mm_15 0.0096 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:23:28.6924864Z triton_mm_8 0.0096 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:23:28.6926087Z triton_mm_7 0.0098 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T13:23:28.6926974Z triton_mm_14 0.0098 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:23:28.6927751Z SingleProcess AUTOTUNE benchmarking takes 0.2070 seconds 
and 0.0002 seconds precompiling for 14 choices 2025-09-07T13:23:28.9348426Z Autotune Choices Stats: 2025-09-07T13:23:28.9350332Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_40", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.00848000030964613, "best_triton_pos": 0} 2025-09-07T13:23:28.9513789Z AUTOTUNE addmm(25088x24, 25088x48, 48x24) 2025-09-07T13:23:28.9514071Z strides: [0, 1], [48, 1], [1, 48] 2025-09-07T13:23:28.9514384Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:23:28.9515463Z triton_mm_40 0.0085 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:23:28.9516457Z triton_mm_43 0.0085 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:23:28.9517429Z triton_mm_44 0.0085 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:23:28.9518378Z triton_mm_37 0.0085 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:23:28.9519332Z triton_mm_36 0.0087 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:23:28.9520293Z triton_mm_42 0.0087 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:23:28.9521269Z triton_mm_48 0.0087 ms 97.1% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:23:28.9522230Z triton_mm_33 0.0088 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T13:23:28.9523373Z triton_mm_45 0.0088 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:23:28.9524423Z triton_mm_47 0.0089 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T13:23:28.9525473Z SingleProcess AUTOTUNE benchmarking takes 0.2591 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T13:23:29.2063460Z Autotune Choices Stats: 2025-09-07T13:23:29.2064497Z {"num_choices": 20, "num_triton_choices": 18, "best_kernel": "triton_mm_143", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007648000027984381, "best_triton_pos": 0} 2025-09-07T13:23:29.2231925Z AUTOTUNE addmm(6272x40, 6272x72, 72x40) 2025-09-07T13:23:29.2232203Z strides: [0, 1], [72, 1], [1, 72] 2025-09-07T13:23:29.2232516Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:23:29.2233232Z triton_mm_143 0.0076 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:23:29.2234225Z triton_mm_147 0.0078 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:23:29.2235330Z triton_mm_136 0.0079 ms 97.2% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:23:29.2236430Z triton_mm_138 0.0081 ms 94.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:23:29.2237059Z bias_addmm 0.0082 ms 93.7% 2025-09-07T13:23:29.2237657Z triton_mm_135 0.0082 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T13:23:29.2238629Z triton_mm_148 0.0082 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:23:29.2239601Z triton_mm_144 0.0082 ms 93.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:23:29.2240561Z triton_mm_146 0.0083 ms 92.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:23:29.2241528Z triton_mm_145 0.0083 ms 92.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:23:29.2242344Z SingleProcess AUTOTUNE benchmarking takes 0.2711 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:23:29.4904820Z Autotune Choices Stats: 2025-09-07T13:23:29.4906640Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_250", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007807999849319458, "best_triton_pos": 0} 2025-09-07T13:23:29.5077364Z 
AUTOTUNE addmm(1568x80, 1568x240, 240x80) 2025-09-07T13:23:29.5077645Z strides: [0, 1], [240, 1], [1, 240] 2025-09-07T13:23:29.5077972Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:23:29.5078676Z triton_mm_250 0.0078 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:23:29.5079944Z triton_mm_254 0.0080 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:23:29.5080995Z triton_mm_248 0.0082 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:23:29.5082121Z triton_mm_249 0.0082 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:23:29.5083020Z triton_mm_253 0.0083 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:23:29.5083590Z bias_addmm 0.0083 ms 93.8% 2025-09-07T13:23:29.5084160Z triton_mm_247 0.0084 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:23:29.5085239Z triton_mm_257 0.0088 ms 88.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:23:29.5086140Z triton_mm_256 0.0089 ms 88.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:23:29.5087044Z triton_mm_258 0.0089 ms 88.1% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:23:29.5087834Z SingleProcess AUTOTUNE benchmarking takes 0.2840 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T13:23:29.7759000Z Autotune Choices Stats: 2025-09-07T13:23:29.7759967Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_364", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.00848000030964613, "best_triton_pos": 0} 2025-09-07T13:23:29.7937875Z AUTOTUNE addmm(1568x96, 1568x480, 480x96) 2025-09-07T13:23:29.7938395Z strides: [0, 1], [480, 1], [1, 480] 2025-09-07T13:23:29.7938933Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:23:29.7940116Z triton_mm_364 0.0085 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:23:29.7941165Z bias_addmm 0.0090 ms 94.6% 2025-09-07T13:23:29.7942073Z triton_mm_368 0.0090 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:23:29.7942896Z triton_mm_363 0.0095 ms 89.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:23:29.7943721Z triton_mm_362 0.0098 ms 86.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:23:29.7944550Z triton_mm_367 0.0100 ms 85.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 
2025-09-07T13:23:29.7945667Z triton_mm_372 0.0100 ms 84.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:23:29.7946505Z triton_mm_361 0.0101 ms 84.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:23:29.7947331Z triton_mm_371 0.0107 ms 79.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:23:29.7948508Z triton_mm_370 0.0109 ms 77.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:23:29.7949310Z SingleProcess AUTOTUNE benchmarking takes 0.2853 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T13:23:30.0599338Z Autotune Choices Stats: 2025-09-07T13:23:30.0600570Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.008320000022649765, "best_triton_pos": 1, "best_triton_time": 0.008448000065982342, "best_triton_kernel": "triton_mm_440", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T13:23:30.0769212Z AUTOTUNE addmm(392x192, 392x576, 576x192) 2025-09-07T13:23:30.0769670Z strides: [0, 1], [576, 1], [1, 576] 2025-09-07T13:23:30.0770175Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:23:30.0770697Z bias_addmm 0.0083 ms 100.0% 2025-09-07T13:23:30.0771703Z triton_mm_440 0.0084 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:23:30.0773262Z triton_mm_444 0.0089 ms 93.5% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:23:30.0774788Z triton_mm_439 0.0092 ms 90.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:23:30.0776976Z triton_mm_443 0.0097 ms 85.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:23:30.0778540Z triton_mm_438 0.0098 ms 85.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:23:30.0780086Z triton_mm_448 0.0100 ms 83.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:23:30.0781864Z triton_mm_437 0.0101 ms 82.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:23:30.0782859Z triton_mm_447 0.0104 ms 80.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:23:30.0783748Z triton_mm_446 0.0111 ms 75.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:23:30.0784523Z SingleProcess AUTOTUNE benchmarking takes 0.2825 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T13:23:30.3496016Z Autotune Choices Stats: 2025-09-07T13:23:30.3497575Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_592", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, 
BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.009472000412642956, "best_triton_pos": 0} 2025-09-07T13:23:30.3671327Z AUTOTUNE addmm(392x320, 392x1152, 1152x320) 2025-09-07T13:23:30.3671681Z strides: [0, 1], [1152, 1], [1, 1152] 2025-09-07T13:23:30.3672065Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:23:30.3672765Z triton_mm_592 0.0095 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:23:30.3673525Z bias_addmm 0.0097 ms 97.4% 2025-09-07T13:23:30.3674191Z triton_mm_596 0.0102 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:23:30.3675368Z triton_mm_600 0.0114 ms 83.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:23:30.3676327Z triton_mm_591 0.0128 ms 74.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:23:30.3677282Z triton_mm_595 0.0129 ms 73.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:23:30.3677896Z addmm 0.0130 ms 72.7% 2025-09-07T13:23:30.3678465Z triton_mm_590 0.0133 ms 71.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:23:30.3679427Z triton_mm_606 0.0136 ms 69.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:23:30.3680391Z triton_mm_589 0.0137 ms 69.0% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:23:30.3681223Z SingleProcess AUTOTUNE benchmarking takes 0.2893 seconds and 0.0003 seconds precompiling for 21 choices 2025-09-07T13:23:30.6367909Z Autotune Choices Stats: 2025-09-07T13:23:30.6370084Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.008767999708652496, "best_triton_pos": 1, "best_triton_time": 0.009151999838650227, "best_triton_kernel": "triton_mm_614", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8"} 2025-09-07T13:23:30.6541840Z AUTOTUNE addmm(392x1280, 392x320, 320x1280) 2025-09-07T13:23:30.6542181Z strides: [0, 1], [320, 1], [1, 320] 2025-09-07T13:23:30.6542538Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T13:23:30.6542884Z bias_addmm 0.0088 ms 100.0% 2025-09-07T13:23:30.6543510Z triton_mm_614 0.0092 ms 95.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:23:30.6544489Z triton_mm_618 0.0094 ms 92.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:23:30.6545624Z triton_mm_619 0.0096 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:23:30.6546603Z triton_mm_621 0.0099 ms 89.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:23:30.6547612Z triton_mm_617 0.0099 ms 88.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, 
BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:23:30.6548575Z triton_mm_609 0.0101 ms 87.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:23:30.6549539Z triton_mm_608 0.0103 ms 85.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:23:30.6550528Z triton_mm_625 0.0103 ms 85.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:23:30.6551826Z triton_mm_624 0.0104 ms 84.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:23:30.6552736Z SingleProcess AUTOTUNE benchmarking takes 0.2865 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T13:23:32.9684718Z pass 2025-09-07T13:23:35.6560179Z accuracy pass_rate=100.00% 2025-09-07T13:23:35.6564543Z calls_captured gmean=367.65x mean=508.875x 2025-09-07T13:23:35.6568393Z unique_graphs gmean=1.09x mean=1.125x 2025-09-07T13:23:35.6571967Z graph_breaks gmean=0.00x mean=0.000x 2025-09-07T13:23:35.6575523Z unique_graph_breaks gmean=0.00x mean=0.000x 2025-09-07T13:23:35.6578940Z autograd_captures gmean=0.00x mean=0.000x 2025-09-07T13:23:35.6582323Z autograd_compiles gmean=0.00x mean=0.000x 2025-09-07T13:23:35.6585941Z cudagraph_skips gmean=0.00x mean=0.000x 2025-09-07T13:23:35.6587056Z compilation_latency mean=43.259 seconds 2025-09-07T13:23:36.9408527Z + [[ training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *aotinductor-true* ]] 2025-09-07T13:23:36.9409948Z 
+ [[ inference == \i\n\f\e\r\e\n\c\e ]] 2025-09-07T13:23:36.9410239Z + [[ accuracy == \a\c\c\u\r\a\c\y ]] 2025-09-07T13:23:36.9411615Z + python benchmarks/dynamo/timm_models.py --accuracy --no-translation-validation --inference --bfloat16 --export --disable-cudagraphs --device cuda --total-partitions 7 --partition-id 3 --output /var/lib/jenkins/workspace/test/test-reports/inductor_export_timm_models_bfloat16_inference_cuda_h100_accuracy.csv 2025-09-07T13:23:37.9234651Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T13:23:37.9236406Z import pynvml # type: ignore[import] 2025-09-07T13:23:42.6144796Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T13:23:42.6146333Z import pynvml # type: ignore[import] 2025-09-07T13:23:45.7102639Z 2025-09-07T13:23:47.4701961Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:23:47.4702340Z loading model: 0it [00:01, ?it/s] 2025-09-07T13:23:47.5071108Z cuda eval hrnet_w18 2025-09-07T13:24:02.3931435Z pass 2025-09-07T13:24:04.9381566Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T13:24:04.9382877Z import pynvml # type: ignore[import] 2025-09-07T13:24:07.9713513Z 2025-09-07T13:24:09.2401342Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:24:09.2401665Z loading model: 0it [00:01, ?it/s] 2025-09-07T13:24:09.2516982Z cuda eval inception_v3 2025-09-07T13:24:15.9628047Z pass 2025-09-07T13:24:18.3297000Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T13:24:18.3298259Z import pynvml # type: ignore[import] 2025-09-07T13:24:21.3790011Z 2025-09-07T13:24:25.9556805Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:24:25.9557190Z loading model: 0it [00:04, ?it/s] 2025-09-07T13:24:25.9645669Z cuda eval jx_nest_base 2025-09-07T13:24:32.1758614Z pass 2025-09-07T13:24:34.5319932Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T13:24:34.5321375Z import pynvml # type: ignore[import] 2025-09-07T13:24:37.5368686Z 2025-09-07T13:24:38.2847572Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:24:38.2848086Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:24:38.2886773Z cuda eval lcnet_050 2025-09-07T13:24:41.5425696Z pass 2025-09-07T13:24:43.9638615Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T13:24:43.9640364Z import pynvml # type: ignore[import] 2025-09-07T13:24:47.0397337Z 2025-09-07T13:24:48.2244761Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:24:48.2245680Z loading model: 0it [00:01, ?it/s] 2025-09-07T13:24:48.2349135Z cuda eval levit_128 2025-09-07T13:24:53.9456867Z pass 2025-09-07T13:24:56.2351718Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T13:24:56.2352927Z import pynvml # type: ignore[import] 2025-09-07T13:24:59.2223363Z 2025-09-07T13:25:01.0039611Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:25:01.0040516Z loading model: 0it [00:01, ?it/s] 2025-09-07T13:25:01.0083393Z cuda eval mixer_b16_224 2025-09-07T13:25:04.2777916Z pass 2025-09-07T13:25:06.4941848Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T13:25:06.4943789Z import pynvml # type: ignore[import] 2025-09-07T13:25:09.4880393Z 2025-09-07T13:25:10.4286274Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:25:10.4286812Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:25:10.4397824Z cuda eval mixnet_l 2025-09-07T13:25:16.4646652Z pass 2025-09-07T13:25:18.8370172Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T13:25:18.8371417Z import pynvml # type: ignore[import] 2025-09-07T13:25:21.8420969Z 2025-09-07T13:25:22.8416220Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:25:22.8416594Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:25:22.8475487Z cuda eval mnasnet_100 2025-09-07T13:25:26.9051845Z pass 2025-09-07T13:25:28.1437783Z accuracy pass_rate=100.00% 2025-09-07T13:25:28.1443139Z calls_captured gmean=429.93x mean=530.000x 2025-09-07T13:25:28.1447329Z unique_graphs gmean=1.00x mean=1.000x 2025-09-07T13:25:28.1450945Z graph_breaks gmean=0.00x mean=0.000x 2025-09-07T13:25:28.1454398Z unique_graph_breaks gmean=0.00x mean=0.000x 2025-09-07T13:25:28.1458204Z autograd_captures gmean=0.00x mean=0.000x 2025-09-07T13:25:28.1461595Z autograd_compiles gmean=0.00x mean=0.000x 2025-09-07T13:25:28.1465201Z cudagraph_skips gmean=0.00x mean=0.000x 2025-09-07T13:25:28.1466509Z compilation_latency mean=3.819 seconds 2025-09-07T13:25:29.3154160Z + python benchmarks/dynamo/timm_models.py --accuracy --no-translation-validation --inference --bfloat16 --export-aot-inductor --disable-cudagraphs --device cuda --total-partitions 7 --partition-id 3 --output /var/lib/jenkins/workspace/test/test-reports/inductor_aot_inductor_timm_models_bfloat16_inference_cuda_h100_accuracy.csv 2025-09-07T13:25:30.3460903Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T13:25:30.3462206Z import pynvml # type: ignore[import] 2025-09-07T13:25:35.1413006Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T13:25:35.1414017Z import pynvml # type: ignore[import] 2025-09-07T13:25:38.2792873Z 2025-09-07T13:25:40.2210526Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:25:40.2210883Z loading model: 0it [00:01, ?it/s] 2025-09-07T13:25:40.2585350Z cuda eval hrnet_w18 2025-09-07T13:26:47.4008946Z pass 2025-09-07T13:26:52.4108869Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T13:26:52.4110319Z import pynvml # type: ignore[import] 2025-09-07T13:26:55.4047195Z 2025-09-07T13:26:59.1345536Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:26:59.1345904Z loading model: 0it [00:03, ?it/s] 2025-09-07T13:26:59.1460258Z cuda eval inception_v3 2025-09-07T13:27:25.4412048Z pass 2025-09-07T13:27:29.7850219Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T13:27:29.7852262Z import pynvml # type: ignore[import] 2025-09-07T13:27:32.8013410Z 2025-09-07T13:27:34.9360068Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:27:34.9360417Z loading model: 0it [00:02, ?it/s] 2025-09-07T13:27:34.9446615Z cuda eval jx_nest_base 2025-09-07T13:28:05.8962802Z pass 2025-09-07T13:28:10.3404920Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T13:28:10.3406612Z import pynvml # type: ignore[import] 2025-09-07T13:28:13.3306283Z 2025-09-07T13:28:14.0507108Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:28:14.0507460Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:28:14.0544785Z cuda eval lcnet_050 2025-09-07T13:28:27.5349293Z pass 2025-09-07T13:28:30.9442228Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T13:28:30.9444204Z import pynvml # type: ignore[import] 2025-09-07T13:28:33.9952094Z 2025-09-07T13:28:34.9872292Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:28:34.9872630Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:28:34.9957345Z cuda eval levit_128 2025-09-07T13:28:59.1455735Z pass 2025-09-07T13:29:02.8209789Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T13:29:02.8212158Z import pynvml # type: ignore[import] 2025-09-07T13:29:05.9359689Z 2025-09-07T13:29:07.1992930Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:29:07.1993368Z loading model: 0it [00:01, ?it/s] 2025-09-07T13:29:07.2038574Z cuda eval mixer_b16_224 2025-09-07T13:29:24.0710520Z pass 2025-09-07T13:29:27.4954399Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T13:29:27.4955877Z import pynvml # type: ignore[import] 2025-09-07T13:29:30.5880561Z 2025-09-07T13:29:31.5784262Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:29:31.5784651Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:29:31.5887564Z cuda eval mixnet_l 2025-09-07T13:29:57.6522628Z pass 2025-09-07T13:30:01.5433412Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T13:30:01.5443373Z import pynvml # type: ignore[import] 2025-09-07T13:30:04.5494735Z 2025-09-07T13:30:05.6472642Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:30:05.6473265Z loading model: 0it [00:01, ?it/s] 2025-09-07T13:30:05.6533422Z cuda eval mnasnet_100 2025-09-07T13:30:21.0794190Z pass 2025-09-07T13:30:23.5657732Z accuracy pass_rate=100.00% 2025-09-07T13:30:23.5661933Z calls_captured gmean=0.00x mean=0.000x 2025-09-07T13:30:23.5665490Z unique_graphs gmean=0.00x mean=0.000x 2025-09-07T13:30:23.5669177Z graph_breaks gmean=0.00x mean=0.000x 2025-09-07T13:30:23.5672302Z unique_graph_breaks gmean=0.00x mean=0.000x 2025-09-07T13:30:23.5675760Z autograd_captures gmean=0.00x mean=0.000x 2025-09-07T13:30:23.5679246Z autograd_compiles gmean=0.00x mean=0.000x 2025-09-07T13:30:23.5682419Z cudagraph_skips gmean=0.00x mean=0.000x 2025-09-07T13:30:23.5683483Z compilation_latency mean=0.000 seconds 2025-09-07T13:30:24.6177364Z + [[ training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *maxautotune-true* ]] 2025-09-07T13:30:24.6178658Z + TORCHINDUCTOR_MAX_AUTOTUNE=1 2025-09-07T13:30:24.6179966Z + python benchmarks/dynamo/timm_models.py --accuracy --no-translation-validation --inference --bfloat16 
--backend inductor --device cuda --total-partitions 7 --partition-id 3 --output /var/lib/jenkins/workspace/test/test-reports/inductor_max_autotune_timm_models_bfloat16_inference_cuda_h100_accuracy.csv 2025-09-07T13:30:25.6513863Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T13:30:25.6515184Z import pynvml # type: ignore[import] 2025-09-07T13:30:30.4936735Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T13:30:30.4937960Z import pynvml # type: ignore[import] 2025-09-07T13:30:33.4705674Z 2025-09-07T13:30:35.2941113Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:30:35.2941552Z loading model: 0it [00:01, ?it/s] 2025-09-07T13:30:35.3320455Z cuda eval hrnet_w18 2025-09-07T13:31:25.7607760Z Autotune Choices Stats: 2025-09-07T13:31:25.7609556Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.011552000418305397, "best_triton_pos": 1, "best_triton_time": 0.011744000017642975, "best_triton_kernel": "triton_mm_73", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T13:31:25.7789426Z AUTOTUNE mm(25088x64, 64x256) 2025-09-07T13:31:25.7789674Z strides: [64, 1], [1, 64] 2025-09-07T13:31:25.7789921Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:31:25.7790192Z mm 0.0116 ms 100.0% 2025-09-07T13:31:25.7790787Z triton_mm_73 0.0117 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, 
BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:25.7791768Z triton_mm_65 0.0126 ms 91.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:31:25.7792765Z triton_mm_70 0.0126 ms 91.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:25.7793727Z triton_mm_71 0.0127 ms 90.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:31:25.7794678Z triton_mm_67 0.0128 ms 90.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:31:25.7796160Z triton_mm_66 0.0129 ms 89.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:25.7797126Z triton_mm_69 0.0132 ms 87.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:31:25.7798034Z triton_mm_68 0.0132 ms 87.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:25.7798921Z triton_mm_63 0.0133 ms 87.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:31:25.7799695Z SingleProcess AUTOTUNE benchmarking takes 0.2532 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T13:31:29.2466091Z Autotune Choices Stats: 2025-09-07T13:31:29.2467364Z {"num_choices": 18, 
"num_triton_choices": 17, "best_kernel": "triton_mm_2802", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007903999648988247, "best_triton_pos": 0} 2025-09-07T13:31:29.2635369Z AUTOTUNE mm(25088x18, 18x128) 2025-09-07T13:31:29.2635633Z strides: [18, 1], [1, 18] 2025-09-07T13:31:29.2635888Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:31:29.2636562Z triton_mm_2802 0.0079 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:31:29.2637772Z triton_mm_2800 0.0079 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:31:29.2638829Z triton_mm_2807 0.0079 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:31:29.2639854Z triton_mm_2804 0.0081 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:31:29.2641525Z triton_mm_2806 0.0083 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:29.2642684Z triton_mm_2803 0.0084 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:29.2643684Z triton_mm_2810 0.0084 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:31:29.2644685Z triton_mm_2805 0.0085 
ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:31:29.2645904Z triton_mm_2808 0.0085 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T13:31:29.2646904Z triton_mm_2809 0.0085 ms 92.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:29.2647745Z SingleProcess AUTOTUNE benchmarking takes 0.2232 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T13:31:30.4188721Z Autotune Choices Stats: 2025-09-07T13:31:30.4189760Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_2783", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.008128000423312187, "best_triton_pos": 0} 2025-09-07T13:31:30.4357041Z AUTOTUNE mm(25088x32, 32x128) 2025-09-07T13:31:30.4357327Z strides: [32, 1], [1, 32] 2025-09-07T13:31:30.4357587Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:31:30.4358264Z triton_mm_2783 0.0081 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:31:30.4359258Z triton_mm_2785 0.0083 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:31:30.4360259Z triton_mm_2789 0.0085 ms 95.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:30.4361235Z triton_mm_2787 0.0085 ms 95.5% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:31:30.4362200Z triton_mm_2786 0.0085 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:30.4363171Z triton_mm_2788 0.0085 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:31:30.4364144Z triton_mm_2790 0.0086 ms 94.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:31:30.4365463Z triton_mm_2791 0.0087 ms 93.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T13:31:30.4366449Z triton_mm_2792 0.0089 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:30.4367423Z triton_mm_2793 0.0090 ms 90.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:31:30.4368410Z SingleProcess AUTOTUNE benchmarking takes 0.2234 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T13:31:30.6947541Z Autotune Choices Stats: 2025-09-07T13:31:30.6948754Z {"num_choices": 19, "num_triton_choices": 18, "best_kernel": "triton_mm_20", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.008352000266313553, "best_triton_pos": 0} 2025-09-07T13:31:30.7119731Z AUTOTUNE mm(25088x64, 64x64) 
2025-09-07T13:31:30.7119972Z strides: [64, 1], [1, 64] 2025-09-07T13:31:30.7120196Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:31:30.7120753Z triton_mm_20 0.0084 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:31:30.7121545Z triton_mm_23 0.0085 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:31:30.7122316Z triton_mm_22 0.0085 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:30.7123101Z triton_mm_25 0.0085 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:31:30.7123851Z triton_mm_21 0.0086 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:31:30.7124776Z triton_mm_24 0.0087 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:30.7125849Z triton_mm_17 0.0088 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:31:30.7126627Z triton_mm_30 0.0088 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:31:30.7127390Z triton_mm_27 0.0089 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 
2025-09-07T13:31:30.7128162Z triton_mm_26 0.0090 ms 92.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:30.7128857Z SingleProcess AUTOTUNE benchmarking takes 0.2326 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T13:31:31.2436047Z Autotune Choices Stats: 2025-09-07T13:31:31.2437123Z {"num_choices": 19, "num_triton_choices": 18, "best_kernel": "triton_mm_87", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.012703999876976013, "best_triton_pos": 0} 2025-09-07T13:31:31.2611850Z AUTOTUNE mm(25088x256, 256x64) 2025-09-07T13:31:31.2612146Z strides: [256, 1], [1, 256] 2025-09-07T13:31:31.2612439Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:31:31.2613121Z triton_mm_87 0.0127 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:31.2614100Z triton_mm_83 0.0128 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:31:31.2614716Z mm 0.0130 ms 97.5% 2025-09-07T13:31:31.2615852Z triton_mm_92 0.0131 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:31.2617006Z triton_mm_90 0.0140 ms 90.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:31:31.2618193Z triton_mm_89 0.0142 ms 89.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, 
num_warps=4 2025-09-07T13:31:31.2619085Z triton_mm_86 0.0143 ms 88.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:31:31.2619970Z triton_mm_85 0.0145 ms 87.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:31.2620844Z triton_mm_82 0.0148 ms 85.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:31:31.2622049Z triton_mm_93 0.0148 ms 85.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:31:31.2622836Z SingleProcess AUTOTUNE benchmarking takes 0.2342 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T13:31:31.5739560Z Autotune Choices Stats: 2025-09-07T13:31:31.5740855Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_2696", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.007360000163316727, "best_triton_pos": 0} 2025-09-07T13:31:31.5910255Z AUTOTUNE mm(6272x36, 36x256) 2025-09-07T13:31:31.5910542Z strides: [36, 1], [1, 36] 2025-09-07T13:31:31.5910786Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:31:31.5911452Z triton_mm_2696 0.0074 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:31.5912452Z triton_mm_2697 0.0076 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:31:31.5913429Z 
triton_mm_2693 0.0076 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:31:31.5914412Z triton_mm_2692 0.0077 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:31:31.5915720Z triton_mm_2703 0.0078 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:31:31.5916697Z triton_mm_2690 0.0078 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:31:31.5917658Z triton_mm_2691 0.0079 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:31:31.5918635Z triton_mm_2702 0.0079 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:31.5919548Z triton_mm_2698 0.0079 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:31.5920633Z triton_mm_2694 0.0080 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:31.5921519Z SingleProcess AUTOTUNE benchmarking takes 0.2467 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:31:34.6237928Z Autotune Choices Stats: 2025-09-07T13:31:34.6239496Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_2673", "best_kernel_desc": "ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.007519999984651804, "best_triton_pos": 0} 2025-09-07T13:31:34.6417174Z AUTOTUNE mm(6272x64, 64x256) 2025-09-07T13:31:34.6417617Z strides: [64, 1], [1, 64] 2025-09-07T13:31:34.6418043Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:31:34.6419102Z triton_mm_2673 0.0075 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:31:34.6419972Z triton_mm_2674 0.0076 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:31:34.6420816Z triton_mm_2677 0.0077 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:34.6421749Z triton_mm_2678 0.0077 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:31:34.6423035Z triton_mm_2675 0.0077 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:34.6423876Z triton_mm_2676 0.0077 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:31:34.6424742Z triton_mm_2680 0.0078 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:31:34.6425929Z triton_mm_2679 0.0078 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:34.6426795Z triton_mm_2684 0.0078 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:31:34.6427630Z triton_mm_2672 0.0079 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:31:34.6428359Z SingleProcess AUTOTUNE benchmarking takes 0.2521 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:31:34.9296990Z Autotune Choices Stats: 2025-09-07T13:31:34.9298065Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_2588", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.007424000184983015, "best_triton_pos": 0} 2025-09-07T13:31:34.9471358Z AUTOTUNE mm(1568x72, 72x512) 2025-09-07T13:31:34.9471628Z strides: [72, 1], [1, 72] 2025-09-07T13:31:34.9471879Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:31:34.9472540Z triton_mm_2588 0.0074 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:34.9473544Z triton_mm_2589 0.0074 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:31:34.9474826Z triton_mm_2593 0.0078 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:31:34.9476305Z triton_mm_2591 0.0078 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=4, num_warps=4 2025-09-07T13:31:34.9477412Z triton_mm_2592 0.0078 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:34.9478397Z triton_mm_2590 0.0079 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:34.9479368Z triton_mm_2585 0.0081 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:31:34.9480381Z triton_mm_2595 0.0081 ms 91.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:34.9481360Z triton_mm_2579 0.0082 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T13:31:34.9482323Z triton_mm_2584 0.0083 ms 89.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:31:34.9483165Z SingleProcess AUTOTUNE benchmarking takes 0.2451 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:31:35.5815947Z Autotune Choices Stats: 2025-09-07T13:31:35.5817040Z {"num_choices": 16, "num_triton_choices": 15, "best_kernel": "triton_mm_2767", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8", "best_time": 0.007199999876320362, "best_triton_pos": 0} 2025-09-07T13:31:35.5993396Z AUTOTUNE mm(25088x18, 18x32) 2025-09-07T13:31:35.5993692Z strides: [18, 1], [1, 18] 2025-09-07T13:31:35.5993947Z dtypes: torch.bfloat16, torch.bfloat16 
2025-09-07T13:31:35.5994616Z triton_mm_2767 0.0072 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:31:35.5995778Z triton_mm_2758 0.0072 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:31:35.5996762Z triton_mm_2756 0.0073 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:31:35.5997734Z triton_mm_2757 0.0073 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:31:35.5998711Z triton_mm_2764 0.0073 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:31:35.5999693Z triton_mm_2759 0.0074 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:31:35.6000606Z triton_mm_2768 0.0075 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T13:31:35.6001510Z triton_mm_2755 0.0075 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T13:31:35.6002700Z triton_mm_2760 0.0076 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:31:35.6003716Z triton_mm_2769 0.0076 ms 94.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, 
BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:31:35.6004508Z SingleProcess AUTOTUNE benchmarking takes 0.2040 seconds and 0.0002 seconds precompiling for 16 choices 2025-09-07T13:31:37.3457354Z Autotune Choices Stats: 2025-09-07T13:31:37.3458449Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_2571", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.007519999984651804, "best_triton_pos": 0} 2025-09-07T13:31:37.3631832Z AUTOTUNE mm(1568x128, 128x512) 2025-09-07T13:31:37.3632157Z strides: [128, 1], [1, 128] 2025-09-07T13:31:37.3632426Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:31:37.3633108Z triton_mm_2571 0.0075 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:37.3634088Z triton_mm_2567 0.0078 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:31:37.3635383Z triton_mm_2570 0.0079 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:31:37.3636634Z triton_mm_2574 0.0079 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:31:37.3637639Z triton_mm_2568 0.0080 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:31:37.3638606Z triton_mm_2569 0.0080 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, 
BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:37.3639579Z triton_mm_2572 0.0080 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:31:37.3640535Z triton_mm_2577 0.0082 ms 92.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:37.3641426Z triton_mm_2573 0.0084 ms 89.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:37.3642324Z triton_mm_2578 0.0084 ms 89.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:31:37.3643109Z SingleProcess AUTOTUNE benchmarking takes 0.2436 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:31:37.6533581Z Autotune Choices Stats: 2025-09-07T13:31:37.6534717Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_2482", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.007296000141650438, "best_triton_pos": 0} 2025-09-07T13:31:37.6713483Z AUTOTUNE mm(392x144, 144x1024) 2025-09-07T13:31:37.6713804Z strides: [144, 1], [1, 144] 2025-09-07T13:31:37.6714096Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:31:37.6714749Z triton_mm_2482 0.0073 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:31:37.6716832Z triton_mm_2483 0.0075 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:31:37.6717936Z triton_mm_2484 0.0076 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:37.6718923Z triton_mm_2486 0.0076 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:37.6719920Z triton_mm_2489 0.0076 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:31:37.6720872Z triton_mm_2485 0.0076 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:31:37.6721768Z triton_mm_2478 0.0079 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:31:37.6722671Z triton_mm_2488 0.0079 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:37.6723239Z mm 0.0079 ms 92.3% 2025-09-07T13:31:37.6723786Z triton_mm_2487 0.0080 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:31:37.6724725Z SingleProcess AUTOTUNE benchmarking takes 0.2483 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:31:38.3415365Z Autotune Choices Stats: 2025-09-07T13:31:38.3416498Z {"num_choices": 19, "num_triton_choices": 18, "best_kernel": "triton_mm_2642", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.006624000146985054, "best_triton_pos": 0} 2025-09-07T13:31:38.3596965Z AUTOTUNE mm(6272x36, 36x64) 2025-09-07T13:31:38.3597243Z strides: [36, 1], [1, 36] 2025-09-07T13:31:38.3597492Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:31:38.3598160Z triton_mm_2642 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:31:38.3599166Z triton_mm_2648 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:31:38.3600153Z triton_mm_2650 0.0067 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:38.3601100Z triton_mm_2651 0.0067 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:31:38.3601996Z triton_mm_2641 0.0068 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T13:31:38.3602882Z triton_mm_2646 0.0068 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:31:38.3603794Z triton_mm_2647 0.0068 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:31:38.3605276Z triton_mm_2645 0.0068 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 
2025-09-07T13:31:38.3606354Z triton_mm_2649 0.0070 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:31:38.3607372Z triton_mm_2652 0.0070 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:38.3608179Z SingleProcess AUTOTUNE benchmarking takes 0.2364 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T13:31:38.9198229Z Autotune Choices Stats: 2025-09-07T13:31:38.9199270Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_2537", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8", "best_time": 0.006752000190317631, "best_triton_pos": 0} 2025-09-07T13:31:38.9378867Z AUTOTUNE mm(1568x72, 72x128) 2025-09-07T13:31:38.9379179Z strides: [72, 1], [1, 72] 2025-09-07T13:31:38.9379466Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:31:38.9380166Z triton_mm_2537 0.0068 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:31:38.9381203Z triton_mm_2536 0.0069 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:31:38.9382532Z triton_mm_2541 0.0069 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:31:38.9383530Z triton_mm_2543 0.0071 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:38.9384513Z 
triton_mm_2548 0.0071 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:31:38.9387072Z triton_mm_2535 0.0072 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:31:38.9388054Z triton_mm_2540 0.0072 ms 93.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:31:38.9389025Z triton_mm_2544 0.0072 ms 93.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:31:38.9390120Z triton_mm_2547 0.0072 ms 93.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:38.9391106Z triton_mm_2546 0.0073 ms 93.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:31:38.9391972Z SingleProcess AUTOTUNE benchmarking takes 0.2492 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:31:39.5464374Z Autotune Choices Stats: 2025-09-07T13:31:39.5466040Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_2432", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8", "best_time": 0.007007999811321497, "best_triton_pos": 0} 2025-09-07T13:31:39.5656003Z AUTOTUNE mm(392x144, 144x256) 2025-09-07T13:31:39.5656372Z strides: [144, 1], [1, 144] 2025-09-07T13:31:39.5656987Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:31:39.5657885Z triton_mm_2432 0.0070 ms 
100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:31:39.5659120Z triton_mm_2433 0.0070 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:31:39.5660197Z triton_mm_2434 0.0070 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:31:39.5661220Z triton_mm_2437 0.0071 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:31:39.5662371Z triton_mm_2431 0.0071 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:31:39.5662991Z mm 0.0074 ms 95.2% 2025-09-07T13:31:39.5663563Z triton_mm_2441 0.0074 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:39.5664537Z triton_mm_2444 0.0075 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:31:39.5665727Z triton_mm_2443 0.0077 ms 91.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:39.5666835Z triton_mm_2442 0.0079 ms 88.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:31:39.5667716Z SingleProcess AUTOTUNE benchmarking takes 0.3030 seconds and 0.0003 seconds 
precompiling for 20 choices 2025-09-07T13:31:41.4871514Z Autotune Choices Stats: 2025-09-07T13:31:41.4872587Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_341", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.006432000081986189, "best_triton_pos": 0} 2025-09-07T13:31:41.5046579Z AUTOTUNE mm(6272x36, 36x18) 2025-09-07T13:31:41.5046867Z strides: [36, 1], [1, 36] 2025-09-07T13:31:41.5047158Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:31:41.5047855Z triton_mm_341 0.0064 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:31:41.5048834Z triton_mm_342 0.0066 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:41.5049815Z triton_mm_340 0.0068 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:31:41.5050884Z triton_mm_343 0.0068 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:31:41.5051789Z triton_mm_335 0.0068 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:31:41.5052675Z triton_mm_337 0.0069 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:31:41.5053565Z triton_mm_347 0.0069 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, 
EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:31:41.5054933Z triton_mm_339 0.0069 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:31:41.5056330Z triton_mm_336 0.0070 ms 92.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:31:41.5057234Z triton_mm_348 0.0070 ms 91.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T13:31:41.5058019Z SingleProcess AUTOTUNE benchmarking takes 0.2246 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T13:31:41.8818379Z Autotune Choices Stats: 2025-09-07T13:31:41.8819906Z {"num_choices": 19, "num_triton_choices": 18, "best_kernel": "triton_mm_631", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.006591999903321266, "best_triton_pos": 0} 2025-09-07T13:31:41.8999316Z AUTOTUNE mm(1568x72, 72x36) 2025-09-07T13:31:41.8999654Z strides: [72, 1], [1, 72] 2025-09-07T13:31:41.8999939Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:31:41.9000662Z triton_mm_631 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:31:41.9001738Z triton_mm_633 0.0067 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:31:41.9003084Z triton_mm_641 0.0068 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:41.9004081Z triton_mm_639 0.0069 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:41.9005480Z triton_mm_640 0.0069 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:31:41.9006442Z triton_mm_632 0.0071 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:31:41.9007397Z triton_mm_637 0.0072 ms 92.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:31:41.9008365Z triton_mm_638 0.0072 ms 92.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:31:41.9009327Z triton_mm_636 0.0074 ms 89.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:31:41.9010289Z triton_mm_646 0.0074 ms 89.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:41.9011136Z SingleProcess AUTOTUNE benchmarking takes 0.2555 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T13:31:42.1152835Z Autotune Choices Stats: 2025-09-07T13:31:42.1154322Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_553", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8", "best_time": 
0.0066559999249875546, "best_triton_pos": 0} 2025-09-07T13:31:42.1338416Z AUTOTUNE mm(1568x72, 72x18) 2025-09-07T13:31:42.1338872Z strides: [72, 1], [1, 72] 2025-09-07T13:31:42.1339244Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:31:42.1340572Z triton_mm_553 0.0067 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:31:42.1342350Z triton_mm_558 0.0067 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:42.1343740Z triton_mm_559 0.0067 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:31:42.1345351Z triton_mm_560 0.0067 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:42.1346784Z triton_mm_551 0.0067 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:31:42.1348235Z triton_mm_552 0.0068 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:31:42.1349649Z triton_mm_557 0.0068 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:31:42.1351041Z triton_mm_563 0.0069 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:31:42.1352686Z triton_mm_562 0.0069 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, 
BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:42.1354112Z triton_mm_565 0.0071 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:42.1355520Z SingleProcess AUTOTUNE benchmarking takes 0.2278 seconds and 0.0003 seconds precompiling for 18 choices 2025-09-07T13:31:43.0426647Z Autotune Choices Stats: 2025-09-07T13:31:43.0427749Z {"num_choices": 19, "num_triton_choices": 18, "best_kernel": "triton_mm_1731", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.006496000103652477, "best_triton_pos": 0} 2025-09-07T13:31:43.0610770Z AUTOTUNE mm(392x144, 144x36) 2025-09-07T13:31:43.0611116Z strides: [144, 1], [1, 144] 2025-09-07T13:31:43.0611436Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:31:43.0612143Z triton_mm_1731 0.0065 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:43.0613185Z triton_mm_1724 0.0069 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:31:43.0614178Z triton_mm_1730 0.0070 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:31:43.0615532Z triton_mm_1727 0.0070 ms 92.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:31:43.0616539Z triton_mm_1723 0.0071 ms 91.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, 
BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:31:43.0617882Z triton_mm_1728 0.0072 ms 90.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:31:43.0619029Z triton_mm_1729 0.0072 ms 90.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:43.0620124Z triton_mm_1732 0.0072 ms 90.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:31:43.0621101Z triton_mm_1722 0.0072 ms 90.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:31:43.0622014Z triton_mm_1721 0.0073 ms 89.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:31:43.0622723Z SingleProcess AUTOTUNE benchmarking takes 0.2375 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T13:31:43.2772897Z Autotune Choices Stats: 2025-09-07T13:31:43.2774033Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_1632", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.006560000125318766, "best_triton_pos": 0} 2025-09-07T13:31:43.2951461Z AUTOTUNE mm(392x144, 144x18) 2025-09-07T13:31:43.2951753Z strides: [144, 1], [1, 144] 2025-09-07T13:31:43.2952011Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:31:43.2953125Z triton_mm_1632 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:43.2954136Z triton_mm_1624 0.0066 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:31:43.2955313Z triton_mm_1626 0.0068 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:31:43.2956304Z triton_mm_1625 0.0068 ms 95.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:31:43.2957268Z triton_mm_1629 0.0068 ms 95.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:31:43.2958237Z triton_mm_1631 0.0068 ms 95.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:31:43.2959212Z triton_mm_1633 0.0068 ms 95.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:31:43.2960198Z triton_mm_1630 0.0070 ms 93.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:43.2961180Z triton_mm_1623 0.0071 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:31:43.2962187Z triton_mm_1637 0.0072 ms 91.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:43.2963188Z SingleProcess 
AUTOTUNE benchmarking takes 0.2234 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T13:31:43.6258416Z Autotune Choices Stats: 2025-09-07T13:31:43.6259851Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_1835", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8", "best_time": 0.006752000190317631, "best_triton_pos": 0} 2025-09-07T13:31:43.6439468Z AUTOTUNE mm(392x144, 144x72) 2025-09-07T13:31:43.6439722Z strides: [144, 1], [1, 144] 2025-09-07T13:31:43.6439977Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:31:43.6440636Z triton_mm_1835 0.0068 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:31:43.6441866Z triton_mm_1836 0.0070 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:31:43.6442881Z triton_mm_1834 0.0070 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:31:43.6443850Z triton_mm_1839 0.0070 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:31:43.6444810Z triton_mm_1840 0.0071 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:31:43.6446107Z triton_mm_1833 0.0071 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:31:43.6446716Z mm 0.0072 ms 93.8% 
2025-09-07T13:31:43.6447491Z triton_mm_1842 0.0073 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:31:43.6448503Z triton_mm_1843 0.0074 ms 91.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:43.6449486Z triton_mm_1841 0.0075 ms 89.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:43.6450340Z SingleProcess AUTOTUNE benchmarking takes 0.2479 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:31:44.3085355Z Autotune Choices Stats: 2025-09-07T13:31:44.3086437Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_2464", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007360000163316727, "best_triton_pos": 0} 2025-09-07T13:31:44.3276574Z AUTOTUNE mm(392x256, 256x1024) 2025-09-07T13:31:44.3276857Z strides: [256, 1], [1, 256] 2025-09-07T13:31:44.3277113Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:31:44.3277776Z triton_mm_2464 0.0074 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:31:44.3278773Z triton_mm_2463 0.0078 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:31:44.3279742Z triton_mm_2468 0.0080 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 
2025-09-07T13:31:44.3280725Z triton_mm_2467 0.0080 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:44.3281341Z mm 0.0083 ms 89.1% 2025-09-07T13:31:44.3282125Z triton_mm_2466 0.0084 ms 87.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:31:44.3283133Z triton_mm_2470 0.0084 ms 87.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:31:44.3284124Z triton_mm_2473 0.0088 ms 83.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:31:44.3285171Z triton_mm_2457 0.0088 ms 83.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:31:44.3286093Z triton_mm_2459 0.0088 ms 83.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:31:44.3286872Z SingleProcess AUTOTUNE benchmarking takes 0.2553 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:31:59.2266217Z pass 2025-09-07T13:32:04.8195303Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T13:32:04.8197076Z import pynvml # type: ignore[import] 2025-09-07T13:32:07.8197587Z 2025-09-07T13:32:09.1629008Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:32:09.1629346Z loading model: 0it [00:01, ?it/s] 2025-09-07T13:32:09.1748957Z cuda eval inception_v3 2025-09-07T13:32:32.4550915Z Autotune Choices Stats: 2025-09-07T13:32:32.4552474Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_31", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.010591999627649784, "best_triton_pos": 0} 2025-09-07T13:32:32.4738794Z AUTOTUNE mm(42632x64, 64x80) 2025-09-07T13:32:32.4739134Z strides: [64, 1], [1, 64] 2025-09-07T13:32:32.4739444Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:32:32.4740293Z triton_mm_31 0.0106 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:32:32.4741503Z triton_mm_32 0.0107 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:32:32.4742302Z mm 0.0107 ms 99.1% 2025-09-07T13:32:32.4742974Z triton_mm_36 0.0115 ms 91.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:32:32.4744166Z triton_mm_27 0.0118 ms 89.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:32:32.4745758Z triton_mm_38 0.0122 ms 86.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:32:32.4746902Z triton_mm_33 0.0122 ms 86.6% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:32:32.4748002Z triton_mm_28 0.0124 ms 85.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:32:32.4749114Z triton_mm_37 0.0125 ms 84.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:32:32.4750511Z triton_mm_34 0.0125 ms 84.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:32:32.4751736Z SingleProcess AUTOTUNE benchmarking takes 0.2629 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T13:32:33.0628269Z Autotune Choices Stats: 2025-09-07T13:32:33.0629377Z {"num_choices": 19, "num_triton_choices": 18, "best_kernel": "triton_mm_100", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.008799999952316284, "best_triton_pos": 0} 2025-09-07T13:32:33.0808491Z AUTOTUNE mm(9800x192, 192x64) 2025-09-07T13:32:33.0808771Z strides: [192, 1], [1, 192] 2025-09-07T13:32:33.0809081Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:32:33.0809762Z triton_mm_100 0.0088 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:32:33.0810772Z triton_mm_96 0.0091 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:32:33.0811744Z triton_mm_99 0.0094 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, 
BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:32:33.0812741Z triton_mm_103 0.0094 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:32:33.0814022Z triton_mm_106 0.0094 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:32:33.0815489Z triton_mm_98 0.0096 ms 92.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:32:33.0816430Z triton_mm_105 0.0099 ms 89.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:32:33.0816995Z mm 0.0100 ms 88.4% 2025-09-07T13:32:33.0817525Z triton_mm_90 0.0100 ms 88.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:32:33.0818417Z triton_mm_92 0.0100 ms 88.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:32:33.0819198Z SingleProcess AUTOTUNE benchmarking takes 0.2468 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T13:32:33.6341639Z Autotune Choices Stats: 2025-09-07T13:32:33.6342722Z {"num_choices": 19, "num_triton_choices": 18, "best_kernel": "triton_mm_192", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.009664000011980534, "best_triton_pos": 0} 2025-09-07T13:32:33.6519056Z AUTOTUNE mm(9800x256, 256x64) 
2025-09-07T13:32:33.6519344Z strides: [256, 1], [1, 256] 2025-09-07T13:32:33.6519619Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:32:33.6520288Z triton_mm_192 0.0097 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:32:33.6521331Z triton_mm_188 0.0097 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:32:33.6522299Z triton_mm_198 0.0099 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:32:33.6523193Z mm 0.0100 ms 97.1% 2025-09-07T13:32:33.6523933Z triton_mm_191 0.0101 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:32:33.6525250Z triton_mm_190 0.0101 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:32:33.6526293Z triton_mm_182 0.0103 ms 94.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:32:33.6527285Z triton_mm_197 0.0103 ms 94.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:32:33.6528260Z triton_mm_195 0.0106 ms 91.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:32:33.6529220Z triton_mm_184 0.0108 ms 89.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:32:33.6530058Z SingleProcess AUTOTUNE benchmarking takes 0.2461 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T13:32:33.9093371Z Autotune Choices Stats: 2025-09-07T13:32:33.9094867Z {"num_choices": 19, "num_triton_choices": 18, "best_kernel": "triton_mm_291", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8", "best_time": 0.01027199998497963, "best_triton_pos": 0} 2025-09-07T13:32:33.9276358Z AUTOTUNE mm(9800x288, 288x64) 2025-09-07T13:32:33.9276663Z strides: [288, 1], [1, 288] 2025-09-07T13:32:33.9276941Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:32:33.9277638Z triton_mm_291 0.0103 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:32:33.9278665Z triton_mm_281 0.0104 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:32:33.9279631Z triton_mm_285 0.0104 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:32:33.9280600Z triton_mm_284 0.0105 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:32:33.9281210Z mm 0.0105 ms 97.6% 2025-09-07T13:32:33.9281788Z triton_mm_288 0.0108 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:32:33.9282779Z triton_mm_277 0.0108 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:32:33.9283764Z triton_mm_290 0.0109 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:32:33.9284734Z triton_mm_283 0.0111 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:32:33.9286149Z triton_mm_276 0.0114 ms 90.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:32:33.9287202Z SingleProcess AUTOTUNE benchmarking takes 0.2440 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T13:32:34.2140353Z Autotune Choices Stats: 2025-09-07T13:32:34.2141734Z {"num_choices": 19, "num_triton_choices": 18, "best_kernel": "triton_mm_75", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.008991999551653862, "best_triton_pos": 0} 2025-09-07T13:32:34.2318504Z AUTOTUNE mm(9800x192, 192x48) 2025-09-07T13:32:34.2318769Z strides: [192, 1], [1, 192] 2025-09-07T13:32:34.2319034Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:32:34.2319706Z triton_mm_75 0.0090 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:32:34.2320696Z triton_mm_80 0.0092 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:32:34.2321690Z triton_mm_71 0.0093 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=3, num_warps=8 2025-09-07T13:32:34.2322671Z triton_mm_81 0.0093 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:32:34.2323649Z triton_mm_73 0.0095 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:32:34.2324786Z triton_mm_78 0.0096 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:32:34.2325889Z mm 0.0096 ms 93.4% 2025-09-07T13:32:34.2326426Z triton_mm_74 0.0098 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:32:34.2327317Z triton_mm_77 0.0098 ms 91.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:32:34.2328209Z triton_mm_65 0.0099 ms 90.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:32:34.2328986Z SingleProcess AUTOTUNE benchmarking takes 0.2455 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T13:32:34.7791744Z Autotune Choices Stats: 2025-09-07T13:32:34.7792797Z {"num_choices": 19, "num_triton_choices": 18, "best_kernel": "triton_mm_167", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.009631999768316746, "best_triton_pos": 0} 2025-09-07T13:32:34.7970317Z AUTOTUNE mm(9800x256, 256x48) 2025-09-07T13:32:34.7970689Z strides: [256, 1], [1, 256] 2025-09-07T13:32:34.7971003Z dtypes: 
torch.bfloat16, torch.bfloat16 2025-09-07T13:32:34.7971715Z triton_mm_167 0.0096 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:32:34.7972732Z triton_mm_173 0.0098 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:32:34.7973703Z triton_mm_157 0.0099 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:32:34.7974671Z triton_mm_163 0.0099 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:32:34.7976170Z mm 0.0100 ms 96.8% 2025-09-07T13:32:34.7976873Z triton_mm_170 0.0101 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:32:34.7977882Z triton_mm_166 0.0104 ms 92.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:32:34.7978769Z triton_mm_172 0.0105 ms 91.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:32:34.7979664Z triton_mm_159 0.0106 ms 90.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:32:34.7980551Z triton_mm_168 0.0107 ms 90.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:32:34.7981438Z SingleProcess 
AUTOTUNE benchmarking takes 0.2443 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T13:32:35.0577092Z Autotune Choices Stats: 2025-09-07T13:32:35.0578390Z {"num_choices": 19, "num_triton_choices": 18, "best_kernel": "mm", "best_time": 0.010143999941647053, "best_triton_pos": 1, "best_triton_time": 0.010367999784648418, "best_triton_kernel": "triton_mm_263", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8"} 2025-09-07T13:32:35.0759400Z AUTOTUNE mm(9800x288, 288x48) 2025-09-07T13:32:35.0759783Z strides: [288, 1], [1, 288] 2025-09-07T13:32:35.0760430Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:32:35.0760740Z mm 0.0101 ms 100.0% 2025-09-07T13:32:35.0761377Z triton_mm_263 0.0104 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:32:35.0762403Z triton_mm_260 0.0104 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:32:35.0763390Z triton_mm_266 0.0104 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:32:35.0764354Z triton_mm_256 0.0106 ms 95.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:32:35.0765745Z triton_mm_259 0.0107 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:32:35.0766727Z triton_mm_258 0.0108 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:32:35.0767610Z triton_mm_252 0.0109 ms 93.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:32:35.0768518Z triton_mm_265 0.0109 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:32:35.0769448Z triton_mm_262 0.0114 ms 88.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:32:35.0770239Z SingleProcess AUTOTUNE benchmarking takes 0.2476 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T13:32:35.3475399Z Autotune Choices Stats: 2025-09-07T13:32:35.3476921Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_744", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.009920000098645687, "best_triton_pos": 0} 2025-09-07T13:32:35.3655881Z AUTOTUNE mm(2312x768, 768x192) 2025-09-07T13:32:35.3656171Z strides: [768, 1], [1, 768] 2025-09-07T13:32:35.3656448Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:32:35.3657120Z triton_mm_744 0.0099 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:32:35.3657748Z mm 0.0101 ms 98.4% 2025-09-07T13:32:35.3658363Z triton_mm_748 0.0107 ms 92.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:32:35.3659341Z triton_mm_743 0.0114 ms 86.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:32:35.3660301Z triton_mm_740 0.0116 ms 85.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:32:35.3661274Z triton_mm_754 0.0118 ms 84.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:32:35.3662360Z triton_mm_747 0.0121 ms 81.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:32:35.3663541Z triton_mm_739 0.0125 ms 79.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:32:35.3664530Z triton_mm_737 0.0127 ms 77.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:32:35.3665817Z triton_mm_753 0.0131 ms 76.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:32:35.3666732Z SingleProcess AUTOTUNE benchmarking takes 0.2575 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T13:32:36.0106406Z Autotune Choices Stats: 2025-09-07T13:32:36.0107442Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_508", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.009983999654650688, "best_triton_pos": 0} 2025-09-07T13:32:36.0286217Z AUTOTUNE mm(2312x768, 768x160) 2025-09-07T13:32:36.0286522Z strides: [768, 1], [1, 768] 2025-09-07T13:32:36.0286798Z dtypes: 
torch.bfloat16, torch.bfloat16 2025-09-07T13:32:36.0287483Z triton_mm_508 0.0100 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:32:36.0288165Z mm 0.0103 ms 96.9% 2025-09-07T13:32:36.0288770Z triton_mm_512 0.0109 ms 91.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:32:36.0289750Z triton_mm_504 0.0116 ms 85.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:32:36.0290713Z triton_mm_507 0.0116 ms 85.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:32:36.0292027Z triton_mm_518 0.0119 ms 83.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:32:36.0293178Z triton_mm_511 0.0120 ms 83.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:32:36.0294315Z triton_mm_501 0.0122 ms 81.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:32:36.0295593Z triton_mm_503 0.0126 ms 79.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:32:36.0296621Z triton_mm_514 0.0132 ms 75.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:32:36.0297406Z 
SingleProcess AUTOTUNE benchmarking takes 0.2576 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:32:36.6706968Z Autotune Choices Stats: 2025-09-07T13:32:36.6707981Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_390", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.009440000168979168, "best_triton_pos": 0} 2025-09-07T13:32:36.6893356Z AUTOTUNE mm(2312x768, 768x128) 2025-09-07T13:32:36.6893636Z strides: [768, 1], [1, 768] 2025-09-07T13:32:36.6893910Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:32:36.6895261Z triton_mm_390 0.0094 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:32:36.6895944Z mm 0.0095 ms 99.0% 2025-09-07T13:32:36.6896599Z triton_mm_394 0.0108 ms 87.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:32:36.6897453Z triton_mm_386 0.0112 ms 84.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:32:36.6898289Z triton_mm_389 0.0113 ms 83.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:32:36.6899115Z triton_mm_393 0.0120 ms 78.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:32:36.6899944Z triton_mm_400 0.0120 ms 78.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 
2025-09-07T13:32:36.6900773Z triton_mm_385 0.0123 ms 77.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:32:36.6901671Z triton_mm_383 0.0124 ms 76.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:32:36.6902514Z triton_mm_399 0.0130 ms 72.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:32:36.6903246Z SingleProcess AUTOTUNE benchmarking takes 0.2593 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:32:37.2819258Z Autotune Choices Stats: 2025-09-07T13:32:37.2820298Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_957", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.009727999567985535, "best_triton_pos": 0} 2025-09-07T13:32:37.3001448Z AUTOTUNE mm(512x1280, 1280x448) 2025-09-07T13:32:37.3001774Z strides: [1280, 1], [1, 1280] 2025-09-07T13:32:37.3002402Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:32:37.3003326Z triton_mm_957 0.0097 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:32:37.3004065Z mm 0.0104 ms 93.3% 2025-09-07T13:32:37.3004735Z triton_mm_961 0.0105 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:32:37.3006213Z triton_mm_965 0.0116 ms 84.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=4, num_warps=4 2025-09-07T13:32:37.3007122Z triton_mm_971 0.0137 ms 70.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:32:37.3008013Z triton_mm_956 0.0138 ms 70.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:32:37.3008885Z triton_mm_955 0.0140 ms 69.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:32:37.3009773Z triton_mm_960 0.0141 ms 68.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:32:37.3010768Z triton_mm_964 0.0142 ms 68.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:32:37.3011665Z triton_mm_954 0.0148 ms 65.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:32:37.3012447Z SingleProcess AUTOTUNE benchmarking takes 0.2630 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:32:37.8677912Z Autotune Choices Stats: 2025-09-07T13:32:37.8679244Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.011168000288307667, "best_triton_pos": 1, "best_triton_time": 0.011552000418305397, "best_triton_kernel": "triton_mm_1068", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T13:32:37.8857902Z AUTOTUNE mm(512x2048, 2048x448) 2025-09-07T13:32:37.8858211Z strides: [2048, 1], [1, 2048] 
2025-09-07T13:32:37.8858487Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:32:37.8858772Z mm 0.0112 ms 100.0% 2025-09-07T13:32:37.8859409Z triton_mm_1068 0.0116 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:32:37.8860397Z triton_mm_1072 0.0123 ms 91.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:32:37.8861468Z triton_mm_1076 0.0141 ms 79.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:32:37.8862492Z triton_mm_1082 0.0176 ms 63.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:32:37.8863483Z triton_mm_1067 0.0181 ms 61.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:32:37.8865496Z triton_mm_1066 0.0183 ms 60.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:32:37.8866801Z triton_mm_1071 0.0195 ms 57.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:32:37.8867788Z triton_mm_1075 0.0196 ms 57.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:32:37.8868758Z triton_mm_1065 0.0197 ms 56.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, 
num_warps=4 2025-09-07T13:32:37.8869610Z SingleProcess AUTOTUNE benchmarking takes 0.2679 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:32:38.1593337Z Autotune Choices Stats: 2025-09-07T13:32:38.1594369Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_924", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.009247999638319016, "best_triton_pos": 0} 2025-09-07T13:32:38.1780716Z AUTOTUNE mm(512x1280, 1280x384) 2025-09-07T13:32:38.1780988Z strides: [1280, 1], [1, 1280] 2025-09-07T13:32:38.1781261Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:32:38.1782006Z triton_mm_924 0.0092 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:32:38.1782637Z mm 0.0094 ms 98.0% 2025-09-07T13:32:38.1783468Z triton_mm_928 0.0098 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:32:38.1784534Z triton_mm_932 0.0111 ms 83.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:32:38.1785816Z triton_mm_923 0.0127 ms 72.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:32:38.1786951Z triton_mm_927 0.0133 ms 69.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:32:38.1787916Z triton_mm_938 0.0133 ms 69.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=5, num_warps=8 2025-09-07T13:32:38.1788875Z triton_mm_922 0.0135 ms 68.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:32:38.1789841Z triton_mm_931 0.0140 ms 66.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:32:38.1790797Z triton_mm_921 0.0143 ms 64.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:32:38.1791643Z SingleProcess AUTOTUNE benchmarking takes 0.2605 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:32:38.7465655Z Autotune Choices Stats: 2025-09-07T13:32:38.7466791Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_1035", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.011264000087976456, "best_triton_pos": 0} 2025-09-07T13:32:38.7648798Z AUTOTUNE mm(512x2048, 2048x384) 2025-09-07T13:32:38.7649061Z strides: [2048, 1], [1, 2048] 2025-09-07T13:32:38.7649355Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:32:38.7650214Z triton_mm_1035 0.0113 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:32:38.7650976Z mm 0.0113 ms 99.7% 2025-09-07T13:32:38.7651559Z triton_mm_1039 0.0123 ms 91.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:32:38.7652548Z triton_mm_1043 0.0137 ms 82.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:32:38.7653518Z triton_mm_1049 0.0176 ms 64.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:32:38.7654487Z triton_mm_1034 0.0182 ms 62.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:32:38.7655623Z triton_mm_1033 0.0184 ms 61.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:32:38.7656622Z triton_mm_1038 0.0192 ms 58.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:32:38.7657756Z triton_mm_1042 0.0194 ms 58.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:32:38.7658674Z triton_mm_1032 0.0195 ms 57.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:32:38.7659465Z SingleProcess AUTOTUNE benchmarking takes 0.2681 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:32:39.6907821Z Autotune Choices Stats: 2025-09-07T13:32:39.6908855Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_131", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.00863999966531992, "best_triton_pos": 0} 2025-09-07T13:32:39.7093668Z AUTOTUNE mm(9800x192, 192x32) 2025-09-07T13:32:39.7093958Z strides: [192, 1], [1, 192] 2025-09-07T13:32:39.7094233Z dtypes: torch.bfloat16, 
torch.bfloat16 2025-09-07T13:32:39.7094921Z triton_mm_131 0.0086 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:32:39.7096095Z triton_mm_128 0.0088 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:32:39.7097159Z triton_mm_132 0.0089 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:32:39.7098053Z triton_mm_136 0.0090 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:32:39.7098941Z triton_mm_134 0.0091 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:32:39.7099827Z triton_mm_137 0.0091 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:32:39.7101077Z triton_mm_124 0.0092 ms 93.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:32:39.7102221Z triton_mm_123 0.0093 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:32:39.7111693Z triton_mm_129 0.0093 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:32:39.7112291Z mm 0.0093 ms 92.5% 2025-09-07T13:32:39.7112677Z SingleProcess AUTOTUNE 
benchmarking takes 0.2327 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T13:32:40.2114145Z Autotune Choices Stats: 2025-09-07T13:32:40.2115370Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_905", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.009279999881982803, "best_triton_pos": 0} 2025-09-07T13:32:40.2302969Z AUTOTUNE mm(512x1280, 1280x320) 2025-09-07T13:32:40.2303246Z strides: [1280, 1], [1, 1280] 2025-09-07T13:32:40.2303549Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:32:40.2304238Z triton_mm_905 0.0093 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:32:40.2304876Z mm 0.0095 ms 97.3% 2025-09-07T13:32:40.2305619Z triton_mm_909 0.0100 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:32:40.2306825Z triton_mm_913 0.0112 ms 83.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:32:40.2307968Z triton_mm_904 0.0131 ms 70.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:32:40.2308935Z triton_mm_908 0.0134 ms 69.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:32:40.2309904Z triton_mm_919 0.0134 ms 69.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:32:40.2310865Z 
triton_mm_903 0.0136 ms 68.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:32:40.2311823Z triton_mm_912 0.0141 ms 65.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:32:40.2312788Z triton_mm_902 0.0142 ms 65.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:32:40.2313629Z SingleProcess AUTOTUNE benchmarking takes 0.2583 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:32:40.4976729Z Autotune Choices Stats: 2025-09-07T13:32:40.4977736Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_997", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.009088000282645226, "best_triton_pos": 0} 2025-09-07T13:32:40.5167316Z AUTOTUNE mm(512x1280, 1280x192) 2025-09-07T13:32:40.5167602Z strides: [1280, 1], [1, 1280] 2025-09-07T13:32:40.5167896Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:32:40.5168767Z triton_mm_997 0.0091 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:32:40.5169400Z mm 0.0092 ms 98.3% 2025-09-07T13:32:40.5170148Z triton_mm_1001 0.0096 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:32:40.5171253Z triton_mm_1005 0.0111 ms 81.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 
2025-09-07T13:32:40.5172225Z triton_mm_996 0.0129 ms 70.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:32:40.5173181Z triton_mm_1000 0.0133 ms 68.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:32:40.5174143Z triton_mm_995 0.0134 ms 67.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:32:40.5175408Z triton_mm_1011 0.0134 ms 67.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:32:40.5176397Z triton_mm_1004 0.0138 ms 65.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:32:40.5177576Z triton_mm_994 0.0140 ms 64.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:32:40.5178371Z SingleProcess AUTOTUNE benchmarking takes 0.2578 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:32:40.7656229Z Autotune Choices Stats: 2025-09-07T13:32:40.7657488Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.010688000358641148, "best_triton_pos": 1, "best_triton_time": 0.011008000001311302, "best_triton_kernel": "triton_mm_1016", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T13:32:40.7841882Z AUTOTUNE mm(512x2048, 2048x320) 2025-09-07T13:32:40.7842158Z strides: [2048, 1], [1, 2048] 
2025-09-07T13:32:40.7842444Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:32:40.7842726Z mm 0.0107 ms 100.0% 2025-09-07T13:32:40.7843336Z triton_mm_1016 0.0110 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:32:40.7844338Z triton_mm_1020 0.0123 ms 87.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:32:40.7845740Z triton_mm_1024 0.0138 ms 77.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:32:40.7846740Z triton_mm_1030 0.0176 ms 60.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:32:40.7847746Z triton_mm_1015 0.0180 ms 59.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:32:40.7848586Z triton_mm_1014 0.0180 ms 59.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:32:40.7849865Z triton_mm_1013 0.0188 ms 56.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:32:40.7850819Z triton_mm_1019 0.0193 ms 55.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:32:40.7851740Z triton_mm_1023 0.0194 ms 55.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, 
num_warps=4 2025-09-07T13:32:40.7852472Z SingleProcess AUTOTUNE benchmarking takes 0.2669 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:32:41.0594476Z Autotune Choices Stats: 2025-09-07T13:32:41.0596215Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.010367999784648418, "best_triton_pos": 1, "best_triton_time": 0.010847999714314938, "best_triton_kernel": "triton_mm_1108", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T13:32:41.0781636Z AUTOTUNE mm(512x2048, 2048x192) 2025-09-07T13:32:41.0781915Z strides: [2048, 1], [1, 2048] 2025-09-07T13:32:41.0782222Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:32:41.0782503Z mm 0.0104 ms 100.0% 2025-09-07T13:32:41.0783121Z triton_mm_1108 0.0108 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:32:41.0784163Z triton_mm_1112 0.0120 ms 86.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:32:41.0785961Z triton_mm_1116 0.0136 ms 76.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:32:41.0786989Z triton_mm_1107 0.0174 ms 59.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:32:41.0788107Z triton_mm_1122 0.0175 ms 59.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:32:41.0789088Z triton_mm_1106 0.0176 ms 59.0% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:32:41.0790057Z triton_mm_1111 0.0182 ms 57.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:32:41.0791028Z triton_mm_1105 0.0185 ms 56.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:32:41.0792001Z triton_mm_1115 0.0187 ms 55.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:32:41.0792852Z SingleProcess AUTOTUNE benchmarking takes 0.2648 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:32:49.6802728Z pass 2025-09-07T13:32:53.8217344Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T13:32:53.8218628Z import pynvml # type: ignore[import] 2025-09-07T13:32:56.8330629Z 2025-09-07T13:32:58.4507040Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:32:58.4507583Z loading model: 0it [00:01, ?it/s] 2025-09-07T13:32:58.4591718Z cuda eval jx_nest_base 2025-09-07T13:33:30.8233016Z pass 2025-09-07T13:33:34.9794015Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T13:33:34.9795737Z import pynvml # type: ignore[import] 2025-09-07T13:33:38.1400690Z 2025-09-07T13:33:38.8889331Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:33:38.8889711Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:33:38.8926867Z cuda eval lcnet_050 2025-09-07T13:33:48.3450634Z Autotune Choices Stats: 2025-09-07T13:33:48.3452400Z {"num_choices": 12, "num_triton_choices": 11, "best_kernel": "triton_mm_13", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8", "best_time": 0.0077760000713169575, "best_triton_pos": 0} 2025-09-07T13:33:48.3646974Z AUTOTUNE mm(100352x8, 8x16) 2025-09-07T13:33:48.3647461Z strides: [8, 1], [1, 8] 2025-09-07T13:33:48.3647882Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:33:48.3648991Z triton_mm_13 0.0078 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:33:48.3650602Z triton_mm_14 0.0078 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T13:33:48.3652187Z triton_mm_11 0.0078 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:33:48.3654250Z triton_mm_15 0.0078 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:33:48.3656145Z triton_mm_8 0.0079 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:33:48.3657557Z triton_mm_9 0.0079 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, 
BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:33:48.3658378Z triton_mm_10 0.0079 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:33:48.3659202Z triton_mm_6 0.0081 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T13:33:48.3660021Z triton_mm_7 0.0081 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:33:48.3660846Z triton_mm_5 0.0082 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T13:33:48.3661658Z SingleProcess AUTOTUNE benchmarking takes 0.1737 seconds and 0.0003 seconds precompiling for 12 choices 2025-09-07T13:33:48.8560208Z Autotune Choices Stats: 2025-09-07T13:33:48.8561037Z {"num_choices": 15, "num_triton_choices": 14, "best_kernel": "triton_mm_20", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.006816000211983919, "best_triton_pos": 0} 2025-09-07T13:33:48.8751125Z AUTOTUNE mm(25088x16, 16x32) 2025-09-07T13:33:48.8751409Z strides: [16, 1], [1, 16] 2025-09-07T13:33:48.8751662Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:33:48.8752590Z triton_mm_20 0.0068 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:33:48.8753710Z triton_mm_18 0.0069 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:33:48.8754799Z triton_mm_23 0.0070 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:33:48.8755959Z triton_mm_21 0.0070 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:33:48.8756925Z triton_mm_17 0.0071 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:33:48.8757854Z triton_mm_25 0.0071 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:33:48.8758730Z triton_mm_19 0.0071 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:33:48.8759607Z triton_mm_22 0.0072 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:33:48.8760474Z triton_mm_26 0.0073 ms 93.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:33:48.8761458Z triton_mm_16 0.0074 ms 92.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T13:33:48.8762235Z SingleProcess AUTOTUNE benchmarking takes 0.1988 seconds and 0.0002 seconds precompiling for 15 choices 2025-09-07T13:33:49.3595968Z Autotune Choices Stats: 2025-09-07T13:33:49.3597102Z {"num_choices": 16, "num_triton_choices": 15, "best_kernel": "triton_mm_33", 
"best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8", "best_time": 0.0072639998979866505, "best_triton_pos": 0} 2025-09-07T13:33:49.3781202Z AUTOTUNE mm(25088x32, 32x32) 2025-09-07T13:33:49.3781548Z strides: [32, 1], [1, 32] 2025-09-07T13:33:49.3781815Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:33:49.3782490Z triton_mm_33 0.0073 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:33:49.3783462Z triton_mm_40 0.0073 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:33:49.3784465Z triton_mm_32 0.0073 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:33:49.3785621Z triton_mm_37 0.0074 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:33:49.3786691Z triton_mm_38 0.0074 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:33:49.3787727Z triton_mm_44 0.0074 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:33:49.3788691Z triton_mm_31 0.0074 ms 98.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:33:49.3789974Z triton_mm_36 0.0074 ms 98.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:33:49.3791021Z triton_mm_34 0.0074 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:33:49.3791963Z triton_mm_39 0.0074 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:33:49.3792804Z SingleProcess AUTOTUNE benchmarking takes 0.2113 seconds and 0.0002 seconds precompiling for 16 choices 2025-09-07T13:33:49.6178677Z Autotune Choices Stats: 2025-09-07T13:33:49.6179728Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_46", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.006271999794989824, "best_triton_pos": 0} 2025-09-07T13:33:49.6363883Z AUTOTUNE mm(6272x32, 32x64) 2025-09-07T13:33:49.6364154Z strides: [32, 1], [1, 32] 2025-09-07T13:33:49.6364417Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:33:49.6365270Z triton_mm_46 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:33:49.6366281Z triton_mm_48 0.0066 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:33:49.6367575Z triton_mm_55 0.0066 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:33:49.6368469Z triton_mm_56 0.0066 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=4, num_warps=4 2025-09-07T13:33:49.6369342Z triton_mm_49 0.0066 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:33:49.6370219Z triton_mm_51 0.0067 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:33:49.6371088Z triton_mm_52 0.0067 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:33:49.6371957Z triton_mm_54 0.0067 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:33:49.6372838Z triton_mm_60 0.0067 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:33:49.6373726Z triton_mm_47 0.0067 ms 93.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:33:49.6374502Z SingleProcess AUTOTUNE benchmarking takes 0.2232 seconds and 0.0002 seconds precompiling for 17 choices 2025-09-07T13:33:50.1535780Z Autotune Choices Stats: 2025-09-07T13:33:50.1536846Z {"num_choices": 19, "num_triton_choices": 18, "best_kernel": "triton_mm_62", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.006943999789655209, "best_triton_pos": 0} 2025-09-07T13:33:50.1723358Z AUTOTUNE mm(6272x64, 64x64) 2025-09-07T13:33:50.1723629Z strides: [64, 1], [1, 64] 2025-09-07T13:33:50.1723898Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:33:50.1724771Z 
triton_mm_62 0.0069 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:33:50.1726091Z triton_mm_71 0.0069 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:33:50.1727160Z triton_mm_68 0.0070 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:33:50.1728044Z triton_mm_70 0.0071 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:33:50.1728933Z triton_mm_65 0.0071 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:33:50.1729836Z triton_mm_75 0.0072 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:33:50.1730712Z triton_mm_73 0.0072 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:33:50.1731571Z triton_mm_69 0.0072 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:33:50.1732572Z triton_mm_74 0.0072 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:33:50.1733463Z triton_mm_78 0.0072 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:33:50.1734242Z SingleProcess AUTOTUNE benchmarking takes 0.2476 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T13:33:50.4475495Z Autotune Choices Stats: 2025-09-07T13:33:50.4476514Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_80", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.006591999903321266, "best_triton_pos": 0} 2025-09-07T13:33:50.5117448Z AUTOTUNE mm(1568x64, 64x128) 2025-09-07T13:33:50.5117787Z strides: [64, 1], [1, 64] 2025-09-07T13:33:50.5118115Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:33:50.5118794Z triton_mm_80 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:33:50.5119798Z triton_mm_82 0.0067 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:33:50.5120766Z triton_mm_83 0.0067 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:33:50.5121731Z triton_mm_81 0.0067 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:33:50.5122673Z triton_mm_87 0.0068 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:33:50.5123608Z triton_mm_86 0.0069 ms 95.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 
2025-09-07T13:33:50.5125148Z triton_mm_90 0.0071 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:33:50.5126273Z triton_mm_91 0.0071 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:33:50.5127329Z triton_mm_88 0.0072 ms 91.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:33:50.5128219Z triton_mm_92 0.0073 ms 90.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:33:50.5128995Z SingleProcess AUTOTUNE benchmarking takes 0.3032 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:33:51.2048305Z Autotune Choices Stats: 2025-09-07T13:33:51.2049914Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_224", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.006912000011652708, "best_triton_pos": 0} 2025-09-07T13:33:51.2268314Z AUTOTUNE mm(392x128, 128x256) 2025-09-07T13:33:51.2268561Z strides: [128, 1], [1, 128] 2025-09-07T13:33:51.2268881Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:33:51.2269568Z triton_mm_224 0.0069 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:33:51.2270810Z triton_mm_219 0.0070 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:33:51.2271797Z triton_mm_220 
0.0070 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:33:51.2272755Z triton_mm_218 0.0070 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:33:51.2273726Z triton_mm_228 0.0071 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:33:51.2274688Z triton_mm_231 0.0072 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:33:51.2275955Z triton_mm_227 0.0073 ms 95.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:33:51.2276563Z mm 0.0074 ms 93.5% 2025-09-07T13:33:51.2277135Z triton_mm_226 0.0074 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:33:51.2278094Z triton_mm_225 0.0075 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:33:51.2278876Z SingleProcess AUTOTUNE benchmarking takes 0.2629 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:33:54.1030768Z pass 2025-09-07T13:33:57.5550611Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T13:33:57.5553072Z import pynvml # type: ignore[import] 2025-09-07T13:34:00.5384869Z 2025-09-07T13:34:01.6065842Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:34:01.6066296Z loading model: 0it [00:01, ?it/s] 2025-09-07T13:34:01.6151625Z cuda eval levit_128 2025-09-07T13:34:53.2611273Z ERROR:common: 2025-09-07T13:34:53.2611580Z Traceback (most recent call last): 2025-09-07T13:34:53.2612111Z File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2326, in check_accuracy 2025-09-07T13:34:53.2612627Z new_result = self.run_n_iterations( 2025-09-07T13:34:53.2613130Z File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2036, in run_n_iterations 2025-09-07T13:34:53.2613679Z return model_iter_fn(mod, inputs, collect_outputs=True) 2025-09-07T13:34:53.2614341Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 832, in compile_wrapper 2025-09-07T13:34:53.2615278Z return fn(*args, **kwargs) 2025-09-07T13:34:53.2615702Z File "/var/lib/jenkins/workspace/benchmarks/dynamo/timm_models.py", line 434, in forward_pass 2025-09-07T13:34:53.2616158Z def forward_pass(self, mod, inputs, collect_outputs=True): 2025-09-07T13:34:53.2616669Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T13:34:53.2617107Z return fn(*args, **kwargs) 2025-09-07T13:34:53.2617534Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1130, in forward 2025-09-07T13:34:53.2617990Z return compiled_fn(full_args) 2025-09-07T13:34:53.2618502Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 353, in runtime_wrapper 2025-09-07T13:34:53.2619050Z all_outs = call_func_at_runtime_with_args( 2025-09-07T13:34:53.2620064Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 129, in call_func_at_runtime_with_args 
2025-09-07T13:34:53.2620627Z out = normalize_as_list(f(args)) 2025-09-07T13:34:53.2621137Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 724, in inner_fn 2025-09-07T13:34:53.2621750Z outs = compiled_fn(args) 2025-09-07T13:34:53.2622233Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 526, in wrapper 2025-09-07T13:34:53.2622743Z return compiled_fn(runtime_args) 2025-09-07T13:34:53.2623185Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/output_code.py", line 613, in __call__ 2025-09-07T13:34:53.2623637Z return self.current_callable(inputs) 2025-09-07T13:34:53.2624142Z File "/tmp/torchinductor_jenkins/em/cem6jrngdhf3a5bcqfdybbfue2l55swbtruf5matyarjylck3uuw.py", line 9201, in call 2025-09-07T13:34:53.2624664Z (buf266,) = self.partitions[0](partition0_args) 2025-09-07T13:34:53.2625290Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1772, in run 2025-09-07T13:34:53.2625779Z return compiled_fn(new_inputs) # type: ignore[arg-type] 2025-09-07T13:34:53.2626336Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/cudagraph_trees.py", line 404, in deferred_cudagraphify 2025-09-07T13:34:53.2626914Z fn, out = cudagraphify(model, inputs, new_static_input_idxs, *args, **kwargs) 2025-09-07T13:34:53.2627466Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/cudagraph_trees.py", line 463, in cudagraphify 2025-09-07T13:34:53.2627924Z return manager.add_function( 2025-09-07T13:34:53.2628364Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/cudagraph_trees.py", line 2316, in add_function 2025-09-07T13:34:53.2628814Z return fn, fn(inputs) 2025-09-07T13:34:53.2629204Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/cudagraph_trees.py", line 2012, in run 
2025-09-07T13:34:53.2629773Z out = self._run(new_inputs, function_id) 2025-09-07T13:34:53.2630211Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/cudagraph_trees.py", line 2116, in _run 2025-09-07T13:34:53.2630821Z return self.run_eager(new_inputs, function_id) 2025-09-07T13:34:53.2631391Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/cudagraph_trees.py", line 2277, in run_eager 2025-09-07T13:34:53.2631832Z return node.run(new_inputs) 2025-09-07T13:34:53.2632240Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/cudagraph_trees.py", line 671, in run 2025-09-07T13:34:53.2632710Z non_cudagraph_inps_storages = get_non_cudagraph_inps() 2025-09-07T13:34:53.2633241Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/cudagraph_trees.py", line 663, in get_non_cudagraph_inps 2025-09-07T13:34:53.2633724Z non_cudagraph_inps = [ 2025-09-07T13:34:53.2634144Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/cudagraph_trees.py", line 667, in 2025-09-07T13:34:53.2634660Z and t.untyped_storage().data_ptr() not in existing_path_data_ptrs 2025-09-07T13:34:53.2635589Z RuntimeError: Error: accessing tensor output of CUDAGraphs that has been overwritten by a subsequent run. 
Stack trace: File "/var/lib/jenkins/workspace/benchmarks/dynamo/timm_models.py", line 436, in forward_pass 2025-09-07T13:34:53.2636290Z return mod(*inputs) 2025-09-07T13:34:53.2636652Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 718, in forward 2025-09-07T13:34:53.2637046Z x = self.forward_features(x) 2025-09-07T13:34:53.2637449Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 709, in forward_features 2025-09-07T13:34:53.2637866Z x = self.stages(x) 2025-09-07T13:34:53.2638293Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 520, in forward 2025-09-07T13:34:53.2638678Z x = self.blocks(x) 2025-09-07T13:34:53.2639011Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 458, in forward 2025-09-07T13:34:53.2639403Z x = x + self.drop_path1(self.attn(x)) 2025-09-07T13:34:53.2639791Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 237, in forward 2025-09-07T13:34:53.2640222Z attn = q @ k * self.scale + self.get_attention_biases(x.device) 2025-09-07T13:34:53.2640702Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 216, in get_attention_biases 2025-09-07T13:34:53.2641673Z self.attention_bias_cache[device_key] = self.attention_biases[:, self.attention_bias_idxs]. To prevent overwriting, clone the tensor outside of torch.compile() or call torch.compiler.cudagraph_mark_step_begin() before each model invocation. 2025-09-07T13:34:53.2642527Z TorchDynamo optimized model failed to run because of following error 2025-09-07T13:34:53.2847188Z fail_to_run 2025-09-07T13:34:57.5617305Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. 
If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T13:34:57.5618398Z import pynvml # type: ignore[import] 2025-09-07T13:35:00.5697632Z 2025-09-07T13:35:01.7381090Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:35:01.7381524Z loading model: 0it [00:01, ?it/s] 2025-09-07T13:35:01.7422267Z cuda eval mixer_b16_224 2025-09-07T13:35:15.5391480Z pass 2025-09-07T13:35:19.0505496Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T13:35:19.0507419Z import pynvml # type: ignore[import] 2025-09-07T13:35:22.9154644Z 2025-09-07T13:35:24.0865478Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:35:24.0865799Z loading model: 0it [00:01, ?it/s] 2025-09-07T13:35:24.0968066Z cuda eval mixnet_l 2025-09-07T13:35:49.4492570Z Autotune Choices Stats: 2025-09-07T13:35:49.4493773Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_164", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.01206399966031313, "best_triton_pos": 0} 2025-09-07T13:35:49.4698885Z AUTOTUNE mm(25088x40, 40x240) 2025-09-07T13:35:49.4699181Z strides: [40, 1], [1, 40] 2025-09-07T13:35:49.4699413Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:35:49.4700002Z triton_mm_164 0.0121 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:35:49.4700853Z triton_mm_169 0.0124 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=3, num_warps=4 2025-09-07T13:35:49.4701814Z triton_mm_161 0.0126 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:35:49.4702599Z triton_mm_165 0.0127 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:35:49.4703382Z triton_mm_171 0.0132 ms 91.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:35:49.4704468Z triton_mm_170 0.0135 ms 89.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:35:49.4705544Z triton_mm_166 0.0138 ms 87.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:35:49.4706355Z triton_mm_167 0.0140 ms 86.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:35:49.4706849Z mm 0.0141 ms 85.7% 2025-09-07T13:35:49.4707327Z triton_mm_160 0.0142 ms 84.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:35:49.4708021Z SingleProcess AUTOTUNE benchmarking takes 0.2885 seconds and 0.0006 seconds precompiling for 20 choices 2025-09-07T13:35:50.9147860Z Autotune Choices Stats: 2025-09-07T13:35:50.9148977Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_497", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, 
num_warps=8", "best_time": 0.00825599953532219, "best_triton_pos": 0} 2025-09-07T13:35:50.9342466Z AUTOTUNE mm(6272x56, 56x336) 2025-09-07T13:35:50.9342790Z strides: [56, 1], [1, 56] 2025-09-07T13:35:50.9343072Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:35:50.9343757Z triton_mm_497 0.0083 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:35:50.9344789Z triton_mm_498 0.0083 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:35:50.9346247Z triton_mm_501 0.0084 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:35:50.9347708Z triton_mm_502 0.0085 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:35:50.9348825Z triton_mm_494 0.0086 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:35:50.9349931Z triton_mm_507 0.0087 ms 94.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:35:50.9351046Z triton_mm_508 0.0088 ms 93.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:35:50.9352161Z triton_mm_503 0.0094 ms 87.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:35:50.9353225Z triton_mm_506 0.0094 ms 87.5% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:35:50.9354280Z triton_mm_504 0.0096 ms 86.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:35:50.9355380Z SingleProcess AUTOTUNE benchmarking takes 0.2996 seconds and 0.0004 seconds precompiling for 20 choices 2025-09-07T13:35:52.5183328Z Autotune Choices Stats: 2025-09-07T13:35:52.5184856Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_1259", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.008991999551653862, "best_triton_pos": 0} 2025-09-07T13:35:52.5383385Z AUTOTUNE mm(1568x160, 160x960) 2025-09-07T13:35:52.5383689Z strides: [160, 1], [1, 160] 2025-09-07T13:35:52.5383970Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:35:52.5384677Z triton_mm_1259 0.0090 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:35:52.5385923Z triton_mm_1261 0.0092 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:35:52.5386946Z triton_mm_1256 0.0092 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:35:52.5387937Z triton_mm_1257 0.0093 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:35:52.5388921Z triton_mm_1260 0.0093 ms 96.2% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:35:52.5389911Z triton_mm_1252 0.0094 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:35:52.5390906Z triton_mm_1253 0.0097 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:35:52.5392102Z triton_mm_1250 0.0103 ms 87.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:35:52.5393162Z triton_mm_1258 0.0105 ms 85.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T13:35:52.5394614Z triton_mm_1255 0.0120 ms 75.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:35:52.5395873Z SingleProcess AUTOTUNE benchmarking takes 0.3687 seconds and 0.0005 seconds precompiling for 20 choices 2025-09-07T13:35:53.8340327Z Autotune Choices Stats: 2025-09-07T13:35:53.8341793Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.0077760000713169575, "best_triton_pos": 1, "best_triton_time": 0.007840000092983246, "best_triton_kernel": "triton_mm_859", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8"} 2025-09-07T13:35:53.8531053Z AUTOTUNE mm(1568x104, 104x624) 2025-09-07T13:35:53.8531400Z strides: [104, 1], [1, 104] 2025-09-07T13:35:53.8531697Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:35:53.8531991Z 
mm 0.0078 ms 100.0% 2025-09-07T13:35:53.8532616Z triton_mm_859 0.0078 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:35:53.8533631Z triton_mm_855 0.0079 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:35:53.8534619Z triton_mm_857 0.0079 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:35:53.8536332Z triton_mm_856 0.0079 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:35:53.8537330Z triton_mm_854 0.0080 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:35:53.8538301Z triton_mm_858 0.0080 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:35:53.8539257Z triton_mm_852 0.0083 ms 93.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:35:53.8540220Z triton_mm_853 0.0083 ms 93.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:35:53.8541208Z triton_mm_861 0.0085 ms 91.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:35:53.8542207Z SingleProcess AUTOTUNE benchmarking takes 0.2637 seconds 
and 0.0003 seconds precompiling for 20 choices 2025-09-07T13:35:54.3754004Z Autotune Choices Stats: 2025-09-07T13:35:54.3755676Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.00854399986565113, "best_triton_pos": 1, "best_triton_time": 0.008960000239312649, "best_triton_kernel": "triton_mm_1326", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T13:35:54.3946623Z AUTOTUNE mm(392x264, 264x1584) 2025-09-07T13:35:54.3946907Z strides: [264, 1], [1, 264] 2025-09-07T13:35:54.3947173Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:35:54.3947471Z mm 0.0085 ms 100.0% 2025-09-07T13:35:54.3948102Z triton_mm_1326 0.0090 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:35:54.3949425Z triton_mm_1325 0.0092 ms 93.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:35:54.3950614Z triton_mm_1329 0.0093 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:35:54.3951766Z triton_mm_1324 0.0098 ms 87.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:35:54.3952845Z triton_mm_1322 0.0099 ms 86.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:35:54.3953831Z triton_mm_1327 0.0099 ms 86.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=4, num_warps=4 2025-09-07T13:35:54.3954820Z triton_mm_1332 0.0100 ms 85.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:35:54.3956012Z triton_mm_1328 0.0100 ms 85.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:35:54.3957010Z triton_mm_1333 0.0100 ms 85.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:35:54.3957881Z SingleProcess AUTOTUNE benchmarking takes 0.2579 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T13:35:55.7682873Z Autotune Choices Stats: 2025-09-07T13:35:55.7684461Z {"num_choices": 19, "num_triton_choices": 18, "best_kernel": "triton_mm_203", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.00863999966531992, "best_triton_pos": 0} 2025-09-07T13:35:55.7876330Z AUTOTUNE mm(6272x240, 240x56) 2025-09-07T13:35:55.7876648Z strides: [240, 1], [1, 240] 2025-09-07T13:35:55.7876923Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:35:55.7877592Z triton_mm_203 0.0086 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:35:55.7878603Z triton_mm_204 0.0086 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:35:55.7879218Z mm 0.0087 ms 99.3% 2025-09-07T13:35:55.7879799Z triton_mm_207 0.0087 ms 98.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:35:55.7880785Z triton_mm_208 0.0087 ms 98.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:35:55.7881753Z triton_mm_206 0.0091 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:35:55.7882716Z triton_mm_213 0.0092 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:35:55.7883649Z triton_mm_197 0.0092 ms 93.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:35:55.7884542Z triton_mm_212 0.0092 ms 93.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:35:55.7886129Z triton_mm_205 0.0094 ms 91.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:35:55.7887081Z SingleProcess AUTOTUNE benchmarking takes 0.2420 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T13:35:56.3292155Z Autotune Choices Stats: 2025-09-07T13:35:56.3293221Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_898", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.00863999966531992, "best_triton_pos": 0} 2025-09-07T13:35:56.3479259Z AUTOTUNE mm(1568x624, 624x160) 2025-09-07T13:35:56.3479538Z strides: [624, 1], [1, 624] 2025-09-07T13:35:56.3479813Z dtypes: torch.bfloat16, 
torch.bfloat16 2025-09-07T13:35:56.3480474Z triton_mm_898 0.0086 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:35:56.3481109Z mm 0.0090 ms 96.4% 2025-09-07T13:35:56.3481693Z triton_mm_902 0.0093 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:35:56.3482666Z triton_mm_897 0.0101 ms 85.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:35:56.3483633Z triton_mm_901 0.0104 ms 82.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:35:56.3484912Z triton_mm_906 0.0105 ms 82.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:35:56.3486218Z triton_mm_895 0.0113 ms 76.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:35:56.3487122Z triton_mm_905 0.0113 ms 76.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:35:56.3488028Z triton_mm_904 0.0117 ms 73.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:35:56.3488926Z triton_mm_908 0.0118 ms 73.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:35:56.3489715Z SingleProcess 
AUTOTUNE benchmarking takes 0.2534 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T13:35:56.8938646Z Autotune Choices Stats: 2025-09-07T13:35:56.8939703Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_536", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007903999648988247, "best_triton_pos": 0} 2025-09-07T13:35:56.9128632Z AUTOTUNE mm(1568x336, 336x104) 2025-09-07T13:35:56.9128953Z strides: [336, 1], [1, 336] 2025-09-07T13:35:56.9129241Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:35:56.9129976Z triton_mm_536 0.0079 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:35:56.9130633Z mm 0.0082 ms 96.9% 2025-09-07T13:35:56.9131244Z triton_mm_540 0.0082 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:35:56.9132595Z triton_mm_535 0.0084 ms 93.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:35:56.9133745Z triton_mm_534 0.0086 ms 91.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:35:56.9134739Z triton_mm_539 0.0088 ms 90.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:35:56.9136236Z triton_mm_544 0.0090 ms 88.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 
2025-09-07T13:35:56.9137143Z triton_mm_543 0.0091 ms 87.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:35:56.9138045Z triton_mm_533 0.0091 ms 86.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:35:56.9138946Z triton_mm_542 0.0091 ms 86.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:35:56.9139728Z SingleProcess AUTOTUNE benchmarking takes 0.2556 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:35:57.4788105Z Autotune Choices Stats: 2025-09-07T13:35:57.4789579Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.008991999551653862, "best_triton_pos": 1, "best_triton_time": 0.009119999594986439, "best_triton_kernel": "triton_mm_1300", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T13:35:57.4983696Z AUTOTUNE mm(392x960, 960x264) 2025-09-07T13:35:57.4983965Z strides: [960, 1], [1, 960] 2025-09-07T13:35:57.4984225Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:35:57.4984516Z mm 0.0090 ms 100.0% 2025-09-07T13:35:57.4985516Z triton_mm_1300 0.0091 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:35:57.4986549Z triton_mm_1304 0.0096 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:35:57.4987545Z triton_mm_1308 0.0108 ms 83.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, 
BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:35:57.4988550Z triton_mm_1299 0.0111 ms 81.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:35:57.4989539Z triton_mm_1303 0.0114 ms 78.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:35:57.4990496Z triton_mm_1298 0.0117 ms 76.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:35:57.4991455Z triton_mm_1307 0.0120 ms 74.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:35:57.4992422Z triton_mm_1314 0.0121 ms 74.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:35:57.4993564Z triton_mm_1297 0.0127 ms 71.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:35:57.4994724Z SingleProcess AUTOTUNE benchmarking takes 0.2704 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T13:35:58.7423611Z Autotune Choices Stats: 2025-09-07T13:35:58.7424671Z {"num_choices": 16, "num_triton_choices": 15, "best_kernel": "triton_mm_17", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.010239999741315842, "best_triton_pos": 0} 2025-09-07T13:35:58.7615173Z AUTOTUNE mm(100352x32, 32x32) 2025-09-07T13:35:58.7615469Z strides: [32, 1], 
[1, 32] 2025-09-07T13:35:58.7615741Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:35:58.7616422Z triton_mm_17 0.0102 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:35:58.7617403Z triton_mm_16 0.0103 ms 99.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:35:58.7618358Z triton_mm_19 0.0104 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T13:35:58.7619315Z triton_mm_20 0.0104 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:35:58.7620257Z triton_mm_14 0.0105 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:35:58.7621649Z triton_mm_18 0.0105 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:35:58.7622650Z triton_mm_10 0.0105 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:35:58.7623656Z triton_mm_12 0.0105 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:35:58.7624470Z triton_mm_7 0.0106 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:35:58.7625427Z triton_mm_9 0.0107 ms 96.1% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:35:58.7626165Z SingleProcess AUTOTUNE benchmarking takes 0.2120 seconds and 0.0002 seconds precompiling for 16 choices 2025-09-07T13:35:59.0077320Z Autotune Choices Stats: 2025-09-07T13:35:59.0078293Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_126", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.008960000239312649, "best_triton_pos": 0} 2025-09-07T13:35:59.0264574Z AUTOTUNE mm(25088x60, 60x20) 2025-09-07T13:35:59.0265312Z strides: [120, 1], [1, 60] 2025-09-07T13:35:59.0265601Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:35:59.0266292Z triton_mm_126 0.0090 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:35:59.0267337Z triton_mm_130 0.0091 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:35:59.0268619Z triton_mm_120 0.0092 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:35:59.0269753Z triton_mm_134 0.0092 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:35:59.0270847Z triton_mm_129 0.0092 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:35:59.0271811Z triton_mm_123 0.0092 ms 96.9% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:35:59.0272815Z triton_mm_135 0.0093 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:35:59.0273986Z triton_mm_125 0.0094 ms 95.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:35:59.0275073Z triton_mm_127 0.0094 ms 95.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:35:59.0276042Z triton_mm_128 0.0096 ms 93.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:35:59.0276902Z SingleProcess AUTOTUNE benchmarking takes 0.2304 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T13:35:59.2386171Z Autotune Choices Stats: 2025-09-07T13:35:59.2387327Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_146", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.008991999551653862, "best_triton_pos": 0} 2025-09-07T13:35:59.2569026Z AUTOTUNE mm(25088x60, 60x20) 2025-09-07T13:35:59.2569281Z strides: [120, 1], [1, 60] 2025-09-07T13:35:59.2569549Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:35:59.2570224Z triton_mm_146 0.0090 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:35:59.2571210Z triton_mm_137 0.0091 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, 
BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:35:59.2572200Z triton_mm_140 0.0091 ms 98.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:35:59.2573210Z triton_mm_145 0.0091 ms 98.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:35:59.2574211Z triton_mm_143 0.0091 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:35:59.2575347Z triton_mm_147 0.0092 ms 98.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:35:59.2576241Z triton_mm_152 0.0092 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:35:59.2577133Z triton_mm_144 0.0093 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:35:59.2578018Z triton_mm_139 0.0094 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:35:59.2579124Z triton_mm_142 0.0095 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:35:59.2579967Z SingleProcess AUTOTUNE benchmarking takes 0.2298 seconds and 0.0003 seconds precompiling for 18 choices 2025-09-07T13:35:59.8198636Z Autotune Choices Stats: 2025-09-07T13:35:59.8199861Z 
{"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.008224000222980976, "best_triton_pos": 1, "best_triton_time": 0.009247999638319016, "best_triton_kernel": "triton_mm_1598", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8"} 2025-09-07T13:35:59.8394325Z AUTOTUNE mm(392x264, 264x1536) 2025-09-07T13:35:59.8394589Z strides: [264, 1], [1, 264] 2025-09-07T13:35:59.8394837Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:35:59.8395306Z mm 0.0082 ms 100.0% 2025-09-07T13:35:59.8395905Z triton_mm_1598 0.0092 ms 88.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:35:59.8396899Z triton_mm_1599 0.0092 ms 88.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:35:59.8397892Z triton_mm_1602 0.0094 ms 87.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:35:59.8399016Z triton_mm_1597 0.0098 ms 83.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:35:59.8400017Z triton_mm_1600 0.0099 ms 83.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:35:59.8401004Z triton_mm_1595 0.0100 ms 82.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:35:59.8401983Z triton_mm_1606 0.0101 ms 81.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, 
BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:35:59.8402967Z triton_mm_1601 0.0101 ms 81.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:35:59.8403951Z triton_mm_1605 0.0104 ms 79.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:35:59.8404789Z SingleProcess AUTOTUNE benchmarking takes 0.2565 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T13:36:10.2987601Z pass 2025-09-07T13:36:14.5971806Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T13:36:14.5972966Z import pynvml # type: ignore[import] 2025-09-07T13:36:17.5689566Z 2025-09-07T13:36:18.5933162Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:36:18.5933537Z loading model: 0it [00:01, ?it/s] 2025-09-07T13:36:18.5993754Z cuda eval mnasnet_100 2025-09-07T13:36:33.6766367Z Autotune Choices Stats: 2025-09-07T13:36:33.6767419Z {"num_choices": 16, "num_triton_choices": 15, "best_kernel": "triton_mm_26", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.010400000028312206, "best_triton_pos": 0} 2025-09-07T13:36:33.6976529Z AUTOTUNE mm(100352x16, 16x48) 2025-09-07T13:36:33.6977178Z strides: [16, 1], [1, 16] 2025-09-07T13:36:33.6977556Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:36:33.6978421Z triton_mm_26 0.0104 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:36:33.6979799Z triton_mm_23 0.0105 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:36:33.6981163Z triton_mm_28 0.0105 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:36:33.6982617Z triton_mm_25 0.0106 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:36:33.6983971Z triton_mm_31 0.0108 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T13:36:33.6985559Z triton_mm_30 0.0108 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, 
BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:36:33.6986915Z triton_mm_32 0.0108 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:36:33.6987768Z mm 0.0112 ms 93.1% 2025-09-07T13:36:33.6988718Z triton_mm_24 0.0113 ms 92.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:36:33.6990062Z triton_mm_29 0.0113 ms 91.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:36:33.6991248Z SingleProcess AUTOTUNE benchmarking takes 0.2303 seconds and 0.0004 seconds precompiling for 16 choices 2025-09-07T13:36:34.2734438Z Autotune Choices Stats: 2025-09-07T13:36:34.2735738Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_59", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.007872000336647034, "best_triton_pos": 0} 2025-09-07T13:36:34.2928403Z AUTOTUNE mm(25088x24, 24x72) 2025-09-07T13:36:34.2928709Z strides: [24, 1], [1, 24] 2025-09-07T13:36:34.2928976Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:36:34.2929702Z triton_mm_59 0.0079 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:36:34.2930707Z triton_mm_61 0.0079 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:36:34.2931710Z triton_mm_64 0.0079 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, 
BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T13:36:34.2932697Z triton_mm_65 0.0081 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:36:34.2933799Z triton_mm_60 0.0081 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:36:34.2935375Z triton_mm_54 0.0082 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:36:34.2936510Z triton_mm_56 0.0082 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:36:34.2937565Z triton_mm_58 0.0083 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:36:34.2938524Z triton_mm_66 0.0083 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:36:34.2939494Z triton_mm_62 0.0084 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:36:34.2940341Z SingleProcess AUTOTUNE benchmarking takes 0.2402 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T13:36:34.8280094Z Autotune Choices Stats: 2025-09-07T13:36:34.8281221Z {"num_choices": 13, "num_triton_choices": 12, "best_kernel": "triton_mm_16", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=2, num_warps=8", "best_time": 0.009088000282645226, "best_triton_pos": 0} 2025-09-07T13:36:34.8475668Z AUTOTUNE mm(100352x32, 32x16) 2025-09-07T13:36:34.8475946Z strides: [32, 1], [1, 32] 2025-09-07T13:36:34.8476216Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:36:34.8477373Z triton_mm_16 0.0091 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T13:36:34.8478442Z triton_mm_15 0.0092 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:36:34.8479487Z triton_mm_17 0.0092 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:36:34.8480538Z triton_mm_12 0.0092 ms 98.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:36:34.8481556Z triton_mm_11 0.0093 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:36:34.8482603Z triton_mm_9 0.0094 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:36:34.8483651Z triton_mm_13 0.0094 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:36:34.8484667Z triton_mm_7 0.0094 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T13:36:34.8485947Z triton_mm_8 0.0095 ms 
95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T13:36:34.8486897Z triton_mm_14 0.0095 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:36:34.8487739Z SingleProcess AUTOTUNE benchmarking takes 0.1798 seconds and 0.0002 seconds precompiling for 13 choices 2025-09-07T13:36:35.3931862Z Autotune Choices Stats: 2025-09-07T13:36:35.3933382Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_234", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.007391999941319227, "best_triton_pos": 0} 2025-09-07T13:36:35.4125561Z AUTOTUNE mm(6272x40, 40x240) 2025-09-07T13:36:35.4125876Z strides: [40, 1], [1, 40] 2025-09-07T13:36:35.4126141Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:36:35.4126845Z triton_mm_234 0.0074 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:36:35.4127894Z triton_mm_235 0.0074 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:36:35.4128968Z triton_mm_239 0.0076 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:36:35.4130050Z triton_mm_231 0.0077 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:36:35.4131095Z triton_mm_238 0.0077 ms 96.3% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:36:35.4132144Z triton_mm_245 0.0077 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:36:35.4133367Z triton_mm_244 0.0078 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:36:35.4134056Z mm 0.0080 ms 92.0% 2025-09-07T13:36:35.4134590Z triton_mm_241 0.0082 ms 90.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:36:35.4135647Z triton_mm_227 0.0083 ms 89.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T13:36:35.4136440Z SingleProcess AUTOTUNE benchmarking takes 0.2639 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:36:35.9734565Z Autotune Choices Stats: 2025-09-07T13:36:35.9736044Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_389", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8", "best_time": 0.0071680000983178616, "best_triton_pos": 0} 2025-09-07T13:36:35.9933717Z AUTOTUNE mm(1568x96, 96x576) 2025-09-07T13:36:35.9933984Z strides: [96, 1], [1, 96] 2025-09-07T13:36:35.9934265Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:36:35.9935089Z triton_mm_389 0.0072 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:36:35.9936126Z triton_mm_393 0.0074 ms 97.4% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:36:35.9937105Z triton_mm_388 0.0074 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:36:35.9938073Z triton_mm_390 0.0075 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:36:35.9939068Z triton_mm_391 0.0076 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:36:35.9940569Z triton_mm_392 0.0076 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:36:35.9941747Z triton_mm_386 0.0078 ms 92.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:36:35.9942709Z triton_mm_385 0.0080 ms 89.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:36:35.9943717Z triton_mm_395 0.0081 ms 88.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:36:35.9944710Z triton_mm_397 0.0082 ms 87.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:36:35.9945626Z SingleProcess AUTOTUNE benchmarking takes 0.2606 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:36:36.5860748Z Autotune 
Choices Stats: 2025-09-07T13:36:36.5861908Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_165", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4", "best_time": 0.006912000011652708, "best_triton_pos": 0} 2025-09-07T13:36:36.6057834Z AUTOTUNE mm(6272x40, 40x120) 2025-09-07T13:36:36.6058097Z strides: [40, 1], [1, 40] 2025-09-07T13:36:36.6058360Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:36:36.6059309Z triton_mm_165 0.0069 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:36:36.6060327Z triton_mm_164 0.0069 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:36:36.6061385Z triton_mm_160 0.0070 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:36:36.6062397Z triton_mm_161 0.0070 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:36:36.6063387Z triton_mm_157 0.0071 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:36:36.6064410Z triton_mm_154 0.0072 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:36:36.6065502Z triton_mm_153 0.0072 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 
2025-09-07T13:36:36.6066411Z triton_mm_171 0.0075 ms 92.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:36:36.6075679Z triton_mm_156 0.0075 ms 91.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:36:36.6076593Z triton_mm_167 0.0075 ms 91.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:36:36.6077316Z SingleProcess AUTOTUNE benchmarking takes 0.2560 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T13:36:37.2061724Z Autotune Choices Stats: 2025-09-07T13:36:37.2063674Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_279", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8", "best_time": 0.007040000054985285, "best_triton_pos": 0} 2025-09-07T13:36:37.2258077Z AUTOTUNE mm(1568x80, 80x480) 2025-09-07T13:36:37.2258549Z strides: [80, 1], [1, 80] 2025-09-07T13:36:37.2258933Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:36:37.2259932Z triton_mm_279 0.0070 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:36:37.2261411Z triton_mm_276 0.0071 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:36:37.2262869Z triton_mm_275 0.0071 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:36:37.2264303Z 
triton_mm_274 0.0072 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:36:37.2265909Z triton_mm_278 0.0072 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:36:37.2267272Z triton_mm_277 0.0074 ms 95.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:36:37.2269011Z triton_mm_272 0.0075 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:36:37.2269924Z mm 0.0076 ms 92.4% 2025-09-07T13:36:37.2270784Z triton_mm_281 0.0077 ms 91.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:36:37.2272196Z triton_mm_271 0.0077 ms 90.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:36:37.2273370Z SingleProcess AUTOTUNE benchmarking takes 0.2647 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T13:36:37.8269814Z Autotune Choices Stats: 2025-09-07T13:36:37.8271114Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_40", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.007648000027984381, "best_triton_pos": 0} 2025-09-07T13:36:37.8472714Z AUTOTUNE mm(25088x48, 48x24) 2025-09-07T13:36:37.8473069Z strides: [48, 1], [1, 48] 2025-09-07T13:36:37.8473383Z dtypes: torch.bfloat16, torch.bfloat16 
2025-09-07T13:36:37.8474197Z triton_mm_40 0.0076 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:36:37.8475838Z triton_mm_34 0.0078 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:36:37.8477057Z triton_mm_44 0.0078 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:36:37.8478316Z triton_mm_37 0.0079 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:36:37.8480154Z triton_mm_49 0.0079 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:36:37.8481798Z triton_mm_43 0.0079 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:36:37.8483196Z triton_mm_42 0.0080 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:36:37.8484458Z triton_mm_48 0.0081 ms 94.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:36:37.8485846Z triton_mm_38 0.0082 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:36:37.8487037Z triton_mm_46 0.0082 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, 
BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:36:37.8488090Z SingleProcess AUTOTUNE benchmarking takes 0.2411 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T13:36:38.7478008Z Autotune Choices Stats: 2025-09-07T13:36:38.7479046Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_80", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8", "best_time": 0.009184000082314014, "best_triton_pos": 0} 2025-09-07T13:36:38.7681006Z AUTOTUNE mm(25088x72, 72x24) 2025-09-07T13:36:38.7681272Z strides: [72, 1], [1, 72] 2025-09-07T13:36:38.7681815Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:36:38.7682488Z triton_mm_80 0.0092 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:36:38.7683491Z triton_mm_78 0.0093 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:36:38.7684471Z triton_mm_70 0.0093 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:36:38.7685588Z triton_mm_76 0.0094 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:36:38.7686552Z triton_mm_68 0.0096 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:36:38.7687509Z triton_mm_67 0.0096 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T13:36:38.7688469Z triton_mm_81 0.0097 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T13:36:38.7689449Z triton_mm_79 0.0098 ms 93.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:36:38.7690412Z triton_mm_83 0.0098 ms 93.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:36:38.7691372Z triton_mm_71 0.0098 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:36:38.7692208Z SingleProcess AUTOTUNE benchmarking takes 0.6163 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T13:36:39.3462796Z Autotune Choices Stats: 2025-09-07T13:36:39.3464789Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_463", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007840000092983246, "best_triton_pos": 0} 2025-09-07T13:36:39.3666361Z AUTOTUNE mm(392x192, 192x1152) 2025-09-07T13:36:39.3666758Z strides: [192, 1], [1, 192] 2025-09-07T13:36:39.3667138Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:36:39.3668067Z triton_mm_463 0.0078 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:36:39.3669481Z triton_mm_462 0.0081 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, 
num_warps=8 2025-09-07T13:36:39.3670355Z mm 0.0083 ms 94.6% 2025-09-07T13:36:39.3671172Z triton_mm_465 0.0085 ms 92.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:36:39.3672590Z triton_mm_466 0.0085 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:36:39.3673967Z triton_mm_469 0.0085 ms 91.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:36:39.3675789Z triton_mm_467 0.0087 ms 90.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:36:39.3677218Z triton_mm_458 0.0087 ms 90.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:36:39.3678599Z triton_mm_457 0.0088 ms 89.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:36:39.3679940Z triton_mm_464 0.0088 ms 89.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:36:39.3681132Z SingleProcess AUTOTUNE benchmarking takes 0.2658 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:36:40.0262401Z Autotune Choices Stats: 2025-09-07T13:36:40.0263497Z {"num_choices": 19, "num_triton_choices": 18, "best_kernel": "triton_mm_136", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", 
"best_time": 0.007552000228315592, "best_triton_pos": 0} 2025-09-07T13:36:40.0464553Z AUTOTUNE mm(6272x72, 72x40) 2025-09-07T13:36:40.0464884Z strides: [72, 1], [1, 72] 2025-09-07T13:36:40.0465348Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:36:40.0466088Z triton_mm_136 0.0076 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:36:40.0467167Z triton_mm_148 0.0076 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:36:40.0468216Z triton_mm_151 0.0077 ms 98.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:36:40.0469261Z triton_mm_138 0.0078 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:36:40.0470884Z triton_mm_142 0.0078 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:36:40.0472085Z triton_mm_149 0.0079 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:36:40.0473141Z triton_mm_144 0.0079 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:36:40.0474190Z triton_mm_146 0.0079 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:36:40.0474849Z mm 0.0079 ms 95.2% 2025-09-07T13:36:40.0475623Z 
triton_mm_147 0.0080 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:36:40.0476416Z SingleProcess AUTOTUNE benchmarking takes 0.2550 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T13:36:40.5868881Z Autotune Choices Stats: 2025-09-07T13:36:40.5870290Z {"num_choices": 19, "num_triton_choices": 18, "best_kernel": "triton_mm_173", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.007679999805986881, "best_triton_pos": 0} 2025-09-07T13:36:40.6064685Z AUTOTUNE mm(6272x120, 120x40) 2025-09-07T13:36:40.6065408Z strides: [120, 1], [1, 120] 2025-09-07T13:36:40.6065767Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:36:40.6066945Z triton_mm_173 0.0077 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:36:40.6067864Z mm 0.0077 ms 99.6% 2025-09-07T13:36:40.6068677Z triton_mm_188 0.0079 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:36:40.6070033Z triton_mm_175 0.0080 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:36:40.6071368Z triton_mm_181 0.0080 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:36:40.6072708Z triton_mm_186 0.0080 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 
2025-09-07T13:36:40.6074056Z triton_mm_189 0.0080 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:36:40.6075565Z triton_mm_179 0.0081 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:36:40.6076907Z triton_mm_183 0.0081 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:36:40.6078221Z triton_mm_180 0.0081 ms 94.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:36:40.6079388Z SingleProcess AUTOTUNE benchmarking takes 0.2465 seconds and 0.0003 seconds precompiling for 19 choices 2025-09-07T13:36:41.1863908Z Autotune Choices Stats: 2025-09-07T13:36:41.1865896Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_364", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.008704000152647495, "best_triton_pos": 0} 2025-09-07T13:36:41.2063677Z AUTOTUNE mm(1568x480, 480x96) 2025-09-07T13:36:41.2063964Z strides: [480, 1], [1, 480] 2025-09-07T13:36:41.2064247Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:36:41.2065129Z triton_mm_364 0.0087 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:36:41.2066197Z triton_mm_368 0.0087 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:36:41.2066827Z mm 
0.0092 ms 94.8% 2025-09-07T13:36:41.2067389Z triton_mm_363 0.0099 ms 88.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:36:41.2068368Z triton_mm_367 0.0099 ms 88.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:36:41.2069356Z triton_mm_372 0.0099 ms 87.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:36:41.2070314Z triton_mm_362 0.0101 ms 86.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:36:41.2071436Z triton_mm_361 0.0103 ms 84.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:36:41.2072429Z triton_mm_371 0.0103 ms 84.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:36:41.2073400Z triton_mm_374 0.0104 ms 83.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:36:41.2074245Z SingleProcess AUTOTUNE benchmarking takes 0.2588 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:36:41.7609423Z Autotune Choices Stats: 2025-09-07T13:36:41.7610461Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_402", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.008991999551653862, 
"best_triton_pos": 0} 2025-09-07T13:36:41.7810628Z AUTOTUNE mm(1568x576, 576x96) 2025-09-07T13:36:41.7810905Z strides: [576, 1], [1, 576] 2025-09-07T13:36:41.7811183Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:36:41.7811873Z triton_mm_402 0.0090 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:36:41.7812512Z mm 0.0090 ms 99.6% 2025-09-07T13:36:41.7813113Z triton_mm_406 0.0092 ms 98.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:36:41.7814089Z triton_mm_405 0.0098 ms 91.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:36:41.7815357Z triton_mm_401 0.0100 ms 90.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:36:41.7816321Z triton_mm_400 0.0103 ms 87.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:36:41.7817504Z triton_mm_409 0.0104 ms 86.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:36:41.7818525Z triton_mm_410 0.0105 ms 85.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:36:41.7819432Z triton_mm_399 0.0108 ms 83.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:36:41.7820335Z triton_mm_416 0.0110 ms 81.7% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:36:41.7821118Z SingleProcess AUTOTUNE benchmarking takes 0.2647 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:36:42.6427974Z Autotune Choices Stats: 2025-09-07T13:36:42.6429028Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_288", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.008576000109314919, "best_triton_pos": 0} 2025-09-07T13:36:42.6630216Z AUTOTUNE mm(1568x480, 480x80) 2025-09-07T13:36:42.6630502Z strides: [480, 1], [1, 480] 2025-09-07T13:36:42.6630752Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:36:42.6631410Z triton_mm_288 0.0086 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:36:42.6632558Z mm 0.0087 ms 98.2% 2025-09-07T13:36:42.6633149Z triton_mm_292 0.0087 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:36:42.6634135Z triton_mm_291 0.0092 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:36:42.6635423Z triton_mm_286 0.0094 ms 91.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:36:42.6636391Z triton_mm_287 0.0094 ms 91.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:36:42.6637367Z triton_mm_296 
0.0098 ms 87.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:36:42.6638339Z triton_mm_285 0.0101 ms 85.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:36:42.6639313Z triton_mm_294 0.0106 ms 81.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:36:42.6640280Z triton_mm_295 0.0107 ms 80.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:36:42.6641140Z SingleProcess AUTOTUNE benchmarking takes 0.2620 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T13:36:43.2454307Z Autotune Choices Stats: 2025-09-07T13:36:43.2455818Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_592", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.009247999638319016, "best_triton_pos": 0} 2025-09-07T13:36:43.2657058Z AUTOTUNE mm(392x1152, 1152x320) 2025-09-07T13:36:43.2657365Z strides: [1152, 1], [1, 1152] 2025-09-07T13:36:43.2657653Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:36:43.2658615Z triton_mm_592 0.0092 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:36:43.2659422Z mm 0.0098 ms 94.1% 2025-09-07T13:36:43.2660044Z triton_mm_596 0.0101 ms 91.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 
2025-09-07T13:36:43.2661040Z triton_mm_600 0.0112 ms 82.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:36:43.2662133Z triton_mm_591 0.0128 ms 72.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:36:43.2663137Z triton_mm_606 0.0130 ms 71.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:36:43.2664124Z triton_mm_595 0.0131 ms 70.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:36:43.2665271Z triton_mm_590 0.0134 ms 69.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:36:43.2666496Z triton_mm_599 0.0136 ms 68.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:36:43.2667485Z triton_mm_589 0.0136 ms 68.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:36:43.2668345Z SingleProcess AUTOTUNE benchmarking takes 0.2604 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:36:43.8234055Z Autotune Choices Stats: 2025-09-07T13:36:43.8235891Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.00800000037997961, "best_triton_pos": 1, "best_triton_time": 0.008063999935984612, "best_triton_kernel": "triton_mm_440", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, 
EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T13:36:43.8449990Z AUTOTUNE mm(392x576, 576x192) 2025-09-07T13:36:43.8450358Z strides: [576, 1], [1, 576] 2025-09-07T13:36:43.8450671Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:36:43.8450957Z mm 0.0080 ms 100.0% 2025-09-07T13:36:43.8451621Z triton_mm_440 0.0081 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:36:43.8452729Z triton_mm_444 0.0082 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:36:43.8453782Z triton_mm_443 0.0091 ms 87.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:36:43.8454836Z triton_mm_439 0.0092 ms 87.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:36:43.8456318Z triton_mm_448 0.0093 ms 86.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:36:43.8457605Z triton_mm_438 0.0094 ms 85.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:36:43.8458805Z triton_mm_447 0.0097 ms 82.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:36:43.8459889Z triton_mm_437 0.0100 ms 80.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 
2025-09-07T13:36:43.8460858Z triton_mm_454 0.0101 ms 79.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:36:43.8461836Z SingleProcess AUTOTUNE benchmarking takes 0.2727 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T13:36:44.4427058Z Autotune Choices Stats: 2025-09-07T13:36:44.4428212Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_478", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.00886400043964386, "best_triton_pos": 0} 2025-09-07T13:36:44.4652983Z AUTOTUNE mm(392x1152, 1152x192) 2025-09-07T13:36:44.4653312Z strides: [1152, 1], [1, 1152] 2025-09-07T13:36:44.4653554Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:36:44.4654139Z triton_mm_478 0.0089 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:36:44.4654657Z mm 0.0090 ms 98.9% 2025-09-07T13:36:44.4655925Z triton_mm_482 0.0096 ms 92.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T13:36:44.4656775Z triton_mm_486 0.0105 ms 84.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:36:44.4657560Z triton_mm_477 0.0122 ms 72.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:36:44.4658332Z triton_mm_481 0.0122 ms 72.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=3, num_warps=8 2025-09-07T13:36:44.4659089Z triton_mm_476 0.0127 ms 69.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:36:44.4659863Z triton_mm_492 0.0128 ms 69.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:36:44.4660642Z triton_mm_485 0.0130 ms 68.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:36:44.4661552Z triton_mm_475 0.0133 ms 66.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:36:44.4662258Z SingleProcess AUTOTUNE benchmarking takes 0.2734 seconds and 0.0004 seconds precompiling for 20 choices 2025-09-07T13:36:45.8999568Z Autotune Choices Stats: 2025-09-07T13:36:45.9001366Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.007968000136315823, "best_triton_pos": 1, "best_triton_time": 0.00854399986565113, "best_triton_kernel": "triton_mm_618", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T13:36:45.9219915Z AUTOTUNE mm(392x320, 320x1280) 2025-09-07T13:36:45.9220410Z strides: [320, 1], [1, 320] 2025-09-07T13:36:45.9220813Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T13:36:45.9221217Z mm 0.0080 ms 100.0% 2025-09-07T13:36:45.9222717Z triton_mm_618 0.0085 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:36:45.9224429Z triton_mm_614 0.0086 ms 92.6% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T13:36:45.9226394Z triton_mm_619 0.0086 ms 92.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T13:36:45.9227896Z triton_mm_617 0.0089 ms 89.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:36:45.9229376Z triton_mm_621 0.0090 ms 88.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T13:36:45.9230851Z triton_mm_625 0.0092 ms 86.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:36:45.9232329Z triton_mm_624 0.0092 ms 86.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T13:36:45.9234026Z triton_mm_609 0.0095 ms 83.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T13:36:45.9235717Z triton_mm_608 0.0097 ms 82.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T13:36:45.9237011Z SingleProcess AUTOTUNE benchmarking takes 0.2815 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T13:36:49.1159578Z pass 2025-09-07T13:36:51.7753377Z accuracy pass_rate=87.50% 2025-09-07T13:36:51.7758873Z calls_captured gmean=367.65x mean=508.875x 2025-09-07T13:36:51.7762541Z unique_graphs gmean=1.09x mean=1.125x 2025-09-07T13:36:51.7766526Z graph_breaks 
gmean=0.00x mean=0.000x 2025-09-07T13:36:51.7769657Z unique_graph_breaks gmean=0.00x mean=0.000x 2025-09-07T13:36:51.7773011Z autograd_captures gmean=0.00x mean=0.000x 2025-09-07T13:36:51.7776714Z autograd_compiles gmean=0.00x mean=0.000x 2025-09-07T13:36:51.7779915Z cudagraph_skips gmean=0.00x mean=0.000x 2025-09-07T13:36:51.7781119Z compilation_latency mean=37.563 seconds 2025-09-07T13:36:52.8151839Z + [[ training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *cudagraphs_low_precision-true* ]] 2025-09-07T13:36:52.8153146Z + [[ inference == \i\n\f\e\r\e\n\c\e ]] 2025-09-07T13:36:52.8154438Z + python benchmarks/dynamo/timm_models.py --accuracy --no-translation-validation --inference --quant --backend inductor --device cuda --total-partitions 7 --partition-id 3 --output /var/lib/jenkins/workspace/test/test-reports/inductor_cudagraphs_low_precision_timm_models_quant_inference_cuda_h100_accuracy.csv 2025-09-07T13:36:53.8239865Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T13:36:53.8241588Z import pynvml # type: ignore[import] 2025-09-07T13:36:56.8528007Z usage: timm_models.py 2025-09-07T13:36:56.8528834Z [-h] 2025-09-07T13:36:56.8529047Z [--filter FILTER] 2025-09-07T13:36:56.8529285Z [--exclude EXCLUDE] 2025-09-07T13:36:56.8529542Z [--exclude-exact EXCLUDE_EXACT] 2025-09-07T13:36:56.8530063Z [--total-partitions {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}] 2025-09-07T13:36:56.8530572Z [--partition-id PARTITION_ID] 2025-09-07T13:36:56.8530847Z [--devices DEVICES] 2025-09-07T13:36:56.8531098Z [--device-index DEVICE_INDEX] 2025-09-07T13:36:56.8531363Z [--repeat REPEAT] 2025-09-07T13:36:56.8531637Z [--iterations-per-run ITERATIONS_PER_RUN] 2025-09-07T13:36:56.8531942Z [--randomize-input] 2025-09-07T13:36:56.8532183Z [--threads THREADS] 2025-09-07T13:36:56.8532410Z [--nopython] 2025-09-07T13:36:56.8532620Z [--no-skip] 2025-09-07T13:36:56.8532828Z [--prims-nvfuser] 2025-09-07T13:36:56.8533073Z [--dump-raw-metrics] 2025-09-07T13:36:56.8533323Z [--log-operator-inputs] 2025-09-07T13:36:56.8533579Z [--channels-last] 2025-09-07T13:36:56.8533806Z [--batch-size BATCH_SIZE] 2025-09-07T13:36:56.8534080Z [--iterations ITERATIONS] 2025-09-07T13:36:56.8534351Z [--batch-size-file BATCH_SIZE_FILE] 2025-09-07T13:36:56.8534629Z [--cosine] 2025-09-07T13:36:56.8534829Z [--freezing] 2025-09-07T13:36:56.8535430Z [--inductor-config INDUCTOR_CONFIG] 2025-09-07T13:36:56.8535699Z [--ci] 2025-09-07T13:36:56.8535888Z [--dashboard] 2025-09-07T13:36:56.8536111Z [--skip-fp64-check] 2025-09-07T13:36:56.8536350Z [--fast] 2025-09-07T13:36:56.8536546Z [--only ONLY] 2025-09-07T13:36:56.8536750Z [--multiprocess] 2025-09-07T13:36:56.8536966Z [--ddp] 2025-09-07T13:36:56.8537155Z [--fsdp] 2025-09-07T13:36:56.8537382Z [--optimize-ddp-mode OPTIMIZE_DDP_MODE] 2025-09-07T13:36:56.8537880Z [--distributed-master-port DISTRIBUTED_MASTER_PORT] 2025-09-07T13:36:56.8538224Z [--dynamic-shapes] 2025-09-07T13:36:56.8538476Z [--propagate-real-tensors] 2025-09-07T13:36:56.8538743Z 
[--dynamic-batch-only] 2025-09-07T13:36:56.8538983Z [--specialize-int] 2025-09-07T13:36:56.8539211Z [--use-eval-mode] 2025-09-07T13:36:56.8539447Z [--skip-accuracy-check] 2025-09-07T13:36:56.8539728Z [--generate-aot-autograd-stats] 2025-09-07T13:36:56.8539964Z [--inductor-settings] 2025-09-07T13:36:56.8540169Z [--suppress-errors] 2025-09-07T13:36:56.8540356Z [--output OUTPUT] 2025-09-07T13:36:56.8540560Z [--output-directory OUTPUT_DIRECTORY] 2025-09-07T13:36:56.8540783Z [--disable-output] 2025-09-07T13:36:56.8540968Z [--baseline BASELINE] 2025-09-07T13:36:56.8541156Z [--part PART] 2025-09-07T13:36:56.8541436Z [--export-profiler-trace] 2025-09-07T13:36:56.8541677Z [--profiler-trace-name PROFILER_TRACE_NAME] 2025-09-07T13:36:56.8541930Z [--profile-details] 2025-09-07T13:36:56.8542126Z [--export-perfdoctor] 2025-09-07T13:36:56.8542333Z [--diff-branch DIFF_BRANCH] 2025-09-07T13:36:56.8542530Z [--tag TAG] 2025-09-07T13:36:56.8542696Z [--explain] 2025-09-07T13:36:56.8542867Z [--stats] 2025-09-07T13:36:56.8543046Z [--use-warm-peak-memory] 2025-09-07T13:36:56.8543247Z [--print-memory] 2025-09-07T13:36:56.8543446Z [--print-compilation-time] 2025-09-07T13:36:56.8543673Z [--print-dataframe-summary] 2025-09-07T13:36:56.8543913Z [--disable-cudagraphs] 2025-09-07T13:36:56.8544120Z [--disable-split-reductions] 2025-09-07T13:36:56.8544356Z [--disable-persistent-reductions] 2025-09-07T13:36:56.8544593Z [--disable-divisible-by-16] 2025-09-07T13:36:56.8544840Z [--inductor-compile-mode INDUCTOR_COMPILE_MODE] 2025-09-07T13:36:56.8545239Z [--print-graph-breaks] 2025-09-07T13:36:56.8545440Z [--log-graph-breaks] 2025-09-07T13:36:56.8545645Z [--trace-on-xla] 2025-09-07T13:36:56.8545845Z [--xla-tolerance XLA_TOLERANCE] 2025-09-07T13:36:56.8546168Z [--collect-outputs] 2025-09-07T13:36:56.8546382Z [--enable-activation-checkpointing] 2025-09-07T13:36:56.8546605Z [--timing] 2025-09-07T13:36:56.8546766Z [--progress] 2025-09-07T13:36:56.8546930Z [--timeout TIMEOUT] 2025-09-07T13:36:56.8547271Z 
[--per_process_memory_fraction PER_PROCESS_MEMORY_FRACTION] 2025-09-07T13:36:56.8547642Z [--no-translation-validation] 2025-09-07T13:36:56.8547854Z [--minify] 2025-09-07T13:36:56.8548021Z [--compiled-autograd] 2025-09-07T13:36:56.8548231Z [--profile_dynamo_cache_lookup] 2025-09-07T13:36:56.8548452Z [--snapshot-memory] 2025-09-07T13:36:56.8548641Z [--retain-output] 2025-09-07T13:36:56.8548839Z [--caching-precompile] 2025-09-07T13:36:56.8549085Z [--cold-start-latency | --warm-start-latency] 2025-09-07T13:36:56.8549326Z [--nnc] 2025-09-07T13:36:56.8549523Z [--float16 | --bfloat16 | --float32 | --amp] 2025-09-07T13:36:56.8549783Z [--amp-dtype {bfloat16,float16}] 2025-09-07T13:36:56.8550026Z [--verbose | --quiet] 2025-09-07T13:36:56.8552479Z [--coverage | --overhead | --speedup-dynamo-ts | --speedup-fx2trt | --speedup-fx2trt-fp16 | --print-fx | --print-aten-ops | --inductor | --quantization {int8dynamic,int8weightonly,int4weightonly,autoquant,noquant} | --export | --export-aot-inductor | --export-nativert | --torchscript-jit-trace | --xla | --backend {aot_eager,aot_eager_decomp_partition,aot_eager_decomp_partition_crossref,aot_eager_decomp_partition_with_mode,aot_eager_default_partitioner,aot_ts,cudagraphs,dynamo_accuracy_minifier_backend,dynamo_minifier_backend,eager,eager_debug,eager_noexcept,inductor,non_leaf_compile_error_TESTING_ONLY,openxla,openxla_eval,pre_dispatch_eager,relu_accuracy_error_TESTING_ONLY,relu_compile_error_TESTING_ONLY,relu_runtime_error_TESTING_ONLY,ts,tvm} | --nothing | --log-conv-args | --recompile-profiler | --find-batch-sizes] 2025-09-07T13:36:56.8554910Z (--accuracy | --performance | --tolerance) 2025-09-07T13:36:56.8555297Z (--training | --inference) 2025-09-07T13:36:56.8555594Z timm_models.py: error: argument --quantization: expected one argument 2025-09-07T13:36:57.7026311Z + true 2025-09-07T13:36:57.7027400Z + cp /var/lib/jenkins/workspace/test/test-reports/inductor_with_cudagraphs_timm_models_bfloat16_inference_cuda_h100_accuracy.csv 
/var/lib/jenkins/workspace/test/test-reports/inductor_cudagraphs_low_precision_timm_models_quant_inference_cuda_h100_accuracy.csv 2025-09-07T13:36:57.7050158Z + for target in "${targets[@]}" 2025-09-07T13:36:57.7050504Z + target_flag=('--performance') 2025-09-07T13:36:57.7050769Z + local target_flag 2025-09-07T13:36:57.7051002Z + [[ performance == \p\e\r\f\o\r\m\a\n\c\e ]] 2025-09-07T13:36:57.7051311Z + target_flag+=(--cold-start-latency) 2025-09-07T13:36:57.7052522Z + [[ training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *freezing-true* ]] 2025-09-07T13:36:57.7054611Z + [[ training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *default-true* ]] 2025-09-07T13:36:57.7057475Z + python benchmarks/dynamo/timm_models.py --performance --cold-start-latency --inference --bfloat16 --backend inductor --disable-cudagraphs --device cuda --total-partitions 7 --partition-id 3 --output /var/lib/jenkins/workspace/test/test-reports/inductor_no_cudagraphs_timm_models_bfloat16_inference_cuda_h100_performance.csv 2025-09-07T13:36:58.7009289Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T13:36:58.7010517Z import pynvml # type: ignore[import] 2025-09-07T13:37:03.5534300Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. 
If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T13:37:03.5536191Z import pynvml # type: ignore[import] 2025-09-07T13:37:06.5350004Z 2025-09-07T13:37:08.8802970Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:37:08.8803477Z loading model: 0it [00:02, ?it/s] 2025-09-07T13:37:08.9107260Z cuda eval hrnet_w18 2025-09-07T13:37:58.9111797Z 2025-09-07T13:37:59.0582078Z running benchmark: 0% 0/30 [00:00 2025-09-07T13:44:52.3885122Z and t.untyped_storage().data_ptr() not in existing_path_data_ptrs 2025-09-07T13:44:52.3885925Z RuntimeError: Error: accessing tensor output of CUDAGraphs that has been overwritten by a subsequent run. Stack trace: File "/var/lib/jenkins/workspace/benchmarks/dynamo/timm_models.py", line 436, in forward_pass 2025-09-07T13:44:52.3886637Z return mod(*inputs) 2025-09-07T13:44:52.3887018Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 718, in forward 2025-09-07T13:44:52.3887422Z x = self.forward_features(x) 2025-09-07T13:44:52.3887833Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 709, in forward_features 2025-09-07T13:44:52.3888242Z x = self.stages(x) 2025-09-07T13:44:52.3888678Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 520, in forward 2025-09-07T13:44:52.3889062Z x = self.blocks(x) 2025-09-07T13:44:52.3889404Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 458, in forward 2025-09-07T13:44:52.3889796Z x = x + self.drop_path1(self.attn(x)) 2025-09-07T13:44:52.3890178Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 237, in forward 2025-09-07T13:44:52.3890629Z attn = q @ k * self.scale + self.get_attention_biases(x.device) 2025-09-07T13:44:52.3891112Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 216, in get_attention_biases 
2025-09-07T13:44:52.3892079Z self.attention_bias_cache[device_key] = self.attention_biases[:, self.attention_bias_idxs]. To prevent overwriting, clone the tensor outside of torch.compile() or call torch.compiler.cudagraph_mark_step_begin() before each model invocation. 2025-09-07T13:44:52.3892831Z warmup_failed 2025-09-07T13:44:55.5107118Z Run failed with return code: 255 2025-09-07T13:44:55.5107464Z Output: None 2025-09-07T13:44:55.5107659Z Error: None 2025-09-07T13:44:56.4858307Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T13:44:56.4859539Z import pynvml # type: ignore[import] 2025-09-07T13:44:59.4809666Z 2025-09-07T13:45:00.7310755Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:45:00.7311086Z loading model: 0it [00:01, ?it/s] 2025-09-07T13:45:00.7344701Z cuda eval mixer_b16_224 2025-09-07T13:45:13.4359963Z 2025-09-07T13:45:13.5661063Z running benchmark: 0% 0/30 [00:00 2025-09-07T13:49:32.6400971Z and t.untyped_storage().data_ptr() not in existing_path_data_ptrs 2025-09-07T13:49:32.6401748Z RuntimeError: Error: accessing tensor output of CUDAGraphs that has been overwritten by a subsequent run. 
Stack trace: File "/var/lib/jenkins/workspace/benchmarks/dynamo/timm_models.py", line 436, in forward_pass 2025-09-07T13:49:32.6402436Z return mod(*inputs) 2025-09-07T13:49:32.6402802Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 718, in forward 2025-09-07T13:49:32.6403198Z x = self.forward_features(x) 2025-09-07T13:49:32.6403599Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 709, in forward_features 2025-09-07T13:49:32.6404021Z x = self.stages(x) 2025-09-07T13:49:32.6404382Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 520, in forward 2025-09-07T13:49:32.6404832Z x = self.blocks(x) 2025-09-07T13:49:32.6405302Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 458, in forward 2025-09-07T13:49:32.6405698Z x = x + self.drop_path1(self.attn(x)) 2025-09-07T13:49:32.6406082Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 237, in forward 2025-09-07T13:49:32.6406508Z attn = q @ k * self.scale + self.get_attention_biases(x.device) 2025-09-07T13:49:32.6406993Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 216, in get_attention_biases 2025-09-07T13:49:32.6407952Z self.attention_bias_cache[device_key] = self.attention_biases[:, self.attention_bias_idxs]. To prevent overwriting, clone the tensor outside of torch.compile() or call torch.compiler.cudagraph_mark_step_begin() before each model invocation. 2025-09-07T13:49:32.6408717Z warmup_failed 2025-09-07T13:49:35.7782125Z Run failed with return code: 255 2025-09-07T13:49:35.7782704Z Output: None 2025-09-07T13:49:35.7783052Z Error: None 2025-09-07T13:49:36.7344920Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. 
If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T13:49:36.7346786Z import pynvml # type: ignore[import] 2025-09-07T13:49:40.1016368Z 2025-09-07T13:49:41.6483546Z loading model: 0it [00:00, ?it/s] 2025-09-07T13:49:41.6484006Z loading model: 0it [00:01, ?it/s] 2025-09-07T13:49:41.6520600Z cuda eval mixer_b16_224 2025-09-07T13:49:54.9250507Z 2025-09-07T13:49:55.0505272Z running benchmark: 0% 0/30 [00:00 2025-09-07T14:21:16.5074167Z and t.untyped_storage().data_ptr() not in existing_path_data_ptrs 2025-09-07T14:21:16.5075072Z RuntimeError: Error: accessing tensor output of CUDAGraphs that has been overwritten by a subsequent run. Stack trace: File "/var/lib/jenkins/workspace/benchmarks/dynamo/timm_models.py", line 436, in forward_pass 2025-09-07T14:21:16.5075776Z return mod(*inputs) 2025-09-07T14:21:16.5076139Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 718, in forward 2025-09-07T14:21:16.5076532Z x = self.forward_features(x) 2025-09-07T14:21:16.5076935Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 709, in forward_features 2025-09-07T14:21:16.5077339Z x = self.stages(x) 2025-09-07T14:21:16.5077680Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 520, in forward 2025-09-07T14:21:16.5078066Z x = self.blocks(x) 2025-09-07T14:21:16.5078408Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 458, in forward 2025-09-07T14:21:16.5078795Z x = x + self.drop_path1(self.attn(x)) 2025-09-07T14:21:16.5079293Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 237, in forward 2025-09-07T14:21:16.5079728Z attn = q @ k * self.scale + self.get_attention_biases(x.device) 2025-09-07T14:21:16.5080272Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/levit.py", line 216, in 
get_attention_biases 2025-09-07T14:21:16.5081249Z self.attention_bias_cache[device_key] = self.attention_biases[:, self.attention_bias_idxs]. To prevent overwriting, clone the tensor outside of torch.compile() or call torch.compiler.cudagraph_mark_step_begin() before each model invocation. 2025-09-07T14:21:16.5082023Z warmup_failed 2025-09-07T14:21:20.2598066Z Run failed with return code: 255 2025-09-07T14:21:20.2598429Z Output: None 2025-09-07T14:21:20.2598624Z Error: None 2025-09-07T14:21:21.1981575Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T14:21:21.1983682Z import pynvml # type: ignore[import] 2025-09-07T14:21:24.2029911Z 2025-09-07T14:21:25.4441070Z loading model: 0it [00:00, ?it/s] 2025-09-07T14:21:25.4441454Z loading model: 0it [00:01, ?it/s] 2025-09-07T14:21:25.4477867Z cuda eval mixer_b16_224 2025-09-07T14:21:36.3282418Z Autotune Choices Stats: 2025-09-07T14:21:36.3284245Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.17151999473571777, "best_triton_pos": 1, "best_triton_time": 0.211776003241539, "best_triton_kernel": "triton_mm_62", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T14:21:36.3334481Z AUTOTUNE mm(25088x768, 768x3072) 2025-09-07T14:21:36.3334782Z strides: [768, 1], [1, 768] 2025-09-07T14:21:36.3335215Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T14:21:36.3335502Z mm 0.1715 ms 100.0% 2025-09-07T14:21:36.3336129Z triton_mm_62 0.2118 ms 81.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 
2025-09-07T14:21:36.3337132Z triton_mm_61 0.2372 ms 72.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T14:21:36.3338137Z triton_mm_63 0.2407 ms 71.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T14:21:36.3339166Z triton_mm_60 0.2940 ms 58.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T14:21:36.3340062Z triton_mm_56 0.2985 ms 57.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T14:21:36.3340936Z triton_mm_57 0.3046 ms 56.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T14:21:36.3341893Z triton_mm_54 0.3366 ms 51.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T14:21:36.3342757Z triton_mm_55 0.3389 ms 50.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T14:21:36.3343631Z triton_mm_58 0.3393 ms 50.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T14:21:36.3344553Z SingleProcess AUTOTUNE benchmarking takes 0.6964 seconds and 0.0004 seconds precompiling for 20 choices 2025-09-07T14:21:37.6521590Z Autotune Choices Stats: 2025-09-07T14:21:37.6524045Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.0530879981815815, "best_triton_pos": 1, 
"best_triton_time": 0.06905599683523178, "best_triton_kernel": "triton_mm_16", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T14:21:37.6573648Z AUTOTUNE mm(98304x200, 200x384) 2025-09-07T14:21:37.6574119Z strides: [200, 1], [384, 1] 2025-09-07T14:21:37.6574560Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T14:21:37.6575243Z mm 0.0531 ms 100.0% 2025-09-07T14:21:37.6576242Z triton_mm_16 0.0691 ms 76.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T14:21:37.6577832Z triton_mm_17 0.0692 ms 76.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T14:21:37.6579413Z triton_mm_24 0.0714 ms 74.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T14:21:37.6580688Z triton_mm_23 0.0751 ms 70.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T14:21:37.6581641Z triton_mm_18 0.0779 ms 68.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T14:21:37.6582655Z triton_mm_25 0.0820 ms 64.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T14:21:37.6583544Z triton_mm_21 0.0825 ms 64.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T14:21:37.6584433Z triton_mm_20 
0.0829 ms 64.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T14:21:37.6585448Z triton_mm_22 0.0881 ms 60.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T14:21:37.6586230Z SingleProcess AUTOTUNE benchmarking takes 0.4162 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T14:21:40.1680250Z Autotune Choices Stats: 2025-09-07T14:21:40.1682634Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_6", "best_kernel_desc": "ALLOW_TF32=True, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=8", "best_time": 1.0034559965133667, "best_triton_pos": 0} 2025-09-07T14:21:40.1732056Z AUTOTUNE convolution(128x3x224x224, 768x3x16x16) 2025-09-07T14:21:40.1732610Z strides: [150528, 50176, 224, 1], [768, 256, 16, 1] 2025-09-07T14:21:40.1733125Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T14:21:40.1734411Z triton_convolution2d_6 1.0035 ms 100.0% ALLOW_TF32=True, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T14:21:40.1736787Z triton_convolution2d_1 1.0352 ms 96.9% ALLOW_TF32=True, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T14:21:40.1738054Z convolution 1.0531 ms 95.3% 2025-09-07T14:21:40.1739905Z triton_convolution2d_3 1.0701 ms 93.8% ALLOW_TF32=True, BLOCK_K=16, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T14:21:40.1741870Z 
triton_convolution2d_0 1.6867 ms 59.5% ALLOW_TF32=True, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T14:21:40.1751063Z triton_convolution2d_5 1.7051 ms 58.8% ALLOW_TF32=True, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T14:21:40.1752246Z triton_convolution2d_4 1.9131 ms 52.5% ALLOW_TF32=True, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T14:21:40.1753387Z triton_convolution2d_2 4.3898 ms 22.9% ALLOW_TF32=True, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T14:21:40.1754299Z SingleProcess AUTOTUNE benchmarking takes 0.3278 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T14:21:40.6084498Z Autotune Choices Stats: 2025-09-07T14:21:40.6087025Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.0522879995405674, "best_triton_pos": 1, "best_triton_time": 0.06918399780988693, "best_triton_kernel": "triton_mm_43", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T14:21:40.6133811Z AUTOTUNE mm(98304x384, 384x200) 2025-09-07T14:21:40.6134503Z strides: [384, 1], [200, 1] 2025-09-07T14:21:40.6135200Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T14:21:40.6135685Z mm 0.0523 ms 100.0% 2025-09-07T14:21:40.6136656Z triton_mm_43 0.0692 ms 75.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, 
num_warps=4 2025-09-07T14:21:40.6138229Z triton_mm_44 0.0746 ms 70.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T14:21:40.6139745Z triton_mm_42 0.0795 ms 65.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T14:21:40.6141042Z triton_mm_41 0.0947 ms 55.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T14:21:40.6141912Z triton_mm_39 0.0987 ms 53.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T14:21:40.6142727Z triton_mm_40 0.0990 ms 52.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T14:21:40.6143535Z triton_mm_37 0.1040 ms 50.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T14:21:40.6144343Z triton_mm_35 0.1154 ms 45.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T14:21:40.6145289Z triton_mm_38 0.1157 ms 45.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T14:21:40.6146018Z SingleProcess AUTOTUNE benchmarking takes 0.4393 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T14:21:41.2948011Z Autotune Choices Stats: 2025-09-07T14:21:41.2950301Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.15695999562740326, 
"best_triton_pos": 1, "best_triton_time": 0.2043839991092682, "best_triton_kernel": "triton_mm_81", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T14:21:41.2997296Z AUTOTUNE mm(25088x3072, 3072x768) 2025-09-07T14:21:41.2997559Z strides: [3072, 1], [1, 3072] 2025-09-07T14:21:41.2997816Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T14:21:41.2998098Z mm 0.1570 ms 100.0% 2025-09-07T14:21:41.2998666Z triton_mm_81 0.2044 ms 76.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T14:21:41.2999627Z triton_mm_82 0.2048 ms 76.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T14:21:41.3000583Z triton_mm_80 0.2437 ms 64.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T14:21:41.3001531Z triton_mm_76 0.2657 ms 59.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T14:21:41.3002551Z triton_mm_75 0.2762 ms 56.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T14:21:41.3003499Z triton_mm_74 0.3482 ms 45.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T14:21:41.3004367Z triton_mm_78 0.3574 ms 43.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 
2025-09-07T14:21:41.3005538Z triton_mm_77 0.3642 ms 43.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T14:21:41.3006406Z triton_mm_73 0.3659 ms 42.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T14:21:41.3007175Z SingleProcess AUTOTUNE benchmarking takes 0.6853 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T14:21:41.5897973Z Autotune Choices Stats: 2025-09-07T14:21:41.5899538Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_923", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.008671999908983707, "best_triton_pos": 0} 2025-09-07T14:21:41.5949433Z AUTOTUNE addmm(128x1000, 128x768, 768x1000) 2025-09-07T14:21:41.5949672Z strides: [0, 1], [768, 1], [1, 768] 2025-09-07T14:21:41.5949966Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T14:21:41.5950618Z triton_mm_923 0.0087 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T14:21:41.5951276Z bias_addmm 0.0094 ms 92.5% 2025-09-07T14:21:41.5951869Z triton_mm_927 0.0094 ms 92.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T14:21:41.5952825Z triton_mm_931 0.0105 ms 82.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T14:21:41.5953774Z triton_mm_922 0.0111 ms 78.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T14:21:41.5954921Z triton_mm_926 0.0114 ms 75.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T14:21:41.5956137Z triton_mm_921 0.0116 ms 74.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T14:21:41.5957084Z triton_mm_920 0.0120 ms 72.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T14:21:41.5958034Z triton_mm_930 0.0121 ms 71.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T14:21:41.5958979Z triton_mm_937 0.0125 ms 69.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T14:21:41.5959811Z SingleProcess AUTOTUNE benchmarking takes 0.2494 seconds and 0.0003 seconds precompiling for 21 choices 2025-09-07T14:21:46.8813093Z 2025-09-07T14:21:47.0106916Z running benchmark: 0% 0/30 [00:00> $GITHUB_ENV 2025-09-07T14:23:52.2590753Z echo "DEVICE_TYPE=$DEVICE_TYPE" >> $GITHUB_ENV 2025-09-07T14:23:52.2605375Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T14:23:52.2605665Z env: 2025-09-07T14:23:52.2605834Z GIT_DEFAULT_BRANCH: main 2025-09-07T14:23:52.2606097Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T14:23:52.2606436Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5231 2025-09-07T14:23:52.2606837Z DOCKER_CONTAINER_ID: f6780263fb6a43de96266e63b4f163682f3d42ec146647fa8e1572b4947dba33 2025-09-07T14:23:52.2607202Z ##[endgroup] 2025-09-07T14:23:52.2639922Z + [[ -n '' ]] 2025-09-07T14:23:52.2640228Z + python3 -mpip 
install boto3==1.35.33 psutil==7.0.0 pynvml==12.0.0 2025-09-07T14:23:52.5351197Z Defaulting to user installation because normal site-packages is not writeable 2025-09-07T14:23:53.7178488Z Collecting boto3==1.35.33 2025-09-07T14:23:53.7763239Z Downloading boto3-1.35.33-py3-none-any.whl (139 kB) 2025-09-07T14:23:53.8119194Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 139.1/139.1 KB 3.9 MB/s eta 0:00:00 2025-09-07T14:23:53.9526910Z Collecting psutil==7.0.0 2025-09-07T14:23:53.9630669Z Downloading psutil-7.0.0-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (277 kB) 2025-09-07T14:23:54.0015292Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 278.0/278.0 KB 7.3 MB/s eta 0:00:00 2025-09-07T14:23:54.0217828Z Collecting pynvml==12.0.0 2025-09-07T14:23:54.0331905Z Downloading pynvml-12.0.0-py3-none-any.whl (26 kB) 2025-09-07T14:23:54.0733992Z Collecting s3transfer<0.11.0,>=0.10.0 2025-09-07T14:23:54.0838313Z Downloading s3transfer-0.10.4-py3-none-any.whl (83 kB) 2025-09-07T14:23:54.0996596Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 83.2/83.2 KB 5.2 MB/s eta 0:00:00 2025-09-07T14:23:54.1195739Z Collecting jmespath<2.0.0,>=0.7.1 2025-09-07T14:23:54.1296338Z Downloading jmespath-1.0.1-py3-none-any.whl (20 kB) 2025-09-07T14:23:54.8055711Z Collecting botocore<1.36.0,>=1.35.33 2025-09-07T14:23:54.8178359Z Downloading botocore-1.35.99-py3-none-any.whl (13.3 MB) 2025-09-07T14:23:55.2392264Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 13.3/13.3 MB 41.2 MB/s eta 0:00:00 2025-09-07T14:23:55.3212492Z Collecting nvidia-ml-py<13.0.0a0,>=12.0.0 2025-09-07T14:23:55.3326356Z Downloading nvidia_ml_py-12.575.51-py3-none-any.whl (47 kB) 2025-09-07T14:23:55.3414421Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 47.5/47.5 KB 5.2 MB/s eta 0:00:00 2025-09-07T14:23:55.3473652Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /usr/lib/python3/dist-packages (from botocore<1.36.0,>=1.35.33->boto3==1.35.33) (2.8.1) 2025-09-07T14:23:55.3484858Z Requirement 
already satisfied: urllib3!=2.2.0,<3,>=1.25.4 in /usr/lib/python3/dist-packages (from botocore<1.36.0,>=1.35.33->boto3==1.35.33) (1.26.5) 2025-09-07T14:23:55.5880245Z Installing collected packages: nvidia-ml-py, pynvml, psutil, jmespath, botocore, s3transfer, boto3 2025-09-07T14:23:55.5881220Z Attempting uninstall: nvidia-ml-py 2025-09-07T14:23:55.5886229Z Found existing installation: nvidia-ml-py 11.525.84 2025-09-07T14:23:55.5922229Z Uninstalling nvidia-ml-py-11.525.84: 2025-09-07T14:23:55.5949020Z Successfully uninstalled nvidia-ml-py-11.525.84 2025-09-07T14:23:55.6632784Z Attempting uninstall: psutil 2025-09-07T14:23:55.6639160Z Found existing installation: psutil 5.9.8 2025-09-07T14:23:55.6799759Z Uninstalling psutil-5.9.8: 2025-09-07T14:23:55.6808850Z Successfully uninstalled psutil-5.9.8 2025-09-07T14:23:56.4037570Z Successfully installed boto3-1.35.33 botocore-1.35.99 jmespath-1.0.1 nvidia-ml-py-12.575.51 psutil-7.0.0 pynvml-12.0.0 s3transfer-0.10.4 2025-09-07T14:23:56.4994193Z + DEVICE_NAME= 2025-09-07T14:23:56.4994420Z + DEVICE_TYPE= 2025-09-07T14:23:56.4994637Z + command -v nvidia-smi 2025-09-07T14:23:56.4996402Z + python3 -mpip install torch==2.7.1 2025-09-07T14:23:56.4996683Z /usr/bin/nvidia-smi 2025-09-07T14:23:56.7708591Z Defaulting to user installation because normal site-packages is not writeable 2025-09-07T14:23:56.9878222Z Collecting torch==2.7.1 2025-09-07T14:23:57.0429207Z Downloading torch-2.7.1-cp310-cp310-manylinux_2_28_x86_64.whl (821.2 MB) 2025-09-07T14:24:09.3703337Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 821.2/821.2 MB 1.2 MB/s eta 0:00:00 2025-09-07T14:24:10.2604566Z Collecting nvidia-cufile-cu12==1.11.1.6 2025-09-07T14:24:10.2735403Z Downloading nvidia_cufile_cu12-1.11.1.6-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (1.1 MB) 2025-09-07T14:24:10.2884030Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 88.5 MB/s eta 0:00:00 2025-09-07T14:24:10.3142734Z Collecting nvidia-cuda-cupti-cu12==12.6.80 
2025-09-07T14:24:10.3259545Z Downloading nvidia_cuda_cupti_cu12-12.6.80-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (8.9 MB) 2025-09-07T14:24:10.3969265Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.9/8.9 MB 129.4 MB/s eta 0:00:00 2025-09-07T14:24:10.4492465Z Collecting networkx 2025-09-07T14:24:10.4596126Z Downloading networkx-3.4.2-py3-none-any.whl (1.7 MB) 2025-09-07T14:24:10.4803967Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.7/1.7 MB 90.2 MB/s eta 0:00:00 2025-09-07T14:24:10.5185953Z Collecting sympy>=1.13.3 2025-09-07T14:24:10.5288241Z Downloading sympy-1.14.0-py3-none-any.whl (6.3 MB) 2025-09-07T14:24:10.5787894Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.3/6.3 MB 131.2 MB/s eta 0:00:00 2025-09-07T14:24:10.6184428Z Collecting nvidia-cufft-cu12==11.3.0.4 2025-09-07T14:24:10.6301201Z Downloading nvidia_cufft_cu12-11.3.0.4-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (200.2 MB) 2025-09-07T14:24:12.7187329Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 200.2/200.2 MB 8.3 MB/s eta 0:00:00 2025-09-07T14:24:12.9453185Z Collecting filelock 2025-09-07T14:24:12.9558271Z Downloading filelock-3.19.1-py3-none-any.whl (15 kB) 2025-09-07T14:24:13.0038694Z Collecting nvidia-cusparse-cu12==12.5.4.2 2025-09-07T14:24:13.0143051Z Downloading nvidia_cusparse_cu12-12.5.4.2-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (216.6 MB) 2025-09-07T14:24:15.3573987Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 216.6/216.6 MB 7.3 MB/s eta 0:00:00 2025-09-07T14:24:15.5837755Z Collecting nvidia-cuda-nvrtc-cu12==12.6.77 2025-09-07T14:24:15.5958846Z Downloading nvidia_cuda_nvrtc_cu12-12.6.77-py3-none-manylinux2014_x86_64.whl (23.7 MB) 2025-09-07T14:24:15.7645652Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 23.7/23.7 MB 94.5 MB/s eta 0:00:00 2025-09-07T14:24:15.8130289Z Collecting nvidia-cuda-runtime-cu12==12.6.77 2025-09-07T14:24:15.8237350Z Downloading nvidia_cuda_runtime_cu12-12.6.77-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (897 kB) 
2025-09-07T14:24:15.8408819Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 897.7/897.7 KB 59.0 MB/s eta 0:00:00 2025-09-07T14:24:15.8436571Z Requirement already satisfied: typing-extensions>=4.10.0 in /home/eve/.local/lib/python3.10/site-packages (from torch==2.7.1) (4.15.0) 2025-09-07T14:24:15.8711712Z Collecting triton==3.3.1 2025-09-07T14:24:15.8821752Z Downloading triton-3.3.1-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (155.6 MB) 2025-09-07T14:24:17.2769475Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 155.6/155.6 MB 13.8 MB/s eta 0:00:00 2025-09-07T14:24:17.4535540Z Collecting nvidia-curand-cu12==10.3.7.77 2025-09-07T14:24:17.4656051Z Downloading nvidia_curand_cu12-10.3.7.77-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (56.3 MB) 2025-09-07T14:24:17.9232924Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.3/56.3 MB 41.4 MB/s eta 0:00:00 2025-09-07T14:24:18.0024846Z Collecting nvidia-nvjitlink-cu12==12.6.85 2025-09-07T14:24:18.0134824Z Downloading nvidia_nvjitlink_cu12-12.6.85-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl (19.7 MB) 2025-09-07T14:24:18.1511030Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 19.7/19.7 MB 109.4 MB/s eta 0:00:00 2025-09-07T14:24:18.1955487Z Collecting nvidia-nccl-cu12==2.26.2 2025-09-07T14:24:18.2102650Z Downloading nvidia_nccl_cu12-2.26.2-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (201.3 MB) 2025-09-07T14:24:20.3271375Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 201.3/201.3 MB 8.2 MB/s eta 0:00:00 2025-09-07T14:24:20.5428826Z Collecting jinja2 2025-09-07T14:24:20.5534793Z Downloading jinja2-3.1.6-py3-none-any.whl (134 kB) 2025-09-07T14:24:20.5615346Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 134.9/134.9 KB 20.9 MB/s eta 0:00:00 2025-09-07T14:24:20.6055722Z Collecting fsspec 2025-09-07T14:24:20.6157667Z Downloading fsspec-2025.9.0-py3-none-any.whl (199 kB) 2025-09-07T14:24:20.6242308Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 199.3/199.3 KB 28.4 MB/s eta 0:00:00 2025-09-07T14:24:20.6435596Z Collecting 
nvidia-cusparselt-cu12==0.6.3 2025-09-07T14:24:20.6589159Z Downloading nvidia_cusparselt_cu12-0.6.3-py3-none-manylinux2014_x86_64.whl (156.8 MB) 2025-09-07T14:24:22.0790525Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 156.8/156.8 MB 13.5 MB/s eta 0:00:00 2025-09-07T14:24:22.2514710Z Collecting nvidia-cublas-cu12==12.6.4.1 2025-09-07T14:24:22.2625831Z Downloading nvidia_cublas_cu12-12.6.4.1-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (393.1 MB) 2025-09-07T14:24:27.2891971Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 393.1/393.1 MB 3.1 MB/s eta 0:00:00 2025-09-07T14:24:27.6818200Z Collecting nvidia-cusolver-cu12==11.7.1.2 2025-09-07T14:24:27.6943145Z Downloading nvidia_cusolver_cu12-11.7.1.2-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (158.2 MB) 2025-09-07T14:24:31.4535317Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 158.2/158.2 MB 3.3 MB/s eta 0:00:00 2025-09-07T14:24:31.9679792Z Collecting nvidia-cudnn-cu12==9.5.1.17 2025-09-07T14:24:31.9793136Z Downloading nvidia_cudnn_cu12-9.5.1.17-py3-none-manylinux_2_28_x86_64.whl (571.0 MB) 2025-09-07T14:24:41.3957368Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 571.0/571.0 MB 1.5 MB/s eta 0:00:00 2025-09-07T14:24:41.9539255Z Collecting nvidia-nvtx-cu12==12.6.77 2025-09-07T14:24:41.9659068Z Downloading nvidia_nvtx_cu12-12.6.77-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (89 kB) 2025-09-07T14:24:41.9737316Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 89.3/89.3 KB 13.4 MB/s eta 0:00:00 2025-09-07T14:24:41.9990452Z Requirement already satisfied: setuptools>=40.8.0 in /usr/lib/python3/dist-packages (from triton==3.3.1->torch==2.7.1) (59.6.0) 2025-09-07T14:24:42.0249056Z Collecting mpmath<1.4,>=1.1.0 2025-09-07T14:24:42.0360710Z Downloading mpmath-1.3.0-py3-none-any.whl (536 kB) 2025-09-07T14:24:42.0476200Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 536.2/536.2 KB 54.4 MB/s eta 0:00:00 2025-09-07T14:24:42.2277197Z Collecting MarkupSafe>=2.0 2025-09-07T14:24:42.2380705Z Downloading 
MarkupSafe-3.0.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (20 kB) 2025-09-07T14:24:42.5636387Z Installing collected packages: nvidia-cusparselt-cu12, mpmath, triton, sympy, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufile-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, networkx, MarkupSafe, fsspec, filelock, nvidia-cusparse-cu12, nvidia-cufft-cu12, nvidia-cudnn-cu12, jinja2, nvidia-cusolver-cu12, torch 2025-09-07T14:24:46.6147120Z WARNING: The scripts proton and proton-viewer are installed in '/home/eve/.local/bin' which is not on PATH. 2025-09-07T14:24:46.6148260Z Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location. 2025-09-07T14:24:50.1557433Z WARNING: The script isympy is installed in '/home/eve/.local/bin' which is not on PATH. 2025-09-07T14:24:50.1558164Z Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location. 2025-09-07T14:25:25.7876473Z WARNING: The scripts torchfrtrace and torchrun are installed in '/home/eve/.local/bin' which is not on PATH. 2025-09-07T14:25:25.7877281Z Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location. 
2025-09-07T14:25:25.8739918Z Successfully installed MarkupSafe-3.0.2 filelock-3.19.1 fsspec-2025.9.0 jinja2-3.1.6 mpmath-1.3.0 networkx-3.4.2 nvidia-cublas-cu12-12.6.4.1 nvidia-cuda-cupti-cu12-12.6.80 nvidia-cuda-nvrtc-cu12-12.6.77 nvidia-cuda-runtime-cu12-12.6.77 nvidia-cudnn-cu12-9.5.1.17 nvidia-cufft-cu12-11.3.0.4 nvidia-cufile-cu12-1.11.1.6 nvidia-curand-cu12-10.3.7.77 nvidia-cusolver-cu12-11.7.1.2 nvidia-cusparse-cu12-12.5.4.2 nvidia-cusparselt-cu12-0.6.3 nvidia-nccl-cu12-2.26.2 nvidia-nvjitlink-cu12-12.6.85 nvidia-nvtx-cu12-12.6.77 sympy-1.14.0 torch-2.7.1 triton-3.3.1 2025-09-07T14:25:26.5741497Z + echo DEVICE_NAME= 2025-09-07T14:25:26.5742052Z + echo DEVICE_TYPE= 2025-09-07T14:25:26.6152331Z ##[group]Run set -eux 2025-09-07T14:25:26.6152551Z set -eux 2025-09-07T14:25:26.6152718Z  2025-09-07T14:25:26.6152899Z if [[ -z "${GITHUB_TOKEN}" ]]; then 2025-09-07T14:25:26.6153165Z  echo "Missing github-token input" 2025-09-07T14:25:26.6153400Z  exit 1 2025-09-07T14:25:26.6153614Z fi 2025-09-07T14:25:26.6168072Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T14:25:26.6168358Z env: 2025-09-07T14:25:26.6168526Z GIT_DEFAULT_BRANCH: main 2025-09-07T14:25:26.6168776Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T14:25:26.6169116Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5231 2025-09-07T14:25:26.6169543Z DOCKER_CONTAINER_ID: f6780263fb6a43de96266e63b4f163682f3d42ec146647fa8e1572b4947dba33 2025-09-07T14:25:26.6169912Z DEVICE_NAME: 2025-09-07T14:25:26.6170085Z DEVICE_TYPE: 2025-09-07T14:25:26.6170433Z GITHUB_TOKEN: *** 2025-09-07T14:25:26.6170770Z ##[endgroup] 2025-09-07T14:25:26.6640657Z + [[ -z *** ]] 2025-09-07T14:25:26.7547088Z ##[group]Run pytorch/test-infra/.github/actions/get-workflow-job-id@main 2025-09-07T14:25:26.7547417Z with: 2025-09-07T14:25:26.7547812Z github-token: *** 2025-09-07T14:25:26.7547999Z env: 2025-09-07T14:25:26.7548163Z GIT_DEFAULT_BRANCH: main 2025-09-07T14:25:26.7548420Z GPU_FLAG: --gpus 
all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T14:25:26.7548763Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5231 2025-09-07T14:25:26.7549188Z DOCKER_CONTAINER_ID: f6780263fb6a43de96266e63b4f163682f3d42ec146647fa8e1572b4947dba33 2025-09-07T14:25:26.7549562Z DEVICE_NAME: 2025-09-07T14:25:26.7549743Z DEVICE_TYPE: 2025-09-07T14:25:26.7549908Z ##[endgroup] 2025-09-07T14:25:26.8246991Z ##[group]Run set -eux 2025-09-07T14:25:26.8247224Z set -eux 2025-09-07T14:25:26.8247422Z  2025-09-07T14:25:26.8260595Z python3 "${GITHUB_ACTION_PATH}/../../scripts/get_workflow_job_id.py" "${GITHUB_RUN_ID}" "${RUNNER_NAME}" 2025-09-07T14:25:26.8275489Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T14:25:26.8275771Z env: 2025-09-07T14:25:26.8275939Z GIT_DEFAULT_BRANCH: main 2025-09-07T14:25:26.8276191Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T14:25:26.8276524Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5231 2025-09-07T14:25:26.8276943Z DOCKER_CONTAINER_ID: f6780263fb6a43de96266e63b4f163682f3d42ec146647fa8e1572b4947dba33 2025-09-07T14:25:26.8277298Z DEVICE_NAME: 2025-09-07T14:25:26.8277480Z DEVICE_TYPE: 2025-09-07T14:25:26.8277936Z GITHUB_TOKEN: *** 2025-09-07T14:25:26.8278113Z ##[endgroup] 2025-09-07T14:25:26.8731108Z + python3 /home/eve/_work/_actions/pytorch/test-infra/main/.github/actions/get-workflow-job-id/../../scripts/get_workflow_job_id.py 17525296438 i-0d73070610f53945f-1005 2025-09-07T14:25:27.9242546Z setting job-id=49775781837 2025-09-07T14:25:27.9242995Z setting job-name=test-weekly / test (inductor_timm_perf_cuda_h100, 4, 7, linux.aws.h100) 2025-09-07T14:25:27.9617601Z ##[group]Run set -eux 2025-09-07T14:25:27.9617817Z set -eux 2025-09-07T14:25:27.9617988Z  2025-09-07T14:25:27.9618151Z if [[ -n "" ]]; then 2025-09-07T14:25:27.9618356Z  source "" 2025-09-07T14:25:27.9618532Z fi 2025-09-07T14:25:27.9618691Z  2025-09-07T14:25:27.9618995Z python3 
"${GITHUB_ACTION_PATH}/../../scripts/benchmarks/gather_metadata.py" \ 2025-09-07T14:25:27.9619377Z  --schema-version "${SCHEMA_VERSION}" \ 2025-09-07T14:25:27.9619637Z  --repo "${REPO}" \ 2025-09-07T14:25:27.9619879Z  --head-branch "${HEAD_BRANCH}" \ 2025-09-07T14:25:27.9620131Z  --head-sha "${HEAD_SHA}" \ 2025-09-07T14:25:27.9620598Z  --workflow-id "${WORKFLOW_RUN_ID}" \ 2025-09-07T14:25:27.9620881Z  --run-attempt "${RUN_ATTEMPT}" \ 2025-09-07T14:25:27.9621127Z  --job-id "${JOB_ID}" \ 2025-09-07T14:25:27.9621449Z  --job-name "${JOB_NAME}" 2025-09-07T14:25:27.9635952Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T14:25:27.9636248Z env: 2025-09-07T14:25:27.9636411Z GIT_DEFAULT_BRANCH: main 2025-09-07T14:25:27.9636661Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T14:25:27.9637003Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5231 2025-09-07T14:25:27.9637415Z DOCKER_CONTAINER_ID: f6780263fb6a43de96266e63b4f163682f3d42ec146647fa8e1572b4947dba33 2025-09-07T14:25:27.9637773Z DEVICE_NAME: 2025-09-07T14:25:27.9637943Z DEVICE_TYPE: 2025-09-07T14:25:27.9638116Z SCHEMA_VERSION: v3 2025-09-07T14:25:27.9638296Z REPO: pytorch/pytorch 2025-09-07T14:25:27.9638491Z HEAD_BRANCH: refs/heads/main 2025-09-07T14:25:27.9638741Z HEAD_SHA: 93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T14:25:27.9639007Z WORKFLOW_RUN_ID: 17525296438 2025-09-07T14:25:27.9639202Z RUN_ATTEMPT: 1 2025-09-07T14:25:27.9639373Z JOB_ID: 49775781837 2025-09-07T14:25:27.9639824Z JOB_NAME: test-weekly / test (inductor_timm_perf_cuda_h100, 4, 7, linux.aws.h100) 2025-09-07T14:25:27.9640155Z ##[endgroup] 2025-09-07T14:25:28.0102394Z + [[ -n '' ]] 2025-09-07T14:25:28.0104044Z + python3 /home/eve/_work/_actions/pytorch/test-infra/main/.github/actions/upload-benchmark-results/../../scripts/benchmarks/gather_metadata.py --schema-version v3 --repo pytorch/pytorch --head-branch refs/heads/main --head-sha 93fb23d6fae7c4e82c4239a1033e522088742634 --workflow-id 
17525296438 --run-attempt 1 --job-id 49775781837 --job-name 'test-weekly / test (inductor_timm_perf_cuda_h100, 4, 7, linux.aws.h100)' 2025-09-07T14:25:28.1024638Z ##[group]Run set -eux 2025-09-07T14:25:28.1024863Z set -eux 2025-09-07T14:25:28.1025215Z  2025-09-07T14:25:28.1025398Z if [[ -n "" ]]; then 2025-09-07T14:25:28.1025640Z  source "" 2025-09-07T14:25:28.1025829Z fi 2025-09-07T14:25:28.1026001Z  2025-09-07T14:25:28.1026322Z python3 "${GITHUB_ACTION_PATH}/../../scripts/benchmarks/gather_runners_info.py" 2025-09-07T14:25:28.1040019Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T14:25:28.1040307Z env: 2025-09-07T14:25:28.1040467Z GIT_DEFAULT_BRANCH: main 2025-09-07T14:25:28.1040724Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T14:25:28.1041065Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5231 2025-09-07T14:25:28.1041490Z DOCKER_CONTAINER_ID: f6780263fb6a43de96266e63b4f163682f3d42ec146647fa8e1572b4947dba33 2025-09-07T14:25:28.1041841Z DEVICE_NAME: 2025-09-07T14:25:28.1042003Z DEVICE_TYPE: 2025-09-07T14:25:28.1042314Z ##[endgroup] 2025-09-07T14:25:28.1548130Z + [[ -n '' ]] 2025-09-07T14:25:28.1548738Z + python3 /home/eve/_work/_actions/pytorch/test-infra/main/.github/actions/upload-benchmark-results/../../scripts/benchmarks/gather_runners_info.py 2025-09-07T14:25:28.8933052Z /home/eve/.local/lib/python3.10/site-packages/torch/_subclasses/functional_tensor.py:276: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:81.) 
2025-09-07T14:25:28.8934146Z cpu = _conversion_method_template(device=torch.device("cpu")) 2025-09-07T14:25:30.3962061Z ##[group]Run set -eux 2025-09-07T14:25:30.3962332Z set -eux 2025-09-07T14:25:30.3962533Z  2025-09-07T14:25:30.3962762Z # TODO (huydhn): Implement this part 2025-09-07T14:25:30.3963116Z echo "dependencies={}" >> "${GITHUB_OUTPUT}" 2025-09-07T14:25:30.3977928Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T14:25:30.3978233Z env: 2025-09-07T14:25:30.3978401Z GIT_DEFAULT_BRANCH: main 2025-09-07T14:25:30.3978658Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T14:25:30.3979211Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5231 2025-09-07T14:25:30.3979635Z DOCKER_CONTAINER_ID: f6780263fb6a43de96266e63b4f163682f3d42ec146647fa8e1572b4947dba33 2025-09-07T14:25:30.3980000Z DEVICE_NAME: 2025-09-07T14:25:30.3980183Z DEVICE_TYPE: 2025-09-07T14:25:30.3980347Z ##[endgroup] 2025-09-07T14:25:30.4457425Z + echo 'dependencies={}' 2025-09-07T14:25:30.5370692Z ##[group]Run set -eux 2025-09-07T14:25:30.5370920Z set -eux 2025-09-07T14:25:30.5371091Z  2025-09-07T14:25:30.5371259Z if [[ -n "" ]]; then 2025-09-07T14:25:30.5371468Z  source "" 2025-09-07T14:25:30.5371661Z fi 2025-09-07T14:25:30.5371821Z  2025-09-07T14:25:30.5372023Z if [[ ! 
-d "${BENCHMARK_RESULTS_DIR}" ]]; then 2025-09-07T14:25:30.5372359Z  echo "${BENCHMARK_RESULTS_DIR} does not exist, skipping" 2025-09-07T14:25:30.5372738Z  # We don't want the job to fail if the directory doesn't exist 2025-09-07T14:25:30.5373028Z  exit 0 2025-09-07T14:25:30.5373204Z fi 2025-09-07T14:25:30.5373372Z  2025-09-07T14:25:30.5373553Z if [[ "${DRY_RUN}" == "true" ]]; then 2025-09-07T14:25:30.5373913Z  python3 "${GITHUB_ACTION_PATH}/../../scripts/upload_benchmark_results.py" \ 2025-09-07T14:25:30.5374479Z  --benchmark-results-dir "${BENCHMARK_RESULTS_DIR}" \ 2025-09-07T14:25:30.5374797Z  --metadata "${BENCHMARK_METADATA}" \ 2025-09-07T14:25:30.5375222Z  --runners "${RUNNER_INFO}" \ 2025-09-07T14:25:30.5375504Z  --dependencies "${DEPENDENCIES}" \ 2025-09-07T14:25:30.5375744Z  --dry-run 2025-09-07T14:25:30.5375927Z else 2025-09-07T14:25:30.5376212Z  python3 "${GITHUB_ACTION_PATH}/../../scripts/upload_benchmark_results.py" \ 2025-09-07T14:25:30.5376627Z  --benchmark-results-dir "${BENCHMARK_RESULTS_DIR}" \ 2025-09-07T14:25:30.5376938Z  --metadata "${BENCHMARK_METADATA}" \ 2025-09-07T14:25:30.5377194Z  --runners "${RUNNER_INFO}" \ 2025-09-07T14:25:30.5377447Z  --dependencies "${DEPENDENCIES}" 2025-09-07T14:25:30.5377676Z fi 2025-09-07T14:25:30.5390950Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T14:25:30.5391234Z env: 2025-09-07T14:25:30.5391395Z GIT_DEFAULT_BRANCH: main 2025-09-07T14:25:30.5391648Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T14:25:30.5391989Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5231 2025-09-07T14:25:30.5392410Z DOCKER_CONTAINER_ID: f6780263fb6a43de96266e63b4f163682f3d42ec146647fa8e1572b4947dba33 2025-09-07T14:25:30.5392756Z DEVICE_NAME: 2025-09-07T14:25:30.5392928Z DEVICE_TYPE: 2025-09-07T14:25:30.5393117Z BENCHMARK_RESULTS_DIR: test/test-reports 2025-09-07T14:25:30.5393482Z DRY_RUN: false 2025-09-07T14:25:30.5394398Z BENCHMARK_METADATA: {"timestamp": 1757255128, 
"schema_version": "v3", "name": "test-weekly / test (inductor_timm_perf_cuda_h100, 4, 7, linux.aws.h100)", "repo": "pytorch/pytorch", "head_branch": "refs/heads/main", "head_sha": "93fb23d6fae7c4e82c4239a1033e522088742634", "workflow_id": 17525296438, "run_attempt": 1, "job_id": 49775781837} 2025-09-07T14:25:30.5395842Z RUNNER_INFO: [{"cpu_info": "x86_64", "cpu_count": 192, "avail_mem_in_gb": 1999, "extra_info": {"hostname": "c9e10662379e"}, "name": "cuda", "type": "NVIDIA H100 80GB HBM3", "gpu_count": 1, "avail_gpu_mem_in_gb": 79}] 2025-09-07T14:25:30.5396417Z DEPENDENCIES: {} 2025-09-07T14:25:30.5396595Z ##[endgroup] 2025-09-07T14:25:30.5862953Z + [[ -n '' ]] 2025-09-07T14:25:30.5863212Z + [[ ! -d test/test-reports ]] 2025-09-07T14:25:30.5863494Z + [[ false == \t\r\u\e ]] 2025-09-07T14:25:30.5866516Z + python3 /home/eve/_work/_actions/pytorch/test-infra/main/.github/actions/upload-benchmark-results/../../scripts/upload_benchmark_results.py --benchmark-results-dir test/test-reports --metadata '{"timestamp": 1757255128, "schema_version": "v3", "name": "test-weekly / test (inductor_timm_perf_cuda_h100, 4, 7, linux.aws.h100)", "repo": "pytorch/pytorch", "head_branch": "refs/heads/main", "head_sha": "93fb23d6fae7c4e82c4239a1033e522088742634", "workflow_id": 17525296438, "run_attempt": 1, "job_id": 49775781837}' --runners '[{"cpu_info": "x86_64", "cpu_count": 192, "avail_mem_in_gb": 1999, "extra_info": {"hostname": "c9e10662379e"}, "name": "cuda", "type": "NVIDIA H100 80GB HBM3", "gpu_count": 1, "avail_gpu_mem_in_gb": 79}]' --dependencies '{}' 2025-09-07T14:25:30.7061449Z INFO:root:Upload test/test-reports/inductor_max_autotune_timm_models_bfloat16_inference_cuda_h100_performance.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781837/inductor_max_autotune_timm_models_bfloat16_inference_cuda_h100_performance.json 2025-09-07T14:25:30.7421168Z INFO:botocore.credentials:Found credentials from IAM Role: gh-ci-github-action-runners-runner-role 
2025-09-07T14:25:30.9688945Z INFO:root:Upload test/test-reports/inductor_dynamic_timm_models_bfloat16_inference_cuda_h100_accuracy.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781837/inductor_dynamic_timm_models_bfloat16_inference_cuda_h100_accuracy.json 2025-09-07T14:25:31.1122252Z INFO:root:Upload test/test-reports/inductor_aot_inductor_timm_models_bfloat16_inference_cuda_h100_accuracy.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781837/inductor_aot_inductor_timm_models_bfloat16_inference_cuda_h100_accuracy.json 2025-09-07T14:25:31.2596235Z INFO:root:Upload test/test-reports/inductor_max_autotune_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781837/inductor_max_autotune_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json 2025-09-07T14:25:31.4263187Z INFO:root:Upload test/test-reports/inductor_with_cudagraphs_freezing_autotune_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781837/inductor_with_cudagraphs_freezing_autotune_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json 2025-09-07T14:25:31.6017994Z INFO:root:Upload test/test-reports/inductor_no_cudagraphs_timm_models_bfloat16_inference_cuda_h100_performance.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781837/inductor_no_cudagraphs_timm_models_bfloat16_inference_cuda_h100_performance.json 2025-09-07T14:25:31.7445502Z INFO:root:Upload test/test-reports/inductor_cpp_wrapper_timm_models_amp_training_cuda_h100_accuracy.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781837/inductor_cpp_wrapper_timm_models_amp_training_cuda_h100_accuracy.json 2025-09-07T14:25:31.8775690Z INFO:root:Upload test/test-reports/inductor_no_cudagraphs_timm_models_amp_training_cuda_h100_accuracy.json to 
s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781837/inductor_no_cudagraphs_timm_models_amp_training_cuda_h100_accuracy.json 2025-09-07T14:25:32.0546685Z INFO:root:Upload test/test-reports/inductor_with_cudagraphs_timm_models_amp_training_cuda_h100_performance.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781837/inductor_with_cudagraphs_timm_models_amp_training_cuda_h100_performance.json 2025-09-07T14:25:32.1851585Z INFO:root:Upload test/test-reports/inductor_with_cudagraphs_timm_models_bfloat16_inference_cuda_h100_accuracy.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781837/inductor_with_cudagraphs_timm_models_bfloat16_inference_cuda_h100_accuracy.json 2025-09-07T14:25:32.3170104Z INFO:root:Upload test/test-reports/inductor_no_cudagraphs_timm_models_amp_training_cuda_h100_performance.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781837/inductor_no_cudagraphs_timm_models_amp_training_cuda_h100_performance.json 2025-09-07T14:25:32.4491573Z INFO:root:Upload test/test-reports/inductor_cpp_wrapper_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781837/inductor_cpp_wrapper_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json 2025-09-07T14:25:32.6031911Z INFO:root:Upload test/test-reports/inductor_dynamic_timm_models_amp_training_cuda_h100_accuracy.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781837/inductor_dynamic_timm_models_amp_training_cuda_h100_accuracy.json 2025-09-07T14:25:32.7378504Z INFO:root:Upload test/test-reports/inductor_no_cudagraphs_timm_models_amp_training_cuda_h100_performance_compilation_metrics.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781837/inductor_no_cudagraphs_timm_models_amp_training_cuda_h100_performance_compilation_metrics.json 2025-09-07T14:25:32.8972875Z INFO:root:Upload 
test/test-reports/inductor_with_cudagraphs_timm_models_bfloat16_inference_cuda_h100_performance.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781837/inductor_with_cudagraphs_timm_models_bfloat16_inference_cuda_h100_performance.json 2025-09-07T14:25:33.0357306Z INFO:root:Upload test/test-reports/inductor_cpp_wrapper_timm_models_amp_training_cuda_h100_performance_compilation_metrics.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781837/inductor_cpp_wrapper_timm_models_amp_training_cuda_h100_performance_compilation_metrics.json 2025-09-07T14:25:33.1976068Z INFO:root:Upload test/test-reports/inductor_with_cudagraphs_freezing_timm_models_bfloat16_inference_cuda_h100_performance.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781837/inductor_with_cudagraphs_freezing_timm_models_bfloat16_inference_cuda_h100_performance.json 2025-09-07T14:25:33.3298670Z INFO:root:Upload test/test-reports/inductor_no_cudagraphs_timm_models_bfloat16_inference_cuda_h100_accuracy.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781837/inductor_no_cudagraphs_timm_models_bfloat16_inference_cuda_h100_accuracy.json 2025-09-07T14:25:33.4604349Z INFO:root:Upload test/test-reports/inductor_cpp_wrapper_timm_models_bfloat16_inference_cuda_h100_accuracy.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781837/inductor_cpp_wrapper_timm_models_bfloat16_inference_cuda_h100_accuracy.json 2025-09-07T14:25:33.6173369Z INFO:root:Upload test/test-reports/inductor_dynamic_timm_models_amp_training_cuda_h100_performance_compilation_metrics.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781837/inductor_dynamic_timm_models_amp_training_cuda_h100_performance_compilation_metrics.json 2025-09-07T14:25:33.7778607Z INFO:root:Upload test/test-reports/inductor_with_cudagraphs_freezing_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json to 
s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781837/inductor_with_cudagraphs_freezing_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json 2025-09-07T14:25:33.9405630Z INFO:root:Upload test/test-reports/inductor_dynamic_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781837/inductor_dynamic_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json 2025-09-07T14:25:34.1291875Z INFO:root:Upload test/test-reports/inductor_max_autotune_timm_models_amp_training_cuda_h100_performance_compilation_metrics.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781837/inductor_max_autotune_timm_models_amp_training_cuda_h100_performance_compilation_metrics.json 2025-09-07T14:25:34.3161368Z INFO:root:Upload test/test-reports/inductor_export_timm_models_bfloat16_inference_cuda_h100_accuracy.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781837/inductor_export_timm_models_bfloat16_inference_cuda_h100_accuracy.json 2025-09-07T14:25:34.4585569Z INFO:root:Upload test/test-reports/inductor_with_cudagraphs_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781837/inductor_with_cudagraphs_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json 2025-09-07T14:25:34.6236097Z INFO:root:Upload test/test-reports/inductor_with_cudagraphs_timm_models_amp_training_cuda_h100_performance_compilation_metrics.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781837/inductor_with_cudagraphs_timm_models_amp_training_cuda_h100_performance_compilation_metrics.json 2025-09-07T14:25:34.7924552Z INFO:root:Upload test/test-reports/inductor_with_cudagraphs_freezing_autotune_timm_models_bfloat16_inference_cuda_h100_performance.json to 
s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781837/inductor_with_cudagraphs_freezing_autotune_timm_models_bfloat16_inference_cuda_h100_performance.json 2025-09-07T14:25:34.9176701Z INFO:root:Upload test/test-reports/inductor_with_cudagraphs_freezing_timm_models_bfloat16_inference_cuda_h100_accuracy.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781837/inductor_with_cudagraphs_freezing_timm_models_bfloat16_inference_cuda_h100_accuracy.json 2025-09-07T14:25:35.0535760Z INFO:root:Upload test/test-reports/inductor_max_autotune_timm_models_amp_training_cuda_h100_accuracy.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781837/inductor_max_autotune_timm_models_amp_training_cuda_h100_accuracy.json 2025-09-07T14:25:35.1588031Z INFO:root:Upload test/test-reports/inductor_max_autotune_timm_models_amp_training_cuda_h100_performance.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781837/inductor_max_autotune_timm_models_amp_training_cuda_h100_performance.json 2025-09-07T14:25:35.3094796Z INFO:root:Upload test/test-reports/inductor_max_autotune_timm_models_bfloat16_inference_cuda_h100_accuracy.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781837/inductor_max_autotune_timm_models_bfloat16_inference_cuda_h100_accuracy.json 2025-09-07T14:25:35.4343047Z INFO:root:Upload test/test-reports/inductor_cpp_wrapper_timm_models_bfloat16_inference_cuda_h100_performance.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781837/inductor_cpp_wrapper_timm_models_bfloat16_inference_cuda_h100_performance.json 2025-09-07T14:25:35.5678851Z INFO:root:Upload test/test-reports/inductor_dynamic_timm_models_amp_training_cuda_h100_performance.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781837/inductor_dynamic_timm_models_amp_training_cuda_h100_performance.json 2025-09-07T14:25:35.6918955Z INFO:root:Upload 
test/test-reports/inductor_no_cudagraphs_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781837/inductor_no_cudagraphs_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json 2025-09-07T14:25:35.8242039Z INFO:root:Upload test/test-reports/inductor_cpp_wrapper_timm_models_amp_training_cuda_h100_performance.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781837/inductor_cpp_wrapper_timm_models_amp_training_cuda_h100_performance.json 2025-09-07T14:25:36.0423196Z INFO:root:Upload test/test-reports/inductor_with_cudagraphs_freezing_autotune_timm_models_bfloat16_inference_cuda_h100_accuracy.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781837/inductor_with_cudagraphs_freezing_autotune_timm_models_bfloat16_inference_cuda_h100_accuracy.json 2025-09-07T14:25:36.1879863Z INFO:root:Upload test/test-reports/inductor_dynamic_timm_models_bfloat16_inference_cuda_h100_performance.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781837/inductor_dynamic_timm_models_bfloat16_inference_cuda_h100_performance.json 2025-09-07T14:25:36.3427794Z INFO:root:Upload test/test-reports/inductor_aot_inductor_timm_models_bfloat16_inference_cuda_h100_performance.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781837/inductor_aot_inductor_timm_models_bfloat16_inference_cuda_h100_performance.json 2025-09-07T14:25:36.4735508Z INFO:root:Upload test/test-reports/inductor_with_cudagraphs_timm_models_amp_training_cuda_h100_accuracy.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781837/inductor_with_cudagraphs_timm_models_amp_training_cuda_h100_accuracy.json 2025-09-07T14:25:36.6089975Z INFO:root:Upload test/test-reports/inductor_aot_inductor_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json to 
s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781837/inductor_aot_inductor_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json 2025-09-07T14:25:36.8583894Z ##[group]Run cat test/**/*_toprint.log || true 2025-09-07T14:25:36.8584211Z cat test/**/*_toprint.log || true 2025-09-07T14:25:36.8598896Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T14:25:36.8599190Z env: 2025-09-07T14:25:36.8599360Z GIT_DEFAULT_BRANCH: main 2025-09-07T14:25:36.8599614Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T14:25:36.8599961Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5231 2025-09-07T14:25:36.8600381Z DOCKER_CONTAINER_ID: f6780263fb6a43de96266e63b4f163682f3d42ec146647fa8e1572b4947dba33 2025-09-07T14:25:36.8600735Z DEVICE_NAME: 2025-09-07T14:25:36.8600904Z DEVICE_TYPE: 2025-09-07T14:25:36.8601071Z ##[endgroup] 2025-09-07T14:25:36.9162030Z cat: 'test/**/*_toprint.log': No such file or directory 2025-09-07T14:25:36.9513757Z ##[group]Run kill "$MONITOR_SCRIPT_PID" 2025-09-07T14:25:36.9514087Z kill "$MONITOR_SCRIPT_PID" 2025-09-07T14:25:36.9527696Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T14:25:36.9527988Z env: 2025-09-07T14:25:36.9528153Z GIT_DEFAULT_BRANCH: main 2025-09-07T14:25:36.9528413Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T14:25:36.9528743Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5231 2025-09-07T14:25:36.9529171Z DOCKER_CONTAINER_ID: f6780263fb6a43de96266e63b4f163682f3d42ec146647fa8e1572b4947dba33 2025-09-07T14:25:36.9529538Z DEVICE_NAME: 2025-09-07T14:25:36.9529709Z DEVICE_TYPE: 2025-09-07T14:25:36.9529880Z MONITOR_SCRIPT_PID: 7939 2025-09-07T14:25:36.9530065Z ##[endgroup] 2025-09-07T14:25:37.0088393Z Prepare all required actions 2025-09-07T14:25:37.0088769Z Getting action download info 2025-09-07T14:25:37.2346266Z Download action repository 'seemethere/upload-artifact-s3@v5' 
(SHA:baba72d0712b404f646cebe0730933554ebce96a) 2025-09-07T14:25:37.9245379Z Download action repository 'actions/upload-artifact@v4' (SHA:ea165f8d65b6e75b540449e92b4886f43607fa02) 2025-09-07T14:25:39.4914020Z ##[group]Run ./.github/actions/upload-test-artifacts 2025-09-07T14:25:39.4914299Z with: 2025-09-07T14:25:39.4914588Z file-suffix: test-inductor_timm_perf_cuda_h100-4-7-linux.aws.h100_49775781837 2025-09-07T14:25:39.4914930Z s3-bucket: gha-artifacts 2025-09-07T14:25:39.4915303Z env: 2025-09-07T14:25:39.4915470Z GIT_DEFAULT_BRANCH: main 2025-09-07T14:25:39.4915725Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T14:25:39.4916239Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5231 2025-09-07T14:25:39.4916682Z DOCKER_CONTAINER_ID: f6780263fb6a43de96266e63b4f163682f3d42ec146647fa8e1572b4947dba33 2025-09-07T14:25:39.4917047Z DEVICE_NAME: 2025-09-07T14:25:39.4917213Z DEVICE_TYPE: 2025-09-07T14:25:39.4917380Z ##[endgroup] 2025-09-07T14:25:39.5962048Z ##[group]Run # Remove any previous test jsons if they exist 2025-09-07T14:25:39.5962436Z # Remove any previous test jsons if they exist 2025-09-07T14:25:39.5962742Z rm -f test-jsons-*.zip 2025-09-07T14:25:39.5963084Z zip -r "test-jsons-${FILE_SUFFIX}.zip" test/test-reports -i '*.json' 2025-09-07T14:25:39.5977097Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T14:25:39.5977388Z env: 2025-09-07T14:25:39.5977554Z GIT_DEFAULT_BRANCH: main 2025-09-07T14:25:39.5977810Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T14:25:39.5978144Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5231 2025-09-07T14:25:39.5978574Z DOCKER_CONTAINER_ID: f6780263fb6a43de96266e63b4f163682f3d42ec146647fa8e1572b4947dba33 2025-09-07T14:25:39.5978955Z DEVICE_NAME: 2025-09-07T14:25:39.5979129Z DEVICE_TYPE: 2025-09-07T14:25:39.5979411Z FILE_SUFFIX: test-inductor_timm_perf_cuda_h100-4-7-linux.aws.h100_49775781837 2025-09-07T14:25:39.5979725Z ##[endgroup] 
2025-09-07T14:25:39.6503448Z adding: test/test-reports/inductor_max_autotune_timm_models_bfloat16_inference_cuda_h100_performance.json (deflated 98%) 2025-09-07T14:25:39.6517441Z adding: test/test-reports/inductor_dynamic_timm_models_bfloat16_inference_cuda_h100_accuracy.json (deflated 99%) 2025-09-07T14:25:39.6531255Z adding: test/test-reports/inductor_aot_inductor_timm_models_bfloat16_inference_cuda_h100_accuracy.json (deflated 99%) 2025-09-07T14:25:39.6593040Z adding: test/test-reports/inductor_max_autotune_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json (deflated 99%) 2025-09-07T14:25:39.6661268Z adding: test/test-reports/inductor_with_cudagraphs_freezing_autotune_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json (deflated 99%) 2025-09-07T14:25:39.6681845Z adding: test/test-reports/inductor_no_cudagraphs_timm_models_bfloat16_inference_cuda_h100_performance.json (deflated 98%) 2025-09-07T14:25:39.6695728Z adding: test/test-reports/inductor_cpp_wrapper_timm_models_amp_training_cuda_h100_accuracy.json (deflated 99%) 2025-09-07T14:25:39.6709719Z adding: test/test-reports/inductor_no_cudagraphs_timm_models_amp_training_cuda_h100_accuracy.json (deflated 99%) 2025-09-07T14:25:39.6730090Z adding: test/test-reports/inductor_with_cudagraphs_timm_models_amp_training_cuda_h100_performance.json (deflated 98%) 2025-09-07T14:25:39.6744015Z adding: test/test-reports/inductor_with_cudagraphs_timm_models_bfloat16_inference_cuda_h100_accuracy.json (deflated 99%) 2025-09-07T14:25:39.6764612Z adding: test/test-reports/inductor_no_cudagraphs_timm_models_amp_training_cuda_h100_performance.json (deflated 98%) 2025-09-07T14:25:39.6823416Z adding: test/test-reports/inductor_cpp_wrapper_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json (deflated 99%) 2025-09-07T14:25:39.6836941Z adding: test/test-reports/inductor_dynamic_timm_models_amp_training_cuda_h100_accuracy.json (deflated 99%) 
2025-09-07T14:25:39.6898335Z adding: test/test-reports/inductor_no_cudagraphs_timm_models_amp_training_cuda_h100_performance_compilation_metrics.json (deflated 99%) 2025-09-07T14:25:39.6917001Z adding: test/test-reports/inductor_with_cudagraphs_timm_models_bfloat16_inference_cuda_h100_performance.json (deflated 98%) 2025-09-07T14:25:39.6983007Z adding: test/test-reports/inductor_cpp_wrapper_timm_models_amp_training_cuda_h100_performance_compilation_metrics.json (deflated 99%) 2025-09-07T14:25:39.7003331Z adding: test/test-reports/inductor_with_cudagraphs_freezing_timm_models_bfloat16_inference_cuda_h100_performance.json (deflated 99%) 2025-09-07T14:25:39.7017853Z adding: test/test-reports/inductor_no_cudagraphs_timm_models_bfloat16_inference_cuda_h100_accuracy.json (deflated 99%) 2025-09-07T14:25:39.7031986Z adding: test/test-reports/inductor_cpp_wrapper_timm_models_bfloat16_inference_cuda_h100_accuracy.json (deflated 99%) 2025-09-07T14:25:39.7096947Z adding: test/test-reports/inductor_dynamic_timm_models_amp_training_cuda_h100_performance_compilation_metrics.json (deflated 99%) 2025-09-07T14:25:39.7152472Z adding: test/test-reports/inductor_with_cudagraphs_freezing_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json (deflated 99%) 2025-09-07T14:25:39.7202601Z adding: test/test-reports/inductor_dynamic_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json (deflated 99%) 2025-09-07T14:25:39.7280587Z adding: test/test-reports/inductor_max_autotune_timm_models_amp_training_cuda_h100_performance_compilation_metrics.json (deflated 99%) 2025-09-07T14:25:39.7294370Z adding: test/test-reports/inductor_export_timm_models_bfloat16_inference_cuda_h100_accuracy.json (deflated 99%) 2025-09-07T14:25:39.7343907Z adding: test/test-reports/inductor_with_cudagraphs_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json (deflated 99%) 2025-09-07T14:25:39.7410039Z adding: 
test/test-reports/inductor_with_cudagraphs_timm_models_amp_training_cuda_h100_performance_compilation_metrics.json (deflated 99%) 2025-09-07T14:25:39.7430710Z adding: test/test-reports/inductor_with_cudagraphs_freezing_autotune_timm_models_bfloat16_inference_cuda_h100_performance.json (deflated 98%) 2025-09-07T14:25:39.7444467Z adding: test/test-reports/inductor_with_cudagraphs_freezing_timm_models_bfloat16_inference_cuda_h100_accuracy.json (deflated 99%) 2025-09-07T14:25:39.7458420Z adding: test/test-reports/inductor_max_autotune_timm_models_amp_training_cuda_h100_accuracy.json (deflated 99%) 2025-09-07T14:25:39.7478962Z adding: test/test-reports/inductor_max_autotune_timm_models_amp_training_cuda_h100_performance.json (deflated 98%) 2025-09-07T14:25:39.7492936Z adding: test/test-reports/inductor_max_autotune_timm_models_bfloat16_inference_cuda_h100_accuracy.json (deflated 99%) 2025-09-07T14:25:39.7513721Z adding: test/test-reports/inductor_cpp_wrapper_timm_models_bfloat16_inference_cuda_h100_performance.json (deflated 98%) 2025-09-07T14:25:39.7533884Z adding: test/test-reports/inductor_dynamic_timm_models_amp_training_cuda_h100_performance.json (deflated 98%) 2025-09-07T14:25:39.7587544Z adding: test/test-reports/inductor_no_cudagraphs_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json (deflated 99%) 2025-09-07T14:25:39.7607750Z adding: test/test-reports/inductor_cpp_wrapper_timm_models_amp_training_cuda_h100_performance.json (deflated 98%) 2025-09-07T14:25:39.7621830Z adding: test/test-reports/inductor_with_cudagraphs_freezing_autotune_timm_models_bfloat16_inference_cuda_h100_accuracy.json (deflated 99%) 2025-09-07T14:25:39.7640406Z adding: test/test-reports/inductor_dynamic_timm_models_bfloat16_inference_cuda_h100_performance.json (deflated 98%) 2025-09-07T14:25:39.7661422Z adding: test/test-reports/inductor_aot_inductor_timm_models_bfloat16_inference_cuda_h100_performance.json (deflated 99%) 2025-09-07T14:25:39.7675198Z adding: 
test/test-reports/inductor_with_cudagraphs_timm_models_amp_training_cuda_h100_accuracy.json (deflated 99%) 2025-09-07T14:25:39.7719483Z adding: test/test-reports/inductor_aot_inductor_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json (deflated 99%) 2025-09-07T14:25:39.8152253Z ##[group]Run # Remove any previous test reports if they exist 2025-09-07T14:25:39.8152669Z # Remove any previous test reports if they exist 2025-09-07T14:25:39.8153002Z rm -f test-reports-*.zip 2025-09-07T14:25:39.8153406Z zip -r "test-reports-${FILE_SUFFIX}.zip" test/test-reports -i '*.xml' -i '*.csv' 2025-09-07T14:25:39.8166940Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T14:25:39.8167230Z env: 2025-09-07T14:25:39.8167516Z GIT_DEFAULT_BRANCH: main 2025-09-07T14:25:39.8167766Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T14:25:39.8168100Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5231 2025-09-07T14:25:39.8168540Z DOCKER_CONTAINER_ID: f6780263fb6a43de96266e63b4f163682f3d42ec146647fa8e1572b4947dba33 2025-09-07T14:25:39.8168897Z DEVICE_NAME: 2025-09-07T14:25:39.8169070Z DEVICE_TYPE: 2025-09-07T14:25:39.8169354Z FILE_SUFFIX: test-inductor_timm_perf_cuda_h100-4-7-linux.aws.h100_49775781837 2025-09-07T14:25:39.8169672Z ##[endgroup] 2025-09-07T14:25:39.8604133Z adding: test/test-reports/inductor_with_cudagraphs_timm_models_amp_training_cuda_h100_performance_compilation_metrics.csv (deflated 48%) 2025-09-07T14:25:39.8605229Z adding: test/test-reports/inductor_no_cudagraphs_timm_models_amp_training_cuda_h100_accuracy.csv (deflated 49%) 2025-09-07T14:25:39.8606067Z adding: test/test-reports/inductor_max_autotune_timm_models_bfloat16_inference_cuda_h100_accuracy.csv (deflated 51%) 2025-09-07T14:25:39.8607022Z adding: test/test-reports/inductor_cpp_wrapper_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.csv (deflated 49%) 2025-09-07T14:25:39.8607999Z adding: 
test/test-reports/inductor_with_cudagraphs_freezing_timm_models_bfloat16_inference_cuda_h100_performance.csv (deflated 48%) 2025-09-07T14:25:39.8608926Z adding: test/test-reports/inductor_with_cudagraphs_freezing_autotune_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.csv (deflated 51%) 2025-09-07T14:25:39.8609751Z adding: test/test-reports/inductor_max_autotune_timm_models_amp_training_cuda_h100_performance.csv (deflated 47%) 2025-09-07T14:25:39.8610518Z adding: test/test-reports/inductor_with_cudagraphs_freezing_autotune_timm_models_bfloat16_inference_cuda_h100_performance.csv (deflated 48%) 2025-09-07T14:25:39.8611337Z adding: test/test-reports/inductor_cpp_wrapper_timm_models_amp_training_cuda_h100_performance_compilation_metrics.csv (deflated 48%) 2025-09-07T14:25:39.8612130Z adding: test/test-reports/inductor_cudagraphs_low_precision_timm_models_quant_inference_cuda_h100_accuracy.csv (deflated 50%) 2025-09-07T14:25:39.8612851Z adding: test/test-reports/inductor_dynamic_timm_models_bfloat16_inference_cuda_h100_accuracy.csv (deflated 51%) 2025-09-07T14:25:39.8613527Z adding: test/test-reports/inductor_dynamic_timm_models_bfloat16_inference_cuda_h100_performance.csv (deflated 49%) 2025-09-07T14:25:39.8614487Z adding: test/test-reports/inductor_no_cudagraphs_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.csv (deflated 49%) 2025-09-07T14:25:39.8615457Z adding: test/test-reports/inductor_aot_inductor_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.csv (deflated 48%) 2025-09-07T14:25:39.8616286Z adding: test/test-reports/inductor_with_cudagraphs_freezing_autotune_timm_models_bfloat16_inference_cuda_h100_accuracy.csv (deflated 51%) 2025-09-07T14:25:39.8617102Z adding: test/test-reports/inductor_with_cudagraphs_freezing_timm_models_bfloat16_inference_cuda_h100_accuracy.csv (deflated 52%) 2025-09-07T14:25:39.8618124Z adding: 
test/test-reports/inductor_with_cudagraphs_freezing_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.csv (deflated 49%) 2025-09-07T14:25:39.8618936Z adding: test/test-reports/inductor_cpp_wrapper_timm_models_amp_training_cuda_h100_performance.csv (deflated 46%) 2025-09-07T14:25:39.8619602Z adding: test/test-reports/inductor_with_cudagraphs_timm_models_amp_training_cuda_h100_performance.csv (deflated 47%) 2025-09-07T14:25:39.8620252Z adding: test/test-reports/inductor_aot_inductor_timm_models_bfloat16_inference_cuda_h100_accuracy.csv (deflated 62%) 2025-09-07T14:25:39.8620934Z adding: test/test-reports/inductor_dynamic_timm_models_amp_training_cuda_h100_performance_compilation_metrics.csv (deflated 48%) 2025-09-07T14:25:39.8621712Z adding: test/test-reports/inductor_max_autotune_timm_models_bfloat16_inference_cuda_h100_performance.csv (deflated 49%) 2025-09-07T14:25:39.8622459Z adding: test/test-reports/inductor_cpp_wrapper_timm_models_bfloat16_inference_cuda_h100_accuracy.csv (deflated 52%) 2025-09-07T14:25:39.8623122Z adding: test/test-reports/inductor_no_cudagraphs_timm_models_bfloat16_inference_cuda_h100_performance.csv (deflated 48%) 2025-09-07T14:25:39.8623799Z adding: test/test-reports/inductor_with_cudagraphs_timm_models_bfloat16_inference_cuda_h100_accuracy.csv (deflated 50%) 2025-09-07T14:25:39.8624523Z adding: test/test-reports/inductor_max_autotune_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.csv (deflated 51%) 2025-09-07T14:25:39.8625354Z adding: test/test-reports/inductor_cpp_wrapper_timm_models_amp_training_cuda_h100_accuracy.csv (deflated 49%) 2025-09-07T14:25:39.8625998Z adding: test/test-reports/inductor_cpp_wrapper_timm_models_bfloat16_inference_cuda_h100_performance.csv (deflated 48%) 2025-09-07T14:25:39.8626699Z adding: test/test-reports/inductor_max_autotune_timm_models_amp_training_cuda_h100_performance_compilation_metrics.csv (deflated 50%) 2025-09-07T14:25:39.8627379Z adding: 
test/test-reports/inductor_max_autotune_timm_models_amp_training_cuda_h100_accuracy.csv (deflated 49%) 2025-09-07T14:25:39.8628035Z adding: test/test-reports/inductor_no_cudagraphs_timm_models_bfloat16_inference_cuda_h100_accuracy.csv (deflated 51%) 2025-09-07T14:25:39.8628684Z adding: test/test-reports/inductor_export_timm_models_bfloat16_inference_cuda_h100_accuracy.csv (deflated 52%) 2025-09-07T14:25:39.8629389Z adding: test/test-reports/inductor_with_cudagraphs_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.csv (deflated 49%) 2025-09-07T14:25:39.8630111Z adding: test/test-reports/inductor_no_cudagraphs_timm_models_amp_training_cuda_h100_performance.csv (deflated 47%) 2025-09-07T14:25:39.8630777Z adding: test/test-reports/inductor_with_cudagraphs_timm_models_bfloat16_inference_cuda_h100_performance.csv (deflated 49%) 2025-09-07T14:25:39.8631439Z adding: test/test-reports/inductor_with_cudagraphs_timm_models_amp_training_cuda_h100_accuracy.csv (deflated 49%) 2025-09-07T14:25:39.8632148Z adding: test/test-reports/inductor_no_cudagraphs_timm_models_amp_training_cuda_h100_performance_compilation_metrics.csv (deflated 47%) 2025-09-07T14:25:39.8632858Z adding: test/test-reports/inductor_aot_inductor_timm_models_bfloat16_inference_cuda_h100_performance.csv (deflated 49%) 2025-09-07T14:25:39.8633579Z adding: test/test-reports/inductor_dynamic_timm_models_amp_training_cuda_h100_accuracy.csv (deflated 49%) 2025-09-07T14:25:39.8634267Z adding: test/test-reports/inductor_dynamic_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.csv (deflated 49%) 2025-09-07T14:25:39.8635129Z adding: test/test-reports/inductor_cudagraphs_low_precision_timm_models_quant_inference_cuda_h100_performance.csv (deflated 49%) 2025-09-07T14:25:39.8635791Z adding: test/test-reports/inductor_dynamic_timm_models_amp_training_cuda_h100_performance.csv (deflated 46%) 2025-09-07T14:25:39.9328198Z ##[group]Run # Remove any previous usage logs if they exist 
2025-09-07T14:25:39.9328733Z # Remove any previous usage logs if they exist 2025-09-07T14:25:39.9329248Z rm -f logs-*.zip 2025-09-07T14:25:39.9329526Z zip "logs-${FILE_SUFFIX}.zip" 'usage_log.txt' || true 2025-09-07T14:25:39.9329907Z zip -r "logs-${FILE_SUFFIX}.zip" test/test-reports -i '*.log' || true 2025-09-07T14:25:39.9344375Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T14:25:39.9344667Z env: 2025-09-07T14:25:39.9344838Z GIT_DEFAULT_BRANCH: main 2025-09-07T14:25:39.9345224Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T14:25:39.9345554Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5231 2025-09-07T14:25:39.9345970Z DOCKER_CONTAINER_ID: f6780263fb6a43de96266e63b4f163682f3d42ec146647fa8e1572b4947dba33 2025-09-07T14:25:39.9346330Z DEVICE_NAME: 2025-09-07T14:25:39.9346506Z DEVICE_TYPE: 2025-09-07T14:25:39.9346778Z FILE_SUFFIX: test-inductor_timm_perf_cuda_h100-4-7-linux.aws.h100_49775781837 2025-09-07T14:25:39.9347250Z ##[endgroup] 2025-09-07T14:25:39.9924414Z adding: usage_log.txt (deflated 91%) 2025-09-07T14:25:39.9942829Z 2025-09-07T14:25:39.9943256Z zip error: Nothing to do! 
(logs-test-inductor_timm_perf_cuda_h100-4-7-linux.aws.h100_49775781837.zip) 2025-09-07T14:25:40.0693991Z ##[group]Run # Remove any previous debugging artifacts if they exist 2025-09-07T14:25:40.0694385Z # Remove any previous debugging artifacts if they exist 2025-09-07T14:25:40.0694693Z rm -f debug-*.zip 2025-09-07T14:25:40.0694909Z if [ -d 'test/debug' ]; then 2025-09-07T14:25:40.0695354Z  zip -r "debug-${FILE_SUFFIX}.zip" test/debug 2025-09-07T14:25:40.0695601Z fi 2025-09-07T14:25:40.0711486Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T14:25:40.0711772Z env: 2025-09-07T14:25:40.0711943Z GIT_DEFAULT_BRANCH: main 2025-09-07T14:25:40.0712185Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T14:25:40.0712524Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5231 2025-09-07T14:25:40.0712960Z DOCKER_CONTAINER_ID: f6780263fb6a43de96266e63b4f163682f3d42ec146647fa8e1572b4947dba33 2025-09-07T14:25:40.0713315Z DEVICE_NAME: 2025-09-07T14:25:40.0713483Z DEVICE_TYPE: 2025-09-07T14:25:40.0713768Z FILE_SUFFIX: test-inductor_timm_perf_cuda_h100-4-7-linux.aws.h100_49775781837 2025-09-07T14:25:40.0714090Z ##[endgroup] 2025-09-07T14:25:40.2080999Z ##[group]Run seemethere/upload-artifact-s3@v5 2025-09-07T14:25:40.2081262Z with: 2025-09-07T14:25:40.2081439Z s3-bucket: gha-artifacts 2025-09-07T14:25:40.2081691Z s3-prefix: pytorch/pytorch/17525296438/1/artifact 2025-09-07T14:25:40.2081960Z retention-days: 14 2025-09-07T14:25:40.2082157Z if-no-files-found: warn 2025-09-07T14:25:40.2082373Z path: test-jsons-*.zip 2025-09-07T14:25:40.2082567Z name: artifact 2025-09-07T14:25:40.2082749Z region: us-east-1 2025-09-07T14:25:40.2082943Z env: 2025-09-07T14:25:40.2083121Z GIT_DEFAULT_BRANCH: main 2025-09-07T14:25:40.2083384Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T14:25:40.2083759Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5231 2025-09-07T14:25:40.2084219Z DOCKER_CONTAINER_ID: 
f6780263fb6a43de96266e63b4f163682f3d42ec146647fa8e1572b4947dba33 2025-09-07T14:25:40.2084601Z DEVICE_NAME: 2025-09-07T14:25:40.2085097Z DEVICE_TYPE: 2025-09-07T14:25:40.2085273Z ##[endgroup] 2025-09-07T14:25:40.5120321Z NOTE: s3-prefix specified, ignoring name parameter 2025-09-07T14:25:40.5120669Z With the provided path, there will be 1 file uploaded 2025-09-07T14:25:40.5121003Z Uploading to s3 prefix: pytorch/pytorch/17525296438/1/artifact 2025-09-07T14:25:40.5129673Z Starting upload of test-jsons-test-inductor_timm_perf_cuda_h100-4-7-linux.aws.h100_49775781837.zip 2025-09-07T14:25:40.8928121Z Finished upload of test-jsons-test-inductor_timm_perf_cuda_h100-4-7-linux.aws.h100_49775781837.zip 2025-09-07T14:25:41.1664301Z ##[group]Run seemethere/upload-artifact-s3@v5 2025-09-07T14:25:41.1664570Z with: 2025-09-07T14:25:41.1664761Z s3-bucket: gha-artifacts 2025-09-07T14:25:41.1665416Z s3-prefix: pytorch/pytorch/17525296438/1/artifact 2025-09-07T14:25:41.1665693Z retention-days: 14 2025-09-07T14:25:41.1665905Z if-no-files-found: error 2025-09-07T14:25:41.1666121Z path: test-reports-*.zip 2025-09-07T14:25:41.1666327Z name: artifact 2025-09-07T14:25:41.1666496Z region: us-east-1 2025-09-07T14:25:41.1666683Z env: 2025-09-07T14:25:41.1666844Z GIT_DEFAULT_BRANCH: main 2025-09-07T14:25:41.1667104Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T14:25:41.1667446Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5231 2025-09-07T14:25:41.1667880Z DOCKER_CONTAINER_ID: f6780263fb6a43de96266e63b4f163682f3d42ec146647fa8e1572b4947dba33 2025-09-07T14:25:41.1668254Z DEVICE_NAME: 2025-09-07T14:25:41.1668428Z DEVICE_TYPE: 2025-09-07T14:25:41.1668594Z ##[endgroup] 2025-09-07T14:25:41.4677194Z NOTE: s3-prefix specified, ignoring name parameter 2025-09-07T14:25:41.4677630Z With the provided path, there will be 1 file uploaded 2025-09-07T14:25:41.4678056Z Uploading to s3 prefix: pytorch/pytorch/17525296438/1/artifact 2025-09-07T14:25:41.4686282Z Starting upload of 
test-reports-test-inductor_timm_perf_cuda_h100-4-7-linux.aws.h100_49775781837.zip 2025-09-07T14:25:41.6406540Z Finished upload of test-reports-test-inductor_timm_perf_cuda_h100-4-7-linux.aws.h100_49775781837.zip 2025-09-07T14:25:41.6824333Z ##[group]Run seemethere/upload-artifact-s3@v5 2025-09-07T14:25:41.6824597Z with: 2025-09-07T14:25:41.6824768Z s3-bucket: gha-artifacts 2025-09-07T14:25:41.6825192Z s3-prefix: pytorch/pytorch/17525296438/1/artifact 2025-09-07T14:25:41.6825456Z retention-days: 14 2025-09-07T14:25:41.6825649Z if-no-files-found: ignore 2025-09-07T14:25:41.6825862Z path: logs-*.zip 2025-09-07T14:25:41.6826041Z name: artifact 2025-09-07T14:25:41.6826218Z region: us-east-1 2025-09-07T14:25:41.6826393Z env: 2025-09-07T14:25:41.6826573Z GIT_DEFAULT_BRANCH: main 2025-09-07T14:25:41.6826830Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T14:25:41.6827192Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5231 2025-09-07T14:25:41.6827634Z DOCKER_CONTAINER_ID: f6780263fb6a43de96266e63b4f163682f3d42ec146647fa8e1572b4947dba33 2025-09-07T14:25:41.6827997Z DEVICE_NAME: 2025-09-07T14:25:41.6828177Z DEVICE_TYPE: 2025-09-07T14:25:41.6828346Z ##[endgroup] 2025-09-07T14:25:41.9837785Z NOTE: s3-prefix specified, ignoring name parameter 2025-09-07T14:25:41.9838191Z With the provided path, there will be 1 file uploaded 2025-09-07T14:25:41.9838606Z Uploading to s3 prefix: pytorch/pytorch/17525296438/1/artifact 2025-09-07T14:25:41.9846625Z Starting upload of logs-test-inductor_timm_perf_cuda_h100-4-7-linux.aws.h100_49775781837.zip 2025-09-07T14:25:42.1633243Z Finished upload of logs-test-inductor_timm_perf_cuda_h100-4-7-linux.aws.h100_49775781837.zip 2025-09-07T14:25:42.2077355Z ##[group]Run seemethere/upload-artifact-s3@v5 2025-09-07T14:25:42.2077671Z with: 2025-09-07T14:25:42.2077869Z s3-bucket: gha-artifacts 2025-09-07T14:25:42.2078160Z s3-prefix: pytorch/pytorch/17525296438/1/artifact 2025-09-07T14:25:42.2078461Z retention-days: 14 
2025-09-07T14:25:42.2078687Z if-no-files-found: ignore 2025-09-07T14:25:42.2078933Z path: debug-*.zip 2025-09-07T14:25:42.2079131Z name: artifact 2025-09-07T14:25:42.2079503Z region: us-east-1 2025-09-07T14:25:42.2079699Z env: 2025-09-07T14:25:42.2079889Z GIT_DEFAULT_BRANCH: main 2025-09-07T14:25:42.2080143Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T14:25:42.2080479Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5231 2025-09-07T14:25:42.2080912Z DOCKER_CONTAINER_ID: f6780263fb6a43de96266e63b4f163682f3d42ec146647fa8e1572b4947dba33 2025-09-07T14:25:42.2081272Z DEVICE_NAME: 2025-09-07T14:25:42.2081442Z DEVICE_TYPE: 2025-09-07T14:25:42.2081613Z ##[endgroup] 2025-09-07T14:25:42.4960578Z No files were found with the provided path: debug-*.zip. No artifacts will be uploaded. 2025-09-07T14:25:42.5435878Z ##[group]Run # shellcheck disable=SC2156 2025-09-07T14:25:42.5436235Z # shellcheck disable=SC2156 2025-09-07T14:25:42.5436756Z find . -iname "core.[1-9]*" -exec docker exec "${DOCKER_CONTAINER_ID}" sh -c "gdb python {} -ex 'bt' -ex 'q'" \; 2025-09-07T14:25:42.5451649Z shell: /usr/bin/bash -e {0} 2025-09-07T14:25:42.5451880Z env: 2025-09-07T14:25:42.5452051Z GIT_DEFAULT_BRANCH: main 2025-09-07T14:25:42.5452313Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T14:25:42.5452645Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5231 2025-09-07T14:25:42.5453067Z DOCKER_CONTAINER_ID: f6780263fb6a43de96266e63b4f163682f3d42ec146647fa8e1572b4947dba33 2025-09-07T14:25:42.5453434Z DEVICE_NAME: 2025-09-07T14:25:42.5453622Z DEVICE_TYPE: 2025-09-07T14:25:42.5453789Z ##[endgroup] 2025-09-07T14:25:43.2153429Z Prepare all required actions 2025-09-07T14:25:43.2153763Z Getting action download info 2025-09-07T14:25:43.3475532Z ##[group]Run ./.github/actions/upload-utilization-stats 2025-09-07T14:25:43.3475825Z with: 2025-09-07T14:25:43.3475999Z job_id: 49775781837 2025-09-07T14:25:43.3476313Z job_name: test-weekly / test 
(inductor_timm_perf_cuda_h100, 4, 7, linux.aws.h100) 2025-09-07T14:25:43.3476713Z workflow_name: inductor-perf-nightly-h100 2025-09-07T14:25:43.3476975Z workflow_run_id: 17525296438 2025-09-07T14:25:43.3477212Z workflow_attempt: 1 2025-09-07T14:25:43.3477404Z env: 2025-09-07T14:25:43.3477576Z GIT_DEFAULT_BRANCH: main 2025-09-07T14:25:43.3477843Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T14:25:43.3478214Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5231 2025-09-07T14:25:43.3478691Z DOCKER_CONTAINER_ID: f6780263fb6a43de96266e63b4f163682f3d42ec146647fa8e1572b4947dba33 2025-09-07T14:25:43.3479092Z DEVICE_NAME: 2025-09-07T14:25:43.3479292Z DEVICE_TYPE: 2025-09-07T14:25:43.3479474Z ##[endgroup] 2025-09-07T14:25:43.4915523Z ##[group]Run echo "workflow_id: 17525296438" 2025-09-07T14:25:43.4915849Z echo "workflow_id: 17525296438" 2025-09-07T14:25:43.4916137Z echo "workflow_attempt: 1" 2025-09-07T14:25:43.4916468Z echo "workflow_Name: inductor-perf-nightly-h100" 2025-09-07T14:25:43.4916805Z echo "job_id: 49775781837" 2025-09-07T14:25:43.4917228Z echo "job_name: test-weekly / test (inductor_timm_perf_cuda_h100, 4, 7, linux.aws.h100)" 2025-09-07T14:25:43.4917671Z echo "artifact_prefix: " 2025-09-07T14:25:43.4917942Z python3 --version 2025-09-07T14:25:43.4932073Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T14:25:43.4932366Z env: 2025-09-07T14:25:43.4932532Z GIT_DEFAULT_BRANCH: main 2025-09-07T14:25:43.4932788Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T14:25:43.4933129Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5231 2025-09-07T14:25:43.4933552Z DOCKER_CONTAINER_ID: f6780263fb6a43de96266e63b4f163682f3d42ec146647fa8e1572b4947dba33 2025-09-07T14:25:43.4933928Z DEVICE_NAME: 2025-09-07T14:25:43.4934092Z DEVICE_TYPE: 2025-09-07T14:25:43.4934255Z ##[endgroup] 2025-09-07T14:25:43.5406306Z workflow_id: 17525296438 2025-09-07T14:25:43.5406585Z workflow_attempt: 1 
2025-09-07T14:25:43.5406858Z workflow_Name: inductor-perf-nightly-h100 2025-09-07T14:25:43.5407173Z job_id: 49775781837 2025-09-07T14:25:43.5407754Z job_name: test-weekly / test (inductor_timm_perf_cuda_h100, 4, 7, linux.aws.h100) 2025-09-07T14:25:43.5408157Z artifact_prefix: 2025-09-07T14:25:43.5423812Z Python 3.10.12 2025-09-07T14:25:43.5850163Z ##[group]Run nick-fields/retry@v3.0.0 2025-09-07T14:25:43.5850409Z with: 2025-09-07T14:25:43.5850566Z shell: bash 2025-09-07T14:25:43.5850747Z timeout_minutes: 5 2025-09-07T14:25:43.5850939Z max_attempts: 5 2025-09-07T14:25:43.5851124Z retry_wait_seconds: 30 2025-09-07T14:25:43.5851561Z command: set -eu python3 -m pip install python-dateutil==2.8.2 boto3==1.35.42 pandas==2.1.3 dataclasses_json==0.6.7 2025-09-07T14:25:43.5852022Z polling_interval_seconds: 1 2025-09-07T14:25:43.5852253Z warning_on_retry: true 2025-09-07T14:25:43.5852452Z continue_on_error: false 2025-09-07T14:25:43.5852642Z env: 2025-09-07T14:25:43.5852807Z GIT_DEFAULT_BRANCH: main 2025-09-07T14:25:43.5853055Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T14:25:43.5853399Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5231 2025-09-07T14:25:43.5853849Z DOCKER_CONTAINER_ID: f6780263fb6a43de96266e63b4f163682f3d42ec146647fa8e1572b4947dba33 2025-09-07T14:25:43.5854219Z DEVICE_NAME: 2025-09-07T14:25:43.5854398Z DEVICE_TYPE: 2025-09-07T14:25:43.5854564Z ##[endgroup] 2025-09-07T14:25:43.9253367Z Defaulting to user installation because normal site-packages is not writeable 2025-09-07T14:25:44.4893649Z Collecting python-dateutil==2.8.2 2025-09-07T14:25:44.5509386Z Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB) 2025-09-07T14:25:44.9805380Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 247.7/247.7 KB 558.0 kB/s eta 0:00:00 2025-09-07T14:25:45.8077750Z Collecting boto3==1.35.42 2025-09-07T14:25:45.8211221Z Downloading boto3-1.35.42-py3-none-any.whl (139 kB) 2025-09-07T14:25:46.3752669Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 
139.2/139.2 KB 233.6 kB/s eta 0:00:00 2025-09-07T14:25:46.9954489Z Collecting pandas==2.1.3 2025-09-07T14:25:47.0083336Z Downloading pandas-2.1.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.3 MB) 2025-09-07T14:25:47.7074724Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.3/12.3 MB 15.3 MB/s eta 0:00:00 2025-09-07T14:25:47.7450024Z Requirement already satisfied: dataclasses_json==0.6.7 in /home/eve/.local/lib/python3.10/site-packages (0.6.7) 2025-09-07T14:25:47.7465998Z Requirement already satisfied: six>=1.5 in /usr/lib/python3/dist-packages (from python-dateutil==2.8.2) (1.16.0) 2025-09-07T14:25:47.7511026Z Requirement already satisfied: botocore<1.36.0,>=1.35.42 in /home/eve/.local/lib/python3.10/site-packages (from boto3==1.35.42) (1.35.99) 2025-09-07T14:25:47.7515901Z Requirement already satisfied: s3transfer<0.11.0,>=0.10.0 in /home/eve/.local/lib/python3.10/site-packages (from boto3==1.35.42) (0.10.4) 2025-09-07T14:25:47.7520530Z Requirement already satisfied: jmespath<2.0.0,>=0.7.1 in /home/eve/.local/lib/python3.10/site-packages (from boto3==1.35.42) (1.0.1) 2025-09-07T14:25:48.2291157Z Collecting pytz>=2020.1 2025-09-07T14:25:48.2412832Z Downloading pytz-2025.2-py2.py3-none-any.whl (509 kB) 2025-09-07T14:25:48.6610247Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 509.2/509.2 KB 1.2 MB/s eta 0:00:00 2025-09-07T14:25:48.8511668Z Collecting tzdata>=2022.1 2025-09-07T14:25:48.8630989Z Downloading tzdata-2025.2-py2.py3-none-any.whl (347 kB) 2025-09-07T14:25:48.8925253Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 347.8/347.8 KB 12.3 MB/s eta 0:00:00 2025-09-07T14:25:49.2939237Z Collecting numpy<2,>=1.22.4 2025-09-07T14:25:49.3063768Z Downloading numpy-1.26.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.2 MB) 2025-09-07T14:25:50.1051818Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.2/18.2 MB 13.5 MB/s eta 0:00:00 2025-09-07T14:25:50.1325166Z Requirement already satisfied: marshmallow<4.0.0,>=3.18.0 in 
/home/eve/.local/lib/python3.10/site-packages (from dataclasses_json==0.6.7) (3.26.1) 2025-09-07T14:25:50.1331695Z Requirement already satisfied: typing-inspect<1,>=0.4.0 in /home/eve/.local/lib/python3.10/site-packages (from dataclasses_json==0.6.7) (0.9.0) 2025-09-07T14:25:50.1406768Z Requirement already satisfied: urllib3!=2.2.0,<3,>=1.25.4 in /usr/lib/python3/dist-packages (from botocore<1.36.0,>=1.35.42->boto3==1.35.42) (1.26.5) 2025-09-07T14:25:50.1490893Z Requirement already satisfied: packaging>=17.0 in /home/eve/.local/lib/python3.10/site-packages (from marshmallow<4.0.0,>=3.18.0->dataclasses_json==0.6.7) (25.0) 2025-09-07T14:25:50.1591012Z Requirement already satisfied: mypy-extensions>=0.3.0 in /home/eve/.local/lib/python3.10/site-packages (from typing-inspect<1,>=0.4.0->dataclasses_json==0.6.7) (1.1.0) 2025-09-07T14:25:50.1595695Z Requirement already satisfied: typing-extensions>=3.7.4 in /home/eve/.local/lib/python3.10/site-packages (from typing-inspect<1,>=0.4.0->dataclasses_json==0.6.7) (4.15.0) 2025-09-07T14:25:50.4418253Z Installing collected packages: pytz, tzdata, python-dateutil, numpy, pandas, boto3 2025-09-07T14:25:53.7065996Z WARNING: The script f2py is installed in '/home/eve/.local/bin' which is not on PATH. 2025-09-07T14:25:53.7066698Z Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location. 2025-09-07T14:25:57.7928225Z Attempting uninstall: boto3 2025-09-07T14:25:57.7934547Z Found existing installation: boto3 1.35.33 2025-09-07T14:25:57.8153943Z Uninstalling boto3-1.35.33: 2025-09-07T14:25:57.8173880Z Successfully uninstalled boto3-1.35.33 2025-09-07T14:25:58.6084482Z Successfully installed boto3-1.35.42 numpy-1.26.4 pandas-2.1.3 python-dateutil-2.8.2 pytz-2025.2 tzdata-2025.2 2025-09-07T14:25:59.6693325Z Command completed after 1 attempt(s). 
2025-09-07T14:25:59.7130444Z ##[group]Run python3 -m tools.stats.upload_utilization_stats.upload_utilization_stats \ 2025-09-07T14:25:59.7133058Z python3 -m tools.stats.upload_utilization_stats.upload_utilization_stats \ 2025-09-07T14:25:59.7133554Z  --workflow-run-id "17525296438" \ 2025-09-07T14:25:59.7133908Z  --workflow-name "inductor-perf-nightly-h100" \ 2025-09-07T14:25:59.7134225Z  --workflow-run-attempt "1" \ 2025-09-07T14:25:59.7134475Z  --job-id "49775781837" \ 2025-09-07T14:25:59.7134837Z  --job-name "test-weekly / test (inductor_timm_perf_cuda_h100, 4, 7, linux.aws.h100)" \ 2025-09-07T14:25:59.7135384Z  --local-path "" \ 2025-09-07T14:25:59.7135605Z  --artifact-prefix "" 2025-09-07T14:25:59.7150214Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T14:25:59.7150521Z env: 2025-09-07T14:25:59.7150698Z GIT_DEFAULT_BRANCH: main 2025-09-07T14:25:59.7150964Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T14:25:59.7151290Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5231 2025-09-07T14:25:59.7151705Z DOCKER_CONTAINER_ID: f6780263fb6a43de96266e63b4f163682f3d42ec146647fa8e1572b4947dba33 2025-09-07T14:25:59.7152077Z DEVICE_NAME: 2025-09-07T14:25:59.7152253Z DEVICE_TYPE: 2025-09-07T14:25:59.7152413Z ##[endgroup] 2025-09-07T14:26:02.2507768Z repo: pytorch/pytorch 2025-09-07T14:26:02.2508121Z Search for test log in s3 bucket: ossci-utilization 2025-09-07T14:26:02.2508617Z Downloading logs-test-inductor_timm_perf_cuda_h100-4-7-linux.aws.h100_49775781837.zip 2025-09-07T14:26:02.2509318Z extracting usage_log.txt from zip file logs-test-inductor_timm_perf_cuda_h100-4-7-linux.aws.h100_49775781837.zip 2025-09-07T14:26:02.2509857Z Converted Log Model: UtilizationMetadata: 2025-09-07T14:26:02.2511159Z UtilizationMetadata(level='metadata', workflow_id='17525296438', job_id='49775781837', workflow_name='inductor-perf-nightly-h100', job_name='test-weekly / test (inductor_timm_perf_cuda_h100, 4, 7, linux.aws.h100)', 
usage_collect_interval=4.0, data_model_version=1.5, start_at=1757239765, gpu_count=1, cpu_count=192, gpu_type='pynvml', error=None) 2025-09-07T14:26:02.2512468Z [Db Segments] detected pytest cmd: 4, generated segments: 4 2025-09-07T14:26:02.2512818Z [db model] Peek db timeseries 2025-09-07T14:26:02.2513057Z :{ 2025-09-07T14:26:02.2513241Z "created_at": 1757255161, 2025-09-07T14:26:02.2513485Z "type": "utilization", 2025-09-07T14:26:02.2513711Z "tags": [ 2025-09-07T14:26:02.2513910Z "record" 2025-09-07T14:26:02.2514109Z ], 2025-09-07T14:26:02.2514332Z "time_stamp": 1757239765, 2025-09-07T14:26:02.2514650Z "repo": "pytorch/pytorch", 2025-09-07T14:26:02.2514921Z "workflow_id": 17525296438, 2025-09-07T14:26:02.2515597Z "run_attempt": 1, 2025-09-07T14:26:02.2515818Z "job_id": 49775781837, 2025-09-07T14:26:02.2516091Z "workflow_name": "inductor-perf-nightly-h100", 2025-09-07T14:26:02.2516531Z "job_name": "test-weekly / test (inductor_timm_perf_cuda_h100, 4, 7, linux.aws.h100)", 2025-09-07T14:26:02.2516919Z "json_data": "{}" 2025-09-07T14:26:02.2517117Z } 2025-09-07T14:26:02.2517542Z Writing 1 documents to S3 ossci-utilization/util_metadata/v_1.5/pytorch/pytorch/17525296438/1/49775781837/metadata 2025-09-07T14:26:02.2518335Z Done! Finish writing document to S3 ossci-utilization/util_metadata/v_1.5/pytorch/pytorch/17525296438/1/49775781837/metadata 2025-09-07T14:26:02.2519159Z Writing 1020 documents to S3 ossci-utilization/util_timeseries/v_1.5/pytorch/pytorch/17525296438/1/49775781837/time_series 2025-09-07T14:26:02.2520013Z Done! Finish writing document to S3 ossci-utilization/util_timeseries/v_1.5/pytorch/pytorch/17525296438/1/49775781837/time_series 2025-09-07T14:26:02.3522328Z Post job cleanup. 2025-09-07T14:26:02.4486975Z Post job cleanup. 
2025-09-07T14:26:02.5401217Z [command]/usr/bin/git version 2025-09-07T14:26:02.5438608Z git version 2.50.1 2025-09-07T14:26:02.5477107Z Temporarily overriding HOME='/home/eve/_work/_temp/586809e3-8a0c-469c-a6c0-d5621f8a7e62' before making global git config changes 2025-09-07T14:26:02.5477859Z Adding repository directory to the temporary git global config as a safe directory 2025-09-07T14:26:02.5481467Z [command]/usr/bin/git config --global --add safe.directory /home/eve/_work/pytorch/pytorch 2025-09-07T14:26:02.5928227Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2025-09-07T14:26:02.5963457Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2025-09-07T14:26:02.6248314Z Entering 'android/libs/fbjni' 2025-09-07T14:26:02.6298713Z Entering 'third_party/FP16' 2025-09-07T14:26:02.6349931Z Entering 'third_party/FXdiv' 2025-09-07T14:26:02.6401107Z Entering 'third_party/NNPACK' 2025-09-07T14:26:02.6452664Z Entering 'third_party/NVTX' 2025-09-07T14:26:02.6504907Z Entering 'third_party/VulkanMemoryAllocator' 2025-09-07T14:26:02.6556086Z Entering 'third_party/XNNPACK' 2025-09-07T14:26:02.6621725Z Entering 'third_party/aiter' 2025-09-07T14:26:02.6672881Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T14:26:02.6732283Z Entering 'third_party/benchmark' 2025-09-07T14:26:02.6785582Z Entering 'third_party/composable_kernel' 2025-09-07T14:26:02.6843166Z Entering 'third_party/cpp-httplib' 2025-09-07T14:26:02.6894524Z Entering 'third_party/cpuinfo' 2025-09-07T14:26:02.6945094Z Entering 'third_party/cudnn_frontend' 2025-09-07T14:26:02.6994068Z Entering 'third_party/cutlass' 2025-09-07T14:26:02.7051682Z Entering 'third_party/fbgemm' 2025-09-07T14:26:02.7102444Z Entering 'third_party/fbgemm/external/asmjit' 2025-09-07T14:26:02.7151219Z Entering 'third_party/fbgemm/external/composable_kernel' 
2025-09-07T14:26:02.7204333Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T14:26:02.7251920Z Entering 'third_party/fbgemm/external/cutlass' 2025-09-07T14:26:02.7308102Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T14:26:02.7353956Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T14:26:02.7401020Z Entering 'third_party/fbgemm/external/json' 2025-09-07T14:26:02.7451639Z Entering 'third_party/flash-attention' 2025-09-07T14:26:02.7500041Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T14:26:02.7553557Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T14:26:02.7613152Z Entering 'third_party/flatbuffers' 2025-09-07T14:26:02.7664889Z Entering 'third_party/fmt' 2025-09-07T14:26:02.7714590Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T14:26:02.7762619Z Entering 'third_party/gloo' 2025-09-07T14:26:02.7813699Z Entering 'third_party/googletest' 2025-09-07T14:26:02.7865749Z Entering 'third_party/ideep' 2025-09-07T14:26:02.7914014Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T14:26:02.7970344Z Entering 'third_party/ittapi' 2025-09-07T14:26:02.8020377Z Entering 'third_party/kineto' 2025-09-07T14:26:02.8069760Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T14:26:02.8116342Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T14:26:02.8167100Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T14:26:02.8216712Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T14:26:02.8265868Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T14:26:02.8313582Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T14:26:02.8365102Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T14:26:02.8413677Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T14:26:02.8460253Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T14:26:02.8510007Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T14:26:02.8562075Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T14:26:02.8608906Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T14:26:02.8657876Z Entering 'third_party/kleidiai' 2025-09-07T14:26:02.8708546Z Entering 'third_party/mimalloc' 2025-09-07T14:26:02.8757762Z Entering 'third_party/nlohmann' 2025-09-07T14:26:02.8808123Z Entering 'third_party/onnx' 2025-09-07T14:26:02.8873152Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T14:26:02.8926531Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T14:26:02.8978831Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T14:26:02.9026639Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T14:26:02.9073909Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T14:26:02.9121520Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T14:26:02.9170305Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T14:26:02.9217312Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T14:26:02.9264585Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T14:26:02.9310360Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T14:26:02.9360449Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T14:26:02.9409978Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T14:26:02.9477979Z Entering 'third_party/pocketfft' 2025-09-07T14:26:02.9527927Z Entering 'third_party/protobuf' 2025-09-07T14:26:02.9579587Z Entering 
'third_party/protobuf/third_party/benchmark' 2025-09-07T14:26:02.9627856Z Entering 'third_party/protobuf/third_party/googletest' 2025-09-07T14:26:02.9679153Z Entering 'third_party/psimd' 2025-09-07T14:26:02.9729858Z Entering 'third_party/pthreadpool' 2025-09-07T14:26:02.9780595Z Entering 'third_party/pybind11' 2025-09-07T14:26:02.9831601Z Entering 'third_party/python-peachpy' 2025-09-07T14:26:02.9881211Z Entering 'third_party/sleef' 2025-09-07T14:26:02.9930974Z Entering 'third_party/tensorpipe' 2025-09-07T14:26:02.9980562Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-09-07T14:26:03.0029233Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-09-07T14:26:03.0076127Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-09-07T14:26:03.0123898Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T14:26:03.0169536Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T14:26:03.0244863Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2025-09-07T14:26:03.0270159Z http.https://github.com/.extraheader 2025-09-07T14:26:03.0282733Z [command]/usr/bin/git config --local --unset-all http.https://github.com/.extraheader 2025-09-07T14:26:03.0312105Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :" 2025-09-07T14:26:03.0582603Z Entering 'android/libs/fbjni' 2025-09-07T14:26:03.0611947Z http.https://github.com/.extraheader 2025-09-07T14:26:03.0650158Z Entering 'third_party/FP16' 2025-09-07T14:26:03.0677752Z http.https://github.com/.extraheader 2025-09-07T14:26:03.0712417Z Entering 'third_party/FXdiv' 2025-09-07T14:26:03.0740048Z http.https://github.com/.extraheader 2025-09-07T14:26:03.0774580Z Entering 'third_party/NNPACK' 2025-09-07T14:26:03.0802875Z http.https://github.com/.extraheader 
2025-09-07T14:26:03.0841040Z Entering 'third_party/NVTX' 2025-09-07T14:26:03.0868665Z http.https://github.com/.extraheader 2025-09-07T14:26:03.0906017Z Entering 'third_party/VulkanMemoryAllocator' 2025-09-07T14:26:03.0933927Z http.https://github.com/.extraheader 2025-09-07T14:26:03.0970713Z Entering 'third_party/XNNPACK' 2025-09-07T14:26:03.0999190Z http.https://github.com/.extraheader 2025-09-07T14:26:03.1049266Z Entering 'third_party/aiter' 2025-09-07T14:26:03.1078052Z http.https://github.com/.extraheader 2025-09-07T14:26:03.1115039Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T14:26:03.1142143Z http.https://github.com/.extraheader 2025-09-07T14:26:03.1188016Z Entering 'third_party/benchmark' 2025-09-07T14:26:03.1215696Z http.https://github.com/.extraheader 2025-09-07T14:26:03.1251267Z Entering 'third_party/composable_kernel' 2025-09-07T14:26:03.1279863Z http.https://github.com/.extraheader 2025-09-07T14:26:03.1326101Z Entering 'third_party/cpp-httplib' 2025-09-07T14:26:03.1353652Z http.https://github.com/.extraheader 2025-09-07T14:26:03.1389643Z Entering 'third_party/cpuinfo' 2025-09-07T14:26:03.1417602Z http.https://github.com/.extraheader 2025-09-07T14:26:03.1452889Z Entering 'third_party/cudnn_frontend' 2025-09-07T14:26:03.1481292Z http.https://github.com/.extraheader 2025-09-07T14:26:03.1518148Z Entering 'third_party/cutlass' 2025-09-07T14:26:03.1548011Z http.https://github.com/.extraheader 2025-09-07T14:26:03.1594144Z Entering 'third_party/fbgemm' 2025-09-07T14:26:03.1622908Z http.https://github.com/.extraheader 2025-09-07T14:26:03.1661102Z Entering 'third_party/fbgemm/external/asmjit' 2025-09-07T14:26:03.1688434Z http.https://github.com/.extraheader 2025-09-07T14:26:03.1722967Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-09-07T14:26:03.1748835Z http.https://github.com/.extraheader 2025-09-07T14:26:03.1789797Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T14:26:03.1815670Z 
http.https://github.com/.extraheader 2025-09-07T14:26:03.1852574Z Entering 'third_party/fbgemm/external/cutlass' 2025-09-07T14:26:03.1878716Z http.https://github.com/.extraheader 2025-09-07T14:26:03.1920918Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T14:26:03.1947405Z http.https://github.com/.extraheader 2025-09-07T14:26:03.1982969Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T14:26:03.2011087Z http.https://github.com/.extraheader 2025-09-07T14:26:03.2045414Z Entering 'third_party/fbgemm/external/json' 2025-09-07T14:26:03.2070696Z http.https://github.com/.extraheader 2025-09-07T14:26:03.2111383Z Entering 'third_party/flash-attention' 2025-09-07T14:26:03.2139395Z http.https://github.com/.extraheader 2025-09-07T14:26:03.2174747Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T14:26:03.2201250Z http.https://github.com/.extraheader 2025-09-07T14:26:03.2243915Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T14:26:03.2270994Z http.https://github.com/.extraheader 2025-09-07T14:26:03.2317060Z Entering 'third_party/flatbuffers' 2025-09-07T14:26:03.2344862Z http.https://github.com/.extraheader 2025-09-07T14:26:03.2382723Z Entering 'third_party/fmt' 2025-09-07T14:26:03.2410209Z http.https://github.com/.extraheader 2025-09-07T14:26:03.2445240Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T14:26:03.2472806Z http.https://github.com/.extraheader 2025-09-07T14:26:03.2508578Z Entering 'third_party/gloo' 2025-09-07T14:26:03.2536013Z http.https://github.com/.extraheader 2025-09-07T14:26:03.2571243Z Entering 'third_party/googletest' 2025-09-07T14:26:03.2599409Z http.https://github.com/.extraheader 2025-09-07T14:26:03.2635617Z Entering 'third_party/ideep' 2025-09-07T14:26:03.2666355Z http.https://github.com/.extraheader 2025-09-07T14:26:03.2730566Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T14:26:03.2757256Z http.https://github.com/.extraheader 2025-09-07T14:26:03.2802520Z Entering 
'third_party/ittapi' 2025-09-07T14:26:03.2830730Z http.https://github.com/.extraheader 2025-09-07T14:26:03.2867529Z Entering 'third_party/kineto' 2025-09-07T14:26:03.2894500Z http.https://github.com/.extraheader 2025-09-07T14:26:03.2930758Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T14:26:03.2957758Z http.https://github.com/.extraheader 2025-09-07T14:26:03.2992410Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T14:26:03.3019880Z http.https://github.com/.extraheader 2025-09-07T14:26:03.3056297Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T14:26:03.3083199Z http.https://github.com/.extraheader 2025-09-07T14:26:03.3119452Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T14:26:03.3145603Z http.https://github.com/.extraheader 2025-09-07T14:26:03.3182318Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T14:26:03.3208928Z http.https://github.com/.extraheader 2025-09-07T14:26:03.3243487Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T14:26:03.3270119Z http.https://github.com/.extraheader 2025-09-07T14:26:03.3310593Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T14:26:03.3337220Z http.https://github.com/.extraheader 2025-09-07T14:26:03.3373018Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T14:26:03.3399420Z http.https://github.com/.extraheader 2025-09-07T14:26:03.3435370Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T14:26:03.3461919Z http.https://github.com/.extraheader 2025-09-07T14:26:03.3498818Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T14:26:03.3526178Z http.https://github.com/.extraheader 2025-09-07T14:26:03.3564693Z Entering 
'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T14:26:03.3591652Z http.https://github.com/.extraheader 2025-09-07T14:26:03.3627135Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T14:26:03.3652932Z http.https://github.com/.extraheader 2025-09-07T14:26:03.3690908Z Entering 'third_party/kleidiai' 2025-09-07T14:26:03.3719139Z http.https://github.com/.extraheader 2025-09-07T14:26:03.3754439Z Entering 'third_party/mimalloc' 2025-09-07T14:26:03.3782442Z http.https://github.com/.extraheader 2025-09-07T14:26:03.3818040Z Entering 'third_party/nlohmann' 2025-09-07T14:26:03.3845476Z http.https://github.com/.extraheader 2025-09-07T14:26:03.3883385Z Entering 'third_party/onnx' 2025-09-07T14:26:03.3910346Z http.https://github.com/.extraheader 2025-09-07T14:26:03.3961105Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T14:26:03.3988892Z http.https://github.com/.extraheader 2025-09-07T14:26:03.4030412Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T14:26:03.4058377Z http.https://github.com/.extraheader 2025-09-07T14:26:03.4094730Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T14:26:03.4121774Z http.https://github.com/.extraheader 2025-09-07T14:26:03.4157703Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T14:26:03.4183077Z http.https://github.com/.extraheader 2025-09-07T14:26:03.4218771Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T14:26:03.4245200Z http.https://github.com/.extraheader 2025-09-07T14:26:03.4280960Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T14:26:03.4307770Z http.https://github.com/.extraheader 2025-09-07T14:26:03.4344558Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T14:26:03.4371847Z http.https://github.com/.extraheader 2025-09-07T14:26:03.4407243Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T14:26:03.4434172Z 
http.https://github.com/.extraheader 2025-09-07T14:26:03.4469417Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T14:26:03.4495588Z http.https://github.com/.extraheader 2025-09-07T14:26:03.4529717Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T14:26:03.4556042Z http.https://github.com/.extraheader 2025-09-07T14:26:03.4594827Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T14:26:03.4621535Z http.https://github.com/.extraheader 2025-09-07T14:26:03.4661171Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T14:26:03.4687352Z http.https://github.com/.extraheader 2025-09-07T14:26:03.4742645Z Entering 'third_party/pocketfft' 2025-09-07T14:26:03.4770526Z http.https://github.com/.extraheader 2025-09-07T14:26:03.4805064Z Entering 'third_party/protobuf' 2025-09-07T14:26:03.4832461Z http.https://github.com/.extraheader 2025-09-07T14:26:03.4869047Z Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T14:26:03.4895253Z http.https://github.com/.extraheader 2025-09-07T14:26:03.4931768Z Entering 'third_party/protobuf/third_party/googletest' 2025-09-07T14:26:03.4958449Z http.https://github.com/.extraheader 2025-09-07T14:26:03.4997801Z Entering 'third_party/psimd' 2025-09-07T14:26:03.5025529Z http.https://github.com/.extraheader 2025-09-07T14:26:03.5059987Z Entering 'third_party/pthreadpool' 2025-09-07T14:26:03.5088603Z http.https://github.com/.extraheader 2025-09-07T14:26:03.5124023Z Entering 'third_party/pybind11' 2025-09-07T14:26:03.5151486Z http.https://github.com/.extraheader 2025-09-07T14:26:03.5186671Z Entering 'third_party/python-peachpy' 2025-09-07T14:26:03.5214511Z http.https://github.com/.extraheader 2025-09-07T14:26:03.5249698Z Entering 'third_party/sleef' 2025-09-07T14:26:03.5277017Z http.https://github.com/.extraheader 2025-09-07T14:26:03.5312538Z Entering 'third_party/tensorpipe' 2025-09-07T14:26:03.5340923Z 
http.https://github.com/.extraheader 2025-09-07T14:26:03.5377064Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-09-07T14:26:03.5403138Z http.https://github.com/.extraheader 2025-09-07T14:26:03.5438389Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-09-07T14:26:03.5464466Z http.https://github.com/.extraheader 2025-09-07T14:26:03.5499603Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-09-07T14:26:03.5525853Z http.https://github.com/.extraheader 2025-09-07T14:26:03.5561321Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T14:26:03.5586981Z http.https://github.com/.extraheader 2025-09-07T14:26:03.5620893Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T14:26:03.5647396Z http.https://github.com/.extraheader 2025-09-07T14:26:03.5828190Z Post job cleanup. 2025-09-07T14:26:03.6733254Z [command]/usr/bin/git version 2025-09-07T14:26:03.6771979Z git version 2.50.1 2025-09-07T14:26:03.6813598Z Temporarily overriding HOME='/home/eve/_work/_temp/065b57bf-cdd3-447f-bf84-340816afbf24' before making global git config changes 2025-09-07T14:26:03.6814263Z Adding repository directory to the temporary git global config as a safe directory 2025-09-07T14:26:03.6818418Z [command]/usr/bin/git config --global --add safe.directory /home/eve/_work/pytorch/pytorch 2025-09-07T14:26:03.6854348Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2025-09-07T14:26:03.6898827Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2025-09-07T14:26:03.7170905Z Entering 'android/libs/fbjni' 2025-09-07T14:26:03.7222762Z Entering 'third_party/FP16' 2025-09-07T14:26:03.7271571Z Entering 'third_party/FXdiv' 2025-09-07T14:26:03.7320213Z Entering 'third_party/NNPACK' 2025-09-07T14:26:03.7371035Z Entering 'third_party/NVTX' 2025-09-07T14:26:03.7421056Z Entering 
'third_party/VulkanMemoryAllocator' 2025-09-07T14:26:03.7471575Z Entering 'third_party/XNNPACK' 2025-09-07T14:26:03.7536094Z Entering 'third_party/aiter' 2025-09-07T14:26:03.7586840Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T14:26:03.7643854Z Entering 'third_party/benchmark' 2025-09-07T14:26:03.7694443Z Entering 'third_party/composable_kernel' 2025-09-07T14:26:03.7755771Z Entering 'third_party/cpp-httplib' 2025-09-07T14:26:03.7805698Z Entering 'third_party/cpuinfo' 2025-09-07T14:26:03.7857182Z Entering 'third_party/cudnn_frontend' 2025-09-07T14:26:03.7906429Z Entering 'third_party/cutlass' 2025-09-07T14:26:03.7963234Z Entering 'third_party/fbgemm' 2025-09-07T14:26:03.8014599Z Entering 'third_party/fbgemm/external/asmjit' 2025-09-07T14:26:03.8062273Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-09-07T14:26:03.8114342Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T14:26:03.8160758Z Entering 'third_party/fbgemm/external/cutlass' 2025-09-07T14:26:03.8219504Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T14:26:03.8267874Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T14:26:03.8314751Z Entering 'third_party/fbgemm/external/json' 2025-09-07T14:26:03.8364836Z Entering 'third_party/flash-attention' 2025-09-07T14:26:03.8414836Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T14:26:03.8467943Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T14:26:03.8525668Z Entering 'third_party/flatbuffers' 2025-09-07T14:26:03.8578799Z Entering 'third_party/fmt' 2025-09-07T14:26:03.8628044Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T14:26:03.8678149Z Entering 'third_party/gloo' 2025-09-07T14:26:03.8727874Z Entering 'third_party/googletest' 2025-09-07T14:26:03.8779516Z Entering 'third_party/ideep' 2025-09-07T14:26:03.8827360Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T14:26:03.8883530Z Entering 'third_party/ittapi' 2025-09-07T14:26:03.8933120Z Entering 
'third_party/kineto' 2025-09-07T14:26:03.8982603Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T14:26:03.9031177Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T14:26:03.9080349Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T14:26:03.9129621Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T14:26:03.9177498Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T14:26:03.9224132Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T14:26:03.9276103Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T14:26:03.9324098Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T14:26:03.9372724Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T14:26:03.9421202Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T14:26:03.9471640Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T14:26:03.9518290Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T14:26:03.9568728Z Entering 'third_party/kleidiai' 2025-09-07T14:26:03.9620083Z Entering 'third_party/mimalloc' 2025-09-07T14:26:03.9670571Z Entering 'third_party/nlohmann' 2025-09-07T14:26:03.9722589Z Entering 'third_party/onnx' 2025-09-07T14:26:03.9788479Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T14:26:03.9843292Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T14:26:03.9897267Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T14:26:03.9945839Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T14:26:03.9992236Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T14:26:04.0039714Z Entering 
'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T14:26:04.0088132Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T14:26:04.0135665Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T14:26:04.0182503Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T14:26:04.0228128Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T14:26:04.0277522Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T14:26:04.0328433Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T14:26:04.0394873Z Entering 'third_party/pocketfft' 2025-09-07T14:26:04.0445221Z Entering 'third_party/protobuf' 2025-09-07T14:26:04.0495695Z Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T14:26:04.0543454Z Entering 'third_party/protobuf/third_party/googletest' 2025-09-07T14:26:04.0593927Z Entering 'third_party/psimd' 2025-09-07T14:26:04.0643587Z Entering 'third_party/pthreadpool' 2025-09-07T14:26:04.0693004Z Entering 'third_party/pybind11' 2025-09-07T14:26:04.0742844Z Entering 'third_party/python-peachpy' 2025-09-07T14:26:04.0792128Z Entering 'third_party/sleef' 2025-09-07T14:26:04.0842194Z Entering 'third_party/tensorpipe' 2025-09-07T14:26:04.0890809Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-09-07T14:26:04.0939008Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-09-07T14:26:04.0987382Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-09-07T14:26:04.1034481Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T14:26:04.1080175Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T14:26:04.1153487Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2025-09-07T14:26:04.1186544Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config 
--local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :" 2025-09-07T14:26:04.1447771Z Entering 'android/libs/fbjni' 2025-09-07T14:26:04.1498022Z Entering 'third_party/FP16' 2025-09-07T14:26:04.1546216Z Entering 'third_party/FXdiv' 2025-09-07T14:26:04.1593586Z Entering 'third_party/NNPACK' 2025-09-07T14:26:04.1644202Z Entering 'third_party/NVTX' 2025-09-07T14:26:04.1693176Z Entering 'third_party/VulkanMemoryAllocator' 2025-09-07T14:26:04.1741248Z Entering 'third_party/XNNPACK' 2025-09-07T14:26:04.1806777Z Entering 'third_party/aiter' 2025-09-07T14:26:04.1856077Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T14:26:04.1915499Z Entering 'third_party/benchmark' 2025-09-07T14:26:04.1966243Z Entering 'third_party/composable_kernel' 2025-09-07T14:26:04.2023495Z Entering 'third_party/cpp-httplib' 2025-09-07T14:26:04.2073107Z Entering 'third_party/cpuinfo' 2025-09-07T14:26:04.2122552Z Entering 'third_party/cudnn_frontend' 2025-09-07T14:26:04.2173016Z Entering 'third_party/cutlass' 2025-09-07T14:26:04.2230386Z Entering 'third_party/fbgemm' 2025-09-07T14:26:04.2280735Z Entering 'third_party/fbgemm/external/asmjit' 2025-09-07T14:26:04.2329218Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-09-07T14:26:04.2384219Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T14:26:04.2431389Z Entering 'third_party/fbgemm/external/cutlass' 2025-09-07T14:26:04.2486556Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T14:26:04.2533969Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T14:26:04.2580883Z Entering 'third_party/fbgemm/external/json' 2025-09-07T14:26:04.2632516Z Entering 'third_party/flash-attention' 2025-09-07T14:26:04.2681785Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T14:26:04.2734860Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T14:26:04.2794587Z Entering 
'third_party/flatbuffers' 2025-09-07T14:26:04.2847404Z Entering 'third_party/fmt' 2025-09-07T14:26:04.2896049Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T14:26:04.2946445Z Entering 'third_party/gloo' 2025-09-07T14:26:04.2994817Z Entering 'third_party/googletest' 2025-09-07T14:26:04.3043141Z Entering 'third_party/ideep' 2025-09-07T14:26:04.3090784Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T14:26:04.3146396Z Entering 'third_party/ittapi' 2025-09-07T14:26:04.3195851Z Entering 'third_party/kineto' 2025-09-07T14:26:04.3244430Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T14:26:04.3291142Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T14:26:04.3340492Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T14:26:04.3389538Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T14:26:04.3437913Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T14:26:04.3484393Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T14:26:04.3535558Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T14:26:04.3583724Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T14:26:04.3631700Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T14:26:04.3679580Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T14:26:04.3730471Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T14:26:04.3777505Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T14:26:04.3826918Z Entering 'third_party/kleidiai' 2025-09-07T14:26:04.3876411Z Entering 'third_party/mimalloc' 2025-09-07T14:26:04.3926430Z Entering 'third_party/nlohmann' 2025-09-07T14:26:04.3977744Z Entering 'third_party/onnx' 
2025-09-07T14:26:04.4041295Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T14:26:04.4096656Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T14:26:04.4146809Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T14:26:04.4193563Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T14:26:04.4240232Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T14:26:04.4286765Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T14:26:04.4335114Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T14:26:04.4382394Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T14:26:04.4429613Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T14:26:04.4475485Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T14:26:04.4525603Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T14:26:04.4575910Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T14:26:04.4643056Z Entering 'third_party/pocketfft' 2025-09-07T14:26:04.4693712Z Entering 'third_party/protobuf' 2025-09-07T14:26:04.4745395Z Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T14:26:04.4792865Z Entering 'third_party/protobuf/third_party/googletest' 2025-09-07T14:26:04.4843337Z Entering 'third_party/psimd' 2025-09-07T14:26:04.4893388Z Entering 'third_party/pthreadpool' 2025-09-07T14:26:04.4943437Z Entering 'third_party/pybind11' 2025-09-07T14:26:04.4991979Z Entering 'third_party/python-peachpy' 2025-09-07T14:26:04.5040811Z Entering 'third_party/sleef' 2025-09-07T14:26:04.5090658Z Entering 'third_party/tensorpipe' 2025-09-07T14:26:04.5139576Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-09-07T14:26:04.5187733Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-09-07T14:26:04.5234277Z Entering 
'third_party/tensorpipe/third_party/libuv' 2025-09-07T14:26:04.5281767Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T14:26:04.5327129Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T14:26:04.5522907Z Cleaning up orphan processes