2025-09-07T07:45:24.9494781Z Current runner version: '2.328.0'
2025-09-07T07:45:24.9500354Z Runner name: 'i-05a095f6e498981b2-1003'
2025-09-07T07:45:24.9501138Z Runner group name: 'default'
2025-09-07T07:45:24.9501948Z Machine name: '784802b6db88'
2025-09-07T07:45:24.9504696Z ##[group]GITHUB_TOKEN Permissions
2025-09-07T07:45:24.9506775Z Contents: read
2025-09-07T07:45:24.9507332Z Metadata: read
2025-09-07T07:45:24.9507884Z ##[endgroup]
2025-09-07T07:45:24.9510078Z Secret source: Actions
2025-09-07T07:45:24.9510761Z Prepare workflow directory
2025-09-07T07:45:25.1937454Z Prepare all required actions
2025-09-07T07:45:25.1972184Z Getting action download info
2025-09-07T07:45:25.5114981Z Download action repository 'pytorch/test-infra@main' (SHA:548a4bc624d43a01cdf165a63b041f0ae014ddbd)
2025-09-07T07:46:38.2202831Z Download action repository 'pytorch/pytorch@main' (SHA:ada43ed39c80b746b4822c92640a1882619e2795)
2025-09-07T07:54:00.3426215Z Download action repository 'actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065' (SHA:a26af69be951a213d495a4c3e4e4022e16d87065)
2025-09-07T07:54:07.6227137Z Download action repository 'aws-actions/configure-aws-credentials@ececac1a45f3b08a01d2dd070d28d111c5fe6722' (SHA:ececac1a45f3b08a01d2dd070d28d111c5fe6722)
2025-09-07T07:54:11.4339239Z Download action repository 'aws-actions/amazon-ecr-login@062b18b96a7aff071d4dc91bc00c4c1a7945b076' (SHA:062b18b96a7aff071d4dc91bc00c4c1a7945b076)
2025-09-07T07:54:12.9521406Z Download action repository 'seemethere/upload-artifact-s3@baba72d0712b404f646cebe0730933554ebce96a' (SHA:baba72d0712b404f646cebe0730933554ebce96a)
2025-09-07T07:54:16.1602556Z Getting action download info
2025-09-07T07:54:16.3283614Z Download action repository 'actions/checkout@v4' (SHA:08eba0b27e820071cde6df949e0beb9ba4906955)
2025-09-07T07:54:20.9770431Z Getting action download info
2025-09-07T07:54:21.0977182Z Download action repository 'nick-fields/retry@v3.0.0' (SHA:7152eba30c6575329ac0576536151aca5a72780e)
2025-09-07T07:54:23.0296902Z Getting action download info 2025-09-07T07:54:23.1402545Z Download action repository 'nick-fields/retry@3e91a01664abd3c5cd539100d10d33b9c5b68482' (SHA:3e91a01664abd3c5cd539100d10d33b9c5b68482) 2025-09-07T07:54:24.7983935Z Getting action download info 2025-09-07T07:54:24.9295020Z Uses: pytorch/pytorch/.github/workflows/_linux-test.yml@refs/heads/main (93fb23d6fae7c4e82c4239a1033e522088742634) 2025-09-07T07:54:24.9299022Z ##[group] Inputs 2025-09-07T07:54:24.9299371Z build-environment: linux-jammy-cuda12.8-py3.10-gcc9-sm90 2025-09-07T07:54:24.9304604Z test-matrix: {"include": [{"config": "inductor_huggingface_perf_cuda_h100", "shard": 1, "num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_huggingface_perf_cuda_h100", "shard": 2, "num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_huggingface_perf_cuda_h100", "shard": 3, "num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_huggingface_perf_cuda_h100", "shard": 4, "num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_huggingface_perf_cuda_h100", "shard": 5, "num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 1, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 2, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 3, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 4, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 5, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 6, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 7, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 1, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": 
"inductor_torchbench_perf_cuda_h100", "shard": 2, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 3, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 4, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 5, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 6, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 7, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 8, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 9, "num_shards": 9, "runner": "linux.aws.h100"}]} 2025-09-07T07:54:24.9310079Z docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T07:54:24.9310758Z sync-tag: 2025-09-07T07:54:24.9311474Z timeout-minutes: 1440 2025-09-07T07:54:24.9311692Z use-gha: 2025-09-07T07:54:24.9312542Z dashboard-tag: training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true 2025-09-07T07:54:24.9313468Z s3-bucket: gha-artifacts 2025-09-07T07:54:24.9313673Z aws-role-to-assume: 2025-09-07T07:54:24.9314498Z disable-monitor: false 2025-09-07T07:54:24.9314765Z monitor-log-interval: 15 2025-09-07T07:54:24.9314995Z monitor-data-collect-interval: 4 2025-09-07T07:54:24.9315246Z ##[endgroup] 2025-09-07T07:54:24.9315557Z Complete job name: test-weekly / test (inductor_timm_perf_cuda_h100, 2, 7, linux.aws.h100) 2025-09-07T07:54:25.0204630Z ##[group]Run pytorch/test-infra/.github/actions/setup-ssh@main 2025-09-07T07:54:25.0205302Z with: 
2025-09-07T07:54:25.0205768Z github-secret: ***
2025-09-07T07:54:25.0206297Z instructions: All testing is done inside the container, to start an interactive session run: docker exec -it $(docker container ps --format '{{.ID}}') bash
2025-09-07T07:54:25.0206855Z activate-with-label: false
2025-09-07T07:54:25.0207060Z label: with-ssh
2025-09-07T07:54:25.0207240Z remove-existing-keys: true
2025-09-07T07:54:25.0207438Z fail-silently: true
2025-09-07T07:54:25.0207835Z env:
2025-09-07T07:54:25.0208007Z GIT_DEFAULT_BRANCH: main
2025-09-07T07:54:25.0208200Z ##[endgroup]
2025-09-07T07:54:25.1278402Z Please see https://github.com/pytorch/pytorch/wiki/Debugging-using-with-ssh-for-Github-Actions for more info.
2025-09-07T07:54:25.1279350Z Not on pull request and ciflow reference could not be extracted, skipping adding ssh keys
2025-09-07T07:54:25.1487060Z ##[group]Run pytorch/pytorch/.github/actions/checkout-pytorch@main
2025-09-07T07:54:25.1487410Z with:
2025-09-07T07:54:25.1487577Z no-sudo: true
2025-09-07T07:54:25.1487756Z submodules: recursive
2025-09-07T07:54:25.1488005Z fetch-depth: 0
2025-09-07T07:54:25.1488167Z env:
2025-09-07T07:54:25.1488327Z GIT_DEFAULT_BRANCH: main
2025-09-07T07:54:25.1488536Z ##[endgroup]
2025-09-07T07:54:25.2385372Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-09-07T07:54:25.2386145Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-09-07T07:54:25.2410473Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-09-07T07:54:25.2410799Z env:
2025-09-07T07:54:25.2410967Z GIT_DEFAULT_BRANCH: main
2025-09-07T07:54:25.2411165Z ##[endgroup]
2025-09-07T07:54:25.3705584Z ##[group]Run actions/checkout@v4
2025-09-07T07:54:25.3705835Z with:
2025-09-07T07:54:25.3706023Z ref: 93fb23d6fae7c4e82c4239a1033e522088742634
2025-09-07T07:54:25.3706265Z fetch-depth: 0
2025-09-07T07:54:25.3706450Z submodules: recursive 2025-09-07T07:54:25.3706649Z show-progress: false 2025-09-07T07:54:25.3706868Z repository: pytorch/pytorch 2025-09-07T07:54:25.3707187Z token: *** 2025-09-07T07:54:25.3707623Z ssh-strict: true 2025-09-07T07:54:25.3707812Z ssh-user: git 2025-09-07T07:54:25.3707987Z persist-credentials: true 2025-09-07T07:54:25.3708186Z clean: true 2025-09-07T07:54:25.3708367Z sparse-checkout-cone-mode: true 2025-09-07T07:54:25.3708598Z fetch-tags: false 2025-09-07T07:54:25.3708766Z lfs: false 2025-09-07T07:54:25.3708948Z set-safe-directory: true 2025-09-07T07:54:25.3709153Z env: 2025-09-07T07:54:25.3709310Z GIT_DEFAULT_BRANCH: main 2025-09-07T07:54:25.3709493Z ##[endgroup] 2025-09-07T07:54:25.4685120Z Syncing repository: pytorch/pytorch 2025-09-07T07:54:25.4686301Z ##[group]Getting Git version info 2025-09-07T07:54:25.4686665Z Working directory is '/home/charlie/_work/pytorch/pytorch' 2025-09-07T07:54:25.4687152Z [command]/usr/bin/git version 2025-09-07T07:54:25.4697114Z git version 2.50.1 2025-09-07T07:54:25.4723390Z ##[endgroup] 2025-09-07T07:54:25.4736357Z Temporarily overriding HOME='/home/charlie/_work/_temp/f6bd1fe4-9239-4bef-839e-f43df1db8517' before making global git config changes 2025-09-07T07:54:25.4737151Z Adding repository directory to the temporary git global config as a safe directory 2025-09-07T07:54:25.4741609Z [command]/usr/bin/git config --global --add safe.directory /home/charlie/_work/pytorch/pytorch 2025-09-07T07:54:25.5021305Z Deleting the contents of '/home/charlie/_work/pytorch/pytorch' 2025-09-07T07:54:25.5024367Z ##[group]Initializing the repository 2025-09-07T07:54:25.5027647Z [command]/usr/bin/git init /home/charlie/_work/pytorch/pytorch 2025-09-07T07:54:25.7321791Z hint: Using 'master' as the name for the initial branch. This default branch name 2025-09-07T07:54:25.7322972Z hint: is subject to change. 
To configure the initial branch name to use in all 2025-09-07T07:54:25.7323568Z hint: of your new repositories, which will suppress this warning, call: 2025-09-07T07:54:25.7324071Z hint: 2025-09-07T07:54:25.7324311Z hint: git config --global init.defaultBranch 2025-09-07T07:54:25.7324579Z hint: 2025-09-07T07:54:25.7324852Z hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and 2025-09-07T07:54:25.7325278Z hint: 'development'. The just-created branch can be renamed via this command: 2025-09-07T07:54:25.7325632Z hint: 2025-09-07T07:54:25.7325850Z hint: git branch -m 2025-09-07T07:54:25.7326071Z hint: 2025-09-07T07:54:25.7326393Z hint: Disable this message with "git config set advice.defaultBranchName false" 2025-09-07T07:54:25.7326893Z Initialized empty Git repository in /home/charlie/_work/pytorch/pytorch/.git/ 2025-09-07T07:54:25.7335968Z [command]/usr/bin/git remote add origin https://github.com/pytorch/pytorch 2025-09-07T07:54:25.8201211Z ##[endgroup] 2025-09-07T07:54:25.8201715Z ##[group]Disabling automatic garbage collection 2025-09-07T07:54:25.8204627Z [command]/usr/bin/git config --local gc.auto 0 2025-09-07T07:54:25.9314335Z ##[endgroup] 2025-09-07T07:54:25.9314682Z ##[group]Setting up auth 2025-09-07T07:54:25.9320730Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2025-09-07T07:54:25.9355318Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2025-09-07T07:54:25.9623048Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2025-09-07T07:54:25.9653242Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :" 2025-09-07T07:54:25.9896584Z [command]/usr/bin/git 
config --local http.https://github.com/.extraheader AUTHORIZATION: basic *** 2025-09-07T07:54:26.0791763Z ##[endgroup] 2025-09-07T07:54:26.0792533Z ##[group]Fetching the repository 2025-09-07T07:54:26.0803545Z [command]/usr/bin/git -c protocol.version=2 fetch --prune --no-recurse-submodules origin +refs/heads/*:refs/remotes/origin/* +refs/tags/*:refs/tags/* 2025-09-07T07:55:55.5650620Z From https://github.com/pytorch/pytorch 2025-09-07T07:55:55.5651123Z * [new branch] 160583 -> origin/160583 2025-09-07T07:55:55.5651745Z * [new branch] 2.6.0.dev20241004+ -> origin/2.6.0.dev20241004+ 2025-09-07T07:55:55.5652172Z * [new branch] 5addvllmbuild -> origin/5addvllmbuild 2025-09-07T07:55:55.5652681Z * [new branch] AaronWang04_addmmfusion_perftest -> origin/AaronWang04_addmmfusion_perftest 2025-09-07T07:55:55.5654964Z * [new branch] HDCharles-2.6.0-release-notes -> origin/HDCharles-2.6.0-release-notes 2025-09-07T07:55:55.5656188Z * [new branch] ISSUE-154849 -> origin/ISSUE-154849 2025-09-07T07:55:55.5658793Z * [new branch] JackCaoG/dynamo_make_fx_non_core_aten_ops -> origin/JackCaoG/dynamo_make_fx_non_core_aten_ops 2025-09-07T07:55:55.5660409Z * [new branch] NicoshevSVE128 -> origin/NicoshevSVE128 2025-09-07T07:55:55.5662024Z * [new branch] PR-AOTInductorNoneBug -> origin/PR-AOTInductorNoneBug 2025-09-07T07:55:55.5663866Z * [new branch] PR-AOTInductorNoneBugFix -> origin/PR-AOTInductorNoneBugFix 2025-09-07T07:55:55.5665564Z * [new branch] PR-FixConfigsIssue -> origin/PR-FixConfigsIssue 2025-09-07T07:55:55.5667015Z * [new branch] PR-NoneBugFix-viable -> origin/PR-NoneBugFix-viable 2025-09-07T07:55:55.5668601Z * [new branch] PR-ResetToZero -> origin/PR-ResetToZero 2025-09-07T07:55:55.5670290Z * [new branch] Update-Flash-Packaging -> origin/Update-Flash-Packaging 2025-09-07T07:55:55.5671905Z * [new branch] VLA_exp -> origin/VLA_exp 2025-09-07T07:55:55.5674213Z * [new branch] actually-run-mps-aot-inductor -> origin/actually-run-mps-aot-inductor 2025-09-07T07:55:55.5675821Z * 
[new branch] add-missing-args-normalization -> origin/add-missing-args-normalization 2025-09-07T07:55:55.5677597Z * [new branch] add-user-guide-structure -> origin/add-user-guide-structure 2025-09-07T07:55:55.5679382Z * [new branch] add-vllm-nightly-build -> origin/add-vllm-nightly-build 2025-09-07T07:55:55.5680909Z * [new branch] add_compile_benchmarking -> origin/add_compile_benchmarking 2025-09-07T07:55:55.5682579Z * [new branch] addmm-heuristic -> origin/addmm-heuristic 2025-09-07T07:55:55.5684342Z * [new branch] addsimde -> origin/addsimde 2025-09-07T07:55:55.5686796Z * [new branch] addvllmtest -> origin/addvllmtest 2025-09-07T07:55:55.5688348Z * [new branch] adi/acl_upgrade -> origin/adi/acl_upgrade 2025-09-07T07:55:55.5689899Z * [new branch] adi/test -> origin/adi/test 2025-09-07T07:55:55.5691566Z * [new branch] adi/test_bgemm -> origin/adi/test_bgemm 2025-09-07T07:55:55.5693091Z * [new branch] adi/test_fusions -> origin/adi/test_fusions 2025-09-07T07:55:55.5695163Z * [new branch] adi/test_onednn_v3.9 -> origin/adi/test_onednn_v3.9 2025-09-07T07:55:55.5696926Z * [new branch] adi/test_presve_change -> origin/adi/test_presve_change 2025-09-07T07:55:55.5698290Z * [new branch] adi/test_timm -> origin/adi/test_timm 2025-09-07T07:55:55.5699822Z * [new branch] adi/testpresve_change -> origin/adi/testpresve_change 2025-09-07T07:55:55.5702655Z * [new branch] aditew01/test/vec_bf16 -> origin/aditew01/test/vec_bf16 2025-09-07T07:55:55.5704447Z * [new branch] ah-globalfeedback-hook -> origin/ah-globalfeedback-hook 2025-09-07T07:55:55.5706169Z * [new branch] alt-disable -> origin/alt-disable 2025-09-07T07:55:55.5708739Z * [new branch] angelayi/aoti_additional_files -> origin/angelayi/aoti_additional_files 2025-09-07T07:55:55.5710122Z * [new branch] angelayi/aoti_inductor_fx -> origin/angelayi/aoti_inductor_fx 2025-09-07T07:55:55.5711820Z * [new branch] angelayi/benchmark -> origin/angelayi/benchmark 2025-09-07T07:55:55.5713482Z * [new branch] angelayi/benchmark2 -> 
origin/angelayi/benchmark2 2025-09-07T07:55:55.5715436Z * [new branch] angelayi/change_pytree_serialization -> origin/angelayi/change_pytree_serialization 2025-09-07T07:55:55.5716921Z * [new branch] angelayi/cpp_loader -> origin/angelayi/cpp_loader 2025-09-07T07:55:55.5718719Z * [new branch] angelayi/custom_op_subgraph -> origin/angelayi/custom_op_subgraph 2025-09-07T07:55:55.5720199Z * [new branch] angelayi/customop -> origin/angelayi/customop 2025-09-07T07:55:55.5721727Z * [new branch] angelayi/fake_cache_empty -> origin/angelayi/fake_cache_empty 2025-09-07T07:55:55.5723302Z * [new branch] angelayi/is_symbolic_tracing -> origin/angelayi/is_symbolic_tracing 2025-09-07T07:55:55.5725214Z * [new branch] angelayi/item -> origin/angelayi/item 2025-09-07T07:55:55.5726723Z * [new branch] angelayi/no_so_weight -> origin/angelayi/no_so_weight 2025-09-07T07:55:55.5728206Z * [new branch] angelayi/opoverload -> origin/angelayi/opoverload 2025-09-07T07:55:55.5729710Z * [new branch] angelayi/pattern -> origin/angelayi/pattern 2025-09-07T07:55:55.5731404Z * [new branch] angelayi/pytree -> origin/angelayi/pytree 2025-09-07T07:55:55.5733165Z * [new branch] angelayi/scan_layers -> origin/angelayi/scan_layers 2025-09-07T07:55:55.5735220Z * [new branch] angelayi/symint_input -> origin/angelayi/symint_input 2025-09-07T07:55:55.5736714Z * [new branch] angelayi/test_cpp -> origin/angelayi/test_cpp 2025-09-07T07:55:55.5738312Z * [new branch] angelayi/torch_size -> origin/angelayi/torch_size 2025-09-07T07:55:55.5739928Z * [new branch] aoti-cuda-alloc -> origin/aoti-cuda-alloc 2025-09-07T07:55:55.5741524Z * [new branch] aoti_target_windows -> origin/aoti_target_windows 2025-09-07T07:55:55.5743157Z * [new branch] aoti_weight_sharing -> origin/aoti_weight_sharing 2025-09-07T07:55:55.5745203Z * [new branch] atalman-inductor-perf-cu124 -> origin/atalman-inductor-perf-cu124 2025-09-07T07:55:55.5747027Z * [new branch] atalman-inductor-perf-cu124.1 -> origin/atalman-inductor-perf-cu124.1 
2025-09-07T07:55:55.5748423Z * [new branch] atalman-patch-1 -> origin/atalman-patch-1 2025-09-07T07:55:55.5750111Z * [new branch] atalman-patch-3 -> origin/atalman-patch-3 2025-09-07T07:55:55.5751690Z * [new branch] atalman-patch-4 -> origin/atalman-patch-4 2025-09-07T07:55:55.5753350Z * [new branch] atalman-patch-5 -> origin/atalman-patch-5 2025-09-07T07:55:55.5755393Z * [new branch] atalman-patch-6 -> origin/atalman-patch-6 2025-09-07T07:55:55.5756885Z * [new branch] atalman_inductor_2.3.0 -> origin/atalman_inductor_2.3.0 2025-09-07T07:55:55.5758685Z * [new branch] atalman_inductor_2.3.1 -> origin/atalman_inductor_2.3.1 2025-09-07T07:55:55.5760295Z * [new branch] atalman_inductor_2.4.0 -> origin/atalman_inductor_2.4.0 2025-09-07T07:55:55.5762031Z * [new branch] atalman_inductor_2.4.x -> origin/atalman_inductor_2.4.x 2025-09-07T07:55:55.5763693Z * [new branch] autoupdate-transformers-pin-via-pr -> origin/autoupdate-transformers-pin-via-pr 2025-09-07T07:55:55.5766131Z * [new branch] bahuang/dtensor_demo -> origin/bahuang/dtensor_demo 2025-09-07T07:55:55.5767668Z * [new branch] bahuang/test -> origin/bahuang/test 2025-09-07T07:55:55.5770035Z * [new branch] base/1.5 -> origin/base/1.5 2025-09-07T07:55:55.5771744Z * [new branch] batching_sdpa_efficient_attention -> origin/batching_sdpa_efficient_attention 2025-09-07T07:55:55.5773301Z * [new branch] bc-lint-config -> origin/bc-lint-config 2025-09-07T07:55:55.5775296Z * [new branch] bc-lint-test-new-config -> origin/bc-lint-test-new-config 2025-09-07T07:55:55.5777056Z * [new branch] benchmark-updates -> origin/benchmark-updates 2025-09-07T07:55:55.5778681Z * [new branch] benchmarker_compat_with_do_bench -> origin/benchmarker_compat_with_do_bench 2025-09-07T07:55:55.5780252Z * [new branch] benchmarking-script -> origin/benchmarking-script 2025-09-07T07:55:55.5782636Z * [new branch] bertmaher/pinbump26 -> origin/bertmaher/pinbump26 2025-09-07T07:55:55.5785168Z * [new branch] bertrand/cutlass -> origin/bertrand/cutlass 
2025-09-07T07:55:55.5787455Z * [new branch] bf/cg-custom-wrapper -> origin/bf/cg-custom-wrapper 2025-09-07T07:55:55.5788946Z * [new branch] bf/cg-or-error -> origin/bf/cg-or-error 2025-09-07T07:55:55.5790461Z * [new branch] bf/cg-remove-check -> origin/bf/cg-remove-check 2025-09-07T07:55:55.5791966Z * [new branch] bf/cg-skip-1-kernel -> origin/bf/cg-skip-1-kernel 2025-09-07T07:55:55.5793539Z * [new branch] bf/cudagraph -> origin/bf/cudagraph 2025-09-07T07:55:55.5795501Z * [new branch] bf/cudagraph-disable-input-mutation -> origin/bf/cudagraph-disable-input-mutation 2025-09-07T07:55:55.5797247Z * [new branch] bf/cudagraph-enable-input-mutation-support-benchmark -> origin/bf/cudagraph-enable-input-mutation-support-benchmark 2025-09-07T07:55:55.5798526Z * [new branch] bf/cudagraph-partition -> origin/bf/cudagraph-partition 2025-09-07T07:55:55.5800182Z * [new branch] bf/default-recompile-reason -> origin/bf/default-recompile-reason 2025-09-07T07:55:55.5801812Z * [new branch] bf/donated-buffer-bench -> origin/bf/donated-buffer-bench 2025-09-07T07:55:55.5803264Z * [new branch] bf/exp -> origin/bf/exp 2025-09-07T07:55:55.5805328Z * [new branch] bf/pa-non-divisible -> origin/bf/pa-non-divisible 2025-09-07T07:55:55.5806994Z * [new branch] bf/partition-move-cpu -> origin/bf/partition-move-cpu 2025-09-07T07:55:55.5808474Z * [new branch] bf/partition-turn-on -> origin/bf/partition-turn-on 2025-09-07T07:55:55.5809873Z * [new branch] bf/remove-check-55b0c39d -> origin/bf/remove-check-55b0c39d 2025-09-07T07:55:55.5811327Z * [new branch] bf/rope -> origin/bf/rope 2025-09-07T07:55:55.5813036Z * [new branch] bisect_perf_hf_T5_3acc6eac492 -> origin/bisect_perf_hf_T5_3acc6eac492 2025-09-07T07:55:55.5814968Z * [new branch] bisect_perf_hf_T5_3fcf66f61fb -> origin/bisect_perf_hf_T5_3fcf66f61fb 2025-09-07T07:55:55.5816478Z * [new branch] bisect_perf_hf_T5_4009d154129 -> origin/bisect_perf_hf_T5_4009d154129 2025-09-07T07:55:55.5818111Z * [new branch] bisect_perf_hf_T5_40d0740e73d -> 
origin/bisect_perf_hf_T5_40d0740e73d 2025-09-07T07:55:55.5819876Z * [new branch] bisect_perf_hf_T5_5268754e -> origin/bisect_perf_hf_T5_5268754e 2025-09-07T07:55:55.5821462Z * [new branch] bisect_perf_hf_T5_7d89a8d385c -> origin/bisect_perf_hf_T5_7d89a8d385c 2025-09-07T07:55:55.5823095Z * [new branch] bisect_perf_hf_T5_b7a25c1ee7c -> origin/bisect_perf_hf_T5_b7a25c1ee7c 2025-09-07T07:55:55.5825107Z * [new branch] bisect_perf_hf_T5_c25b201583f -> origin/bisect_perf_hf_T5_c25b201583f 2025-09-07T07:55:55.5826529Z * [new branch] bisect_perf_hf_T5_c93e57efac0 -> origin/bisect_perf_hf_T5_c93e57efac0 2025-09-07T07:55:55.5828293Z * [new branch] bisect_perf_hf_T5_ca9813ea149 -> origin/bisect_perf_hf_T5_ca9813ea149 2025-09-07T07:55:55.5829867Z * [new branch] bisect_perf_hf_T5_d65f194a -> origin/bisect_perf_hf_T5_d65f194a 2025-09-07T07:55:55.5831434Z * [new branch] bisect_perf_hf_T5_da94ab0b -> origin/bisect_perf_hf_T5_da94ab0b 2025-09-07T07:55:55.5833118Z * [new branch] bisect_perf_hf_T5_da94ab0b_new -> origin/bisect_perf_hf_T5_da94ab0b_new 2025-09-07T07:55:55.5835001Z * [new branch] bisect_perf_hf_T5_db4e8a1d8a8 -> origin/bisect_perf_hf_T5_db4e8a1d8a8 2025-09-07T07:55:55.5836557Z * [new branch] bisect_perf_hf_T5_e0d97e936a2 -> origin/bisect_perf_hf_T5_e0d97e936a2 2025-09-07T07:55:55.5838355Z * [new branch] bisect_perf_hf_T5_f23621ec563 -> origin/bisect_perf_hf_T5_f23621ec563 2025-09-07T07:55:55.5840676Z * [new branch] bowbao/bench_updates_stage -> origin/bowbao/bench_updates_stage 2025-09-07T07:55:55.5842161Z * [new branch] bowbao/dort_rewriter -> origin/bowbao/dort_rewriter 2025-09-07T07:55:55.5843534Z * [new branch] bowbao/wip_prs -> origin/bowbao/wip_prs 2025-09-07T07:55:55.5846260Z * [new branch] brister/break_tensorbox -> origin/brister/break_tensorbox 2025-09-07T07:55:55.5847747Z * [new branch] brister/custom_fx_backend -> origin/brister/custom_fx_backend 2025-09-07T07:55:55.5849316Z * [new branch] brister/fx_custom_triton -> origin/brister/fx_custom_triton 
2025-09-07T07:55:55.5850781Z * [new branch] brister/tensor_box_output -> origin/brister/tensor_box_output 2025-09-07T07:55:55.5852300Z * [new branch] brister/tiled_reduction_no_numel_check -> origin/brister/tiled_reduction_no_numel_check 2025-09-07T07:55:55.5854065Z * [new branch] c57382a49 -> origin/c57382a49 2025-09-07T07:55:55.5855805Z * [new branch] ca_0431d47eaa -> origin/ca_0431d47eaa 2025-09-07T07:55:55.5857446Z * [new branch] ca_fix_0431d47eaa -> origin/ca_fix_0431d47eaa 2025-09-07T07:55:55.5860146Z * [new branch] camyll/revert-94bc900da97ad7f3c35b3b819bb53b23c74b581a-for-release-2.8 -> origin/camyll/revert-94bc900da97ad7f3c35b3b819bb53b23c74b581a-for-release-2.8 2025-09-07T07:55:55.5862158Z * [new branch] camyllh/test_setup_hooks_push -> origin/camyllh/test_setup_hooks_push 2025-09-07T07:55:55.5885024Z * [new branch] cherry-pick-149654-by-pytorch_bot_bot_ -> origin/cherry-pick-149654-by-pytorch_bot_bot_ 2025-09-07T07:55:55.5885674Z * [new branch] cherry-pick-151939-by-pytorch_bot_bot_ -> origin/cherry-pick-151939-by-pytorch_bot_bot_ 2025-09-07T07:55:55.5886248Z * [new branch] cherry-pick-154174-by-pytorch_bot_bot_ -> origin/cherry-pick-154174-by-pytorch_bot_bot_ 2025-09-07T07:55:55.5886808Z * [new branch] cherry-pick-156260-by-pytorch_bot_bot_ -> origin/cherry-pick-156260-by-pytorch_bot_bot_ 2025-09-07T07:55:55.5887383Z * [new branch] cherry-pick-157453-by-pytorch_bot_bot_ -> origin/cherry-pick-157453-by-pytorch_bot_bot_ 2025-09-07T07:55:55.5887936Z * [new branch] cherry-pick-157513-by-pytorch_bot_bot_ -> origin/cherry-pick-157513-by-pytorch_bot_bot_ 2025-09-07T07:55:55.5888488Z * [new branch] cherry-pick-157695-by-pytorch_bot_bot_ -> origin/cherry-pick-157695-by-pytorch_bot_bot_ 2025-09-07T07:55:55.5889038Z * [new branch] cherry-pick-157732-by-pytorch_bot_bot_ -> origin/cherry-pick-157732-by-pytorch_bot_bot_ 2025-09-07T07:55:55.5889580Z * [new branch] cherry-pick-158537-by-pytorch_bot_bot_ -> origin/cherry-pick-158537-by-pytorch_bot_bot_ 
2025-09-07T07:55:55.5890123Z * [new branch] cherry-pick-159969-by-pytorch_bot_bot_ -> origin/cherry-pick-159969-by-pytorch_bot_bot_ 2025-09-07T07:55:55.5890835Z * [new branch] cherry-pick-160586-by-pytorch_bot_bot_ -> origin/cherry-pick-160586-by-pytorch_bot_bot_ 2025-09-07T07:55:55.5891302Z * [new branch] chilli/flex_vllm -> origin/chilli/flex_vllm 2025-09-07T07:55:55.5891769Z * [new branch] cleanup-inductor-benchmark-images -> origin/cleanup-inductor-benchmark-images 2025-09-07T07:55:55.5892238Z * [new branch] codex-testing -> origin/codex-testing 2025-09-07T07:55:55.5892736Z * [new branch] codex/add-helper-function-to-sizevars.py -> origin/codex/add-helper-function-to-sizevars.py 2025-09-07T07:55:55.5893423Z * [new branch] codex/add-helper-function-to-sizevars.py_2025-09-05 -> origin/codex/add-helper-function-to-sizevars.py_2025-09-05 2025-09-07T07:55:55.5894276Z * [new branch] codex/add-metadata-field-for-file-path -> origin/codex/add-metadata-field-for-file-path 2025-09-07T07:55:55.5896037Z * [new branch] codex/add-test-for-inductor-local-cache-behavior -> origin/codex/add-test-for-inductor-local-cache-behavior 2025-09-07T07:55:55.5897511Z * [new branch] codex/create-test-for-tensor-memory-leak-in-cudagraph -> origin/codex/create-test-for-tensor-memory-leak-in-cudagraph 2025-09-07T07:55:55.5898835Z * [new branch] codex/fix-issue-121219-in-pytorch -> origin/codex/fix-issue-121219-in-pytorch 2025-09-07T07:55:55.5900385Z * [new branch] codex/fix-issue-160415-in-pytorch -> origin/codex/fix-issue-160415-in-pytorch 2025-09-07T07:55:55.5902002Z * [new branch] codex/fix-noqengine-quantized-engine-support -> origin/codex/fix-noqengine-quantized-engine-support 2025-09-07T07:55:55.5903313Z * [new branch] codex/fix-pin_memory-error-handling -> origin/codex/fix-pin_memory-error-handling 2025-09-07T07:55:55.5905174Z * [new branch] codex/propose-fix-for-issue-160332 -> origin/codex/propose-fix-for-issue-160332 2025-09-07T07:55:55.5906852Z * [new branch] 
codex/refactor-lintrunner-config-to-use-uv-run -> origin/codex/refactor-lintrunner-config-to-use-uv-run 2025-09-07T07:55:55.5908664Z * [new branch] codex/remove-allow-untyped-defs-and-fix-type-errors -> origin/codex/remove-allow-untyped-defs-and-fix-type-errors 2025-09-07T07:55:55.5910304Z * [new branch] compile_fsdp2_disable_stream_and_event -> origin/compile_fsdp2_disable_stream_and_event 2025-09-07T07:55:55.5911973Z * [new branch] context_test -> origin/context_test 2025-09-07T07:55:55.5914518Z * [new branch] copilot/fix-157446 -> origin/copilot/fix-157446 2025-09-07T07:55:55.5916268Z * [new branch] copy_graph -> origin/copy_graph 2025-09-07T07:55:55.5918680Z * [new branch] cpio/fix_new_ami_tests -> origin/cpio/fix_new_ami_tests 2025-09-07T07:55:55.5920914Z * [new branch] csl/always_produce_xml -> origin/csl/always_produce_xml 2025-09-07T07:55:55.5922466Z * [new branch] csl/build_test_more_procs -> origin/csl/build_test_more_procs 2025-09-07T07:55:55.5924139Z * [new branch] csl/build_test_more_procs2 -> origin/csl/build_test_more_procs2 2025-09-07T07:55:55.5925823Z * [new branch] csl/disable_flaky_cpp_test -> origin/csl/disable_flaky_cpp_test 2025-09-07T07:55:55.5927207Z * [new branch] csl/disable_periodic_test -> origin/csl/disable_periodic_test 2025-09-07T07:55:55.5928713Z * [new branch] csl/exclude_rocm_viable_strict -> origin/csl/exclude_rocm_viable_strict 2025-09-07T07:55:55.5930186Z * [new branch] csl/katex -> origin/csl/katex 2025-09-07T07:55:55.5931743Z * [new branch] csl/larger_runner -> origin/csl/larger_runner 2025-09-07T07:55:55.5933282Z * [new branch] csl/lintrunner_stuff -> origin/csl/lintrunner_stuff 2025-09-07T07:55:55.5935208Z * [new branch] csl/mps_sharding -> origin/csl/mps_sharding 2025-09-07T07:55:55.5936612Z * [new branch] csl/multistage_docker -> origin/csl/multistage_docker 2025-09-07T07:55:55.5938165Z * [new branch] csl/name_link_check_job -> origin/csl/name_link_check_job 2025-09-07T07:55:55.5939687Z * [new branch] csl/no_keep_goin_rocm 
-> origin/csl/no_keep_goin_rocm 2025-09-07T07:55:55.5941238Z * [new branch] csl/not_600_timeout -> origin/csl/not_600_timeout 2025-09-07T07:55:55.5942740Z * [new branch] csl/revert_open -> origin/csl/revert_open 2025-09-07T07:55:55.5944619Z * [new branch] csl/skip_build -> origin/csl/skip_build 2025-09-07T07:55:55.5946147Z * [new branch] csl/test_cuda_build_large_runner -> origin/csl/test_cuda_build_large_runner 2025-09-07T07:55:55.5947703Z * [new branch] csl/win_sccache -> origin/csl/win_sccache 2025-09-07T07:55:55.5949477Z * [new branch] cublasltrelax2 -> origin/cublasltrelax2 2025-09-07T07:55:55.5951209Z * [new branch] cublasrelax2 -> origin/cublasrelax2 2025-09-07T07:55:55.5952972Z * [new branch] cudnnsdparefactor -> origin/cudnnsdparefactor 2025-09-07T07:55:55.5955259Z * [new branch] custom_lowering_dict -> origin/custom_lowering_dict 2025-09-07T07:55:55.5956913Z * [new branch] czhuge_muon_dev -> origin/czhuge_muon_dev 2025-09-07T07:55:55.5959430Z * [new branch] d4l3k/delete_hook -> origin/d4l3k/delete_hook 2025-09-07T07:55:55.5961197Z * [new branch] dcp_zoc -> origin/dcp_zoc 2025-09-07T07:55:55.5963011Z * [new branch] debug-guard -> origin/debug-guard 2025-09-07T07:55:55.5965261Z * [new branch] delete-quant-docs -> origin/delete-quant-docs 2025-09-07T07:55:55.5970382Z * [new branch] dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.55.2 -> origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.55.2 2025-09-07T07:55:55.5972021Z * [new branch] dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.55.3 -> origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.55.3 2025-09-07T07:55:55.5974048Z * [new branch] dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.55.4 -> origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.55.4 2025-09-07T07:55:55.5975968Z * [new branch] dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.56.0 -> 
origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.56.0 2025-09-07T07:55:55.5977301Z * [new branch] dependabot/pip/dot-ci/docker/protobuf-5.29.5 -> origin/dependabot/pip/dot-ci/docker/protobuf-5.29.5 2025-09-07T07:55:55.5980359Z * [new branch] dependabot/pip/dot-github/requirements/protobuf-5.29.5 -> origin/dependabot/pip/dot-github/requirements/protobuf-5.29.5 2025-09-07T07:55:55.5982479Z * [new branch] desertfire/test_cpp_wrapper -> origin/desertfire/test_cpp_wrapper 2025-09-07T07:55:55.5984267Z * [new branch] desertfire/triton-cpu-for-aarch64 -> origin/desertfire/triton-cpu-for-aarch64 2025-09-07T07:55:55.5987468Z * [new branch] dev/joona/MPSNDArrayAdd -> origin/dev/joona/MPSNDArrayAdd 2025-09-07T07:55:55.5989126Z * [new branch] dev/joona/Unranked -> origin/dev/joona/Unranked 2025-09-07T07:55:55.5990880Z * [new branch] dev/joona/cat -> origin/dev/joona/cat 2025-09-07T07:55:55.5992619Z * [new branch] dev/joona/cat_remove_graph -> origin/dev/joona/cat_remove_graph 2025-09-07T07:55:55.5994482Z * [new branch] dev/joona/embeddingbag -> origin/dev/joona/embeddingbag 2025-09-07T07:55:55.5996240Z * [new branch] dev/joona/getTensorsString -> origin/dev/joona/getTensorsString 2025-09-07T07:55:55.5998049Z * [new branch] dev/joona/maxpool2dwithindices_errmsg -> origin/dev/joona/maxpool2dwithindices_errmsg 2025-09-07T07:55:55.5999737Z * [new branch] dev/joona/mps_linear_macos14 -> origin/dev/joona/mps_linear_macos14 2025-09-07T07:55:55.6001459Z * [new branch] dev/joona/sdpa -> origin/dev/joona/sdpa 2025-09-07T07:55:55.6003412Z * [new branch] dev/joona/topk_newapi -> origin/dev/joona/topk_newapi 2025-09-07T07:55:55.6005522Z * [new branch] dev/joona/type_inf -> origin/dev/joona/type_inf 2025-09-07T07:55:55.6007156Z * [new branch] dev/joona/upsize3d -> origin/dev/joona/upsize3d 2025-09-07T07:55:55.6008985Z * [new branch] disable -> origin/disable 2025-09-07T07:55:55.6010766Z * [new branch] e2e-baseline -> origin/e2e-baseline 2025-09-07T07:55:55.6012465Z * 
[new branch] eigen_for_sparse_addmm_v2 -> origin/eigen_for_sparse_addmm_v2 2025-09-07T07:55:55.6015192Z * [new branch] embg/test_inductor_ci_128B -> origin/embg/test_inductor_ci_128B 2025-09-07T07:55:55.6016728Z * [new branch] embg/test_inductor_ci_base -> origin/embg/test_inductor_ci_base 2025-09-07T07:55:55.6018331Z * [new branch] embg/test_inductor_ci_control -> origin/embg/test_inductor_ci_control 2025-09-07T07:55:55.6019728Z * [new branch] embg/triton_l2_prefetch_128B -> origin/embg/triton_l2_prefetch_128B 2025-09-07T07:55:55.6021168Z * [new branch] embg/triton_l2_prefetch_256B -> origin/embg/triton_l2_prefetch_256B 2025-09-07T07:55:55.6023054Z * [new branch] eqy-patch-1 -> origin/eqy-patch-1 2025-09-07T07:55:55.6025265Z * [new branch] eqy-patch-2 -> origin/eqy-patch-2 2025-09-07T07:55:55.6026897Z * [new branch] eqy-patch-3 -> origin/eqy-patch-3 2025-09-07T07:55:55.6028726Z * [new branch] eqy-patch-4 -> origin/eqy-patch-4 2025-09-07T07:55:55.6030666Z * [new branch] example-convert-torch.nn -> origin/example-convert-torch.nn 2025-09-07T07:55:55.6033184Z * [new branch] exclamaforte/add-contiguous-threshold -> origin/exclamaforte/add-contiguous-threshold 2025-09-07T07:55:55.6034968Z * [new branch] exclamaforte/amd-ma -> origin/exclamaforte/amd-ma 2025-09-07T07:55:55.6036515Z * [new branch] exclamaforte/bump-transformer-version -> origin/exclamaforte/bump-transformer-version 2025-09-07T07:55:55.6038372Z * [new branch] exclamaforte/clear-feedback-savers -> origin/exclamaforte/clear-feedback-savers 2025-09-07T07:55:55.6039683Z * [new branch] exclamaforte/combo-kernels-perf-run -> origin/exclamaforte/combo-kernels-perf-run 2025-09-07T07:55:55.6041326Z * [new branch] exclamaforte/do_bench_refactor -> origin/exclamaforte/do_bench_refactor 2025-09-07T07:55:55.6042888Z * [new branch] exclamaforte/enable-mem-dep-fusion -> origin/exclamaforte/enable-mem-dep-fusion 2025-09-07T07:55:55.6044944Z * [new branch] exclamaforte/fix-exhaustive-autotuning -> 
origin/exclamaforte/fix-exhaustive-autotuning 2025-09-07T07:55:55.6046345Z * [new branch] exclamaforte/fix-exhuastive-autotuning-reland -> origin/exclamaforte/fix-exhuastive-autotuning-reland 2025-09-07T07:55:55.6047972Z * [new branch] exclamaforte/fix-trace-parsing-fx-svg -> origin/exclamaforte/fix-trace-parsing-fx-svg 2025-09-07T07:55:55.6049290Z * [new branch] exclamaforte/force-pointwise-cat-perf-run -> origin/exclamaforte/force-pointwise-cat-perf-run 2025-09-07T07:55:55.6050890Z * [new branch] exclamaforte/fusion-data -> origin/exclamaforte/fusion-data 2025-09-07T07:55:55.6052568Z * [new branch] exclamaforte/gemm-benchmark-run -> origin/exclamaforte/gemm-benchmark-run 2025-09-07T07:55:55.6054141Z * [new branch] exclamaforte/gemm-export-model -> origin/exclamaforte/gemm-export-model 2025-09-07T07:55:55.6055896Z * [new branch] exclamaforte/gemm-model -> origin/exclamaforte/gemm-model 2025-09-07T07:55:55.6057519Z * [new branch] exclamaforte/gemm-model-all-data-collection -> origin/exclamaforte/gemm-model-all-data-collection 2025-09-07T07:55:55.6058949Z * [new branch] exclamaforte/gemm-to-amd -> origin/exclamaforte/gemm-to-amd 2025-09-07T07:55:55.6060481Z * [new branch] exclamaforte/just-gemm-model -> origin/exclamaforte/just-gemm-model 2025-09-07T07:55:55.6062003Z * [new branch] exclamaforte/just-gemm-model-no-refactor -> origin/exclamaforte/just-gemm-model-no-refactor 2025-09-07T07:55:55.6063449Z * [new branch] exclamaforte/max-autotune-ieee -> origin/exclamaforte/max-autotune-ieee 2025-09-07T07:55:55.6065411Z * [new branch] exclamaforte/memory-counter -> origin/exclamaforte/memory-counter 2025-09-07T07:55:55.6066845Z * [new branch] exclamaforte/profile-diff-algo -> origin/exclamaforte/profile-diff-algo 2025-09-07T07:55:55.6068360Z * [new branch] exclamaforte/profiler-combo -> origin/exclamaforte/profiler-combo 2025-09-07T07:55:55.6070011Z * [new branch] exclamaforte/test_cpp_wrapper_mode -> origin/exclamaforte/test_cpp_wrapper_mode 2025-09-07T07:55:55.6071491Z 
* [new branch] exclamaforte/update-autotune-configs -> origin/exclamaforte/update-autotune-configs 2025-09-07T07:55:55.6073095Z * [new branch] exclamaforte/update-autotune-configs-2 -> origin/exclamaforte/update-autotune-configs-2 2025-09-07T07:55:55.6075749Z * [new branch] exclamforte/gemm-model-final -> origin/exclamforte/gemm-model-final 2025-09-07T07:55:55.6077505Z * [new branch] exec -> origin/exec 2025-09-07T07:55:55.6079373Z * [new branch] executorch-module-shim -> origin/executorch-module-shim 2025-09-07T07:55:55.6081259Z * [new branch] experimental-mosaic -> origin/experimental-mosaic 2025-09-07T07:55:55.6083037Z * [new branch] export-D58091437 -> origin/export-D58091437 2025-09-07T07:55:55.6085254Z * [new branch] export-D61047529 -> origin/export-D61047529 2025-09-07T07:55:55.6087057Z * [new branch] export-D70112642 -> origin/export-D70112642 2025-09-07T07:55:55.6088926Z * [new branch] export-D71412006 -> origin/export-D71412006 2025-09-07T07:55:55.6091390Z * [new branch] export-D73042989 -> origin/export-D73042989 2025-09-07T07:55:55.6093119Z * [new branch] export-D75183591 -> origin/export-D75183591 2025-09-07T07:55:55.6095270Z * [new branch] export-D75617432 -> origin/export-D75617432 2025-09-07T07:55:55.6097004Z * [new branch] export-D75659965 -> origin/export-D75659965 2025-09-07T07:55:55.6098927Z * [new branch] export-D76080931 -> origin/export-D76080931 2025-09-07T07:55:55.6100722Z * [new branch] export-D76797250 -> origin/export-D76797250 2025-09-07T07:55:55.6102532Z * [new branch] export-D76885271 -> origin/export-D76885271 2025-09-07T07:55:55.6104559Z * [new branch] export-D76885620 -> origin/export-D76885620 2025-09-07T07:55:55.6106497Z * [new branch] export-D76936623 -> origin/export-D76936623 2025-09-07T07:55:55.6108332Z * [new branch] export-D76958268 -> origin/export-D76958268 2025-09-07T07:55:55.6110138Z * [new branch] export-D78375400 -> origin/export-D78375400 2025-09-07T07:55:55.6111974Z * [new branch] export-D78431305 -> 
origin/export-D78431305 2025-09-07T07:55:55.6113904Z * [new branch] export-D78580107 -> origin/export-D78580107 2025-09-07T07:55:55.6115963Z * [new branch] export-D78822171 -> origin/export-D78822171 2025-09-07T07:55:55.6118378Z * [new branch] export-D78822351 -> origin/export-D78822351 2025-09-07T07:55:55.6120113Z * [new branch] export-D78822507 -> origin/export-D78822507 2025-09-07T07:55:55.6121861Z * [new branch] export-D78826994 -> origin/export-D78826994 2025-09-07T07:55:55.6124302Z * [new branch] export-D78894324 -> origin/export-D78894324 2025-09-07T07:55:55.6126205Z * [new branch] export-D78929245 -> origin/export-D78929245 2025-09-07T07:55:55.6127879Z * [new branch] export-D78934925 -> origin/export-D78934925 2025-09-07T07:55:55.6129571Z * [new branch] export-D78953203 -> origin/export-D78953203 2025-09-07T07:55:55.6131298Z * [new branch] export-D78953229 -> origin/export-D78953229 2025-09-07T07:55:55.6132906Z * [new branch] export-D78957093 -> origin/export-D78957093 2025-09-07T07:55:55.6134868Z * [new branch] export-D78957389 -> origin/export-D78957389 2025-09-07T07:55:55.6136601Z * [new branch] export-D78996107 -> origin/export-D78996107 2025-09-07T07:55:55.6138333Z * [new branch] export-D79026433 -> origin/export-D79026433 2025-09-07T07:55:55.6140064Z * [new branch] export-D79230339 -> origin/export-D79230339 2025-09-07T07:55:55.6141799Z * [new branch] export-D79319835 -> origin/export-D79319835 2025-09-07T07:55:55.6143433Z * [new branch] export-D79328456 -> origin/export-D79328456 2025-09-07T07:55:55.6145482Z * [new branch] export-D79534608 -> origin/export-D79534608 2025-09-07T07:55:55.6147362Z * [new branch] export-D79785974 -> origin/export-D79785974 2025-09-07T07:55:55.6149139Z * [new branch] export-D80025417 -> origin/export-D80025417 2025-09-07T07:55:55.6150878Z * [new branch] export-D80120333 -> origin/export-D80120333 2025-09-07T07:55:55.6152691Z * [new branch] export-D80214882 -> origin/export-D80214882 2025-09-07T07:55:55.6154755Z * [new 
branch] export-D80319069 -> origin/export-D80319069 2025-09-07T07:55:55.6156484Z * [new branch] export-D80321215 -> origin/export-D80321215 2025-09-07T07:55:55.6158653Z * [new branch] export-D80503451 -> origin/export-D80503451 2025-09-07T07:55:55.6160100Z * [new branch] export-D80771648 -> origin/export-D80771648 2025-09-07T07:55:55.6161777Z * [new branch] export-D80823877 -> origin/export-D80823877 2025-09-07T07:55:55.6163600Z * [new branch] export-D80948073 -> origin/export-D80948073 2025-09-07T07:55:55.6165801Z * [new branch] export-D80958642 -> origin/export-D80958642 2025-09-07T07:55:55.6167411Z * [new branch] export-D80970483 -> origin/export-D80970483 2025-09-07T07:55:55.6169171Z * [new branch] export-D81054193 -> origin/export-D81054193 2025-09-07T07:55:55.6170905Z * [new branch] export-D81060182 -> origin/export-D81060182 2025-09-07T07:55:55.6172776Z * [new branch] export-D81078973 -> origin/export-D81078973 2025-09-07T07:55:55.6174819Z * [new branch] export-D81204584 -> origin/export-D81204584 2025-09-07T07:55:55.6176581Z * [new branch] export-D81284190 -> origin/export-D81284190 2025-09-07T07:55:55.6178288Z * [new branch] export-D81299840 -> origin/export-D81299840 2025-09-07T07:55:55.6180057Z * [new branch] export-D81429090 -> origin/export-D81429090 2025-09-07T07:55:55.6181745Z * [new branch] export-D81698719 -> origin/export-D81698719 2025-09-07T07:55:55.6183895Z * [new branch] export-D81747409 -> origin/export-D81747409 2025-09-07T07:55:55.6186106Z * [new branch] exported-model-train-idempotent -> origin/exported-model-train-idempotent 2025-09-07T07:55:55.6188186Z * [new branch] ezyang/wip-aot-descriptors -> origin/ezyang/wip-aot-descriptors 2025-09-07T07:55:55.6189837Z * [new branch] fa_u8_brgemm -> origin/fa_u8_brgemm 2025-09-07T07:55:55.6191625Z * [new branch] fastmath_baseline -> origin/fastmath_baseline 2025-09-07T07:55:55.6194134Z * [new branch] fbcode/warm -> origin/fbcode/warm 2025-09-07T07:55:55.6196317Z * [new branch] fca -> origin/fca 
2025-09-07T07:55:55.6198149Z * [new branch] fca2_ca5984c -> origin/fca2_ca5984c 2025-09-07T07:55:55.6199935Z * [new branch] fca5 -> origin/fca5 2025-09-07T07:55:55.6202321Z * [new branch] feature/function-numa-binding -> origin/feature/function-numa-binding 2025-09-07T07:55:55.6204121Z * [new branch] feature/function-numa-binding-take2 -> origin/feature/function-numa-binding-take2 2025-09-07T07:55:55.6205830Z * [new branch] feature/numa-nproc-fix -> origin/feature/numa-nproc-fix 2025-09-07T07:55:55.6207307Z * [new branch] feature/numa-signpost-serialize -> origin/feature/numa-signpost-serialize 2025-09-07T07:55:55.6208820Z * [new branch] feature/parallel-numa-binding -> origin/feature/parallel-numa-binding 2025-09-07T07:55:55.6211105Z * [new branch] fengyuan/external-proj -> origin/fengyuan/external-proj 2025-09-07T07:55:55.6212734Z * [new branch] fengyuan/out-of-tree-xpu-ops-improve-test -> origin/fengyuan/out-of-tree-xpu-ops-improve-test 2025-09-07T07:55:55.6214517Z * [new branch] fengyuan/out-of-tree-xpu-ops-remove-dtype -> origin/fengyuan/out-of-tree-xpu-ops-remove-dtype 2025-09-07T07:55:55.6215958Z * [new branch] fengyuan/test-xpu -> origin/fengyuan/test-xpu 2025-09-07T07:55:55.6218122Z * [new branch] ffast_math_baseline -> origin/ffast_math_baseline 2025-09-07T07:55:55.6219819Z * [new branch] ffast_math_target -> origin/ffast_math_target 2025-09-07T07:55:55.6222232Z * [new branch] findhao/base_commit -> origin/findhao/base_commit 2025-09-07T07:55:55.6224139Z * [new branch] findhao/base_commit1 -> origin/findhao/base_commit1 2025-09-07T07:55:55.6225669Z * [new branch] findhao/multistream2 -> origin/findhao/multistream2 2025-09-07T07:55:55.6226977Z * [new branch] findhao/multistream5 -> origin/findhao/multistream5 2025-09-07T07:55:55.6228535Z * [new branch] findhao/multistream6 -> origin/findhao/multistream6 2025-09-07T07:55:55.6230058Z * [new branch] findhao/operatorbench3 -> origin/findhao/operatorbench3 2025-09-07T07:55:55.6231659Z * [new branch] 
findhao/operatorbench5 -> origin/findhao/operatorbench5 2025-09-07T07:55:55.6233221Z * [new branch] findhao/tritonparse -> origin/findhao/tritonparse 2025-09-07T07:55:55.6235408Z * [new branch] fix -> origin/fix 2025-09-07T07:55:55.6237257Z * [new branch] fix-ck-gemm-template-format -> origin/fix-ck-gemm-template-format 2025-09-07T07:55:55.6239123Z * [new branch] fix-config-ignore -> origin/fix-config-ignore 2025-09-07T07:55:55.6240881Z * [new branch] fix-dict-guard -> origin/fix-dict-guard 2025-09-07T07:55:55.6242686Z * [new branch] fix-inductor-periodic-0528 -> origin/fix-inductor-periodic-0528 2025-09-07T07:55:55.6244690Z * [new branch] fix-mps-benchmark -> origin/fix-mps-benchmark 2025-09-07T07:55:55.6246450Z * [new branch] fix-rlease-feature-template -> origin/fix-rlease-feature-template 2025-09-07T07:55:55.6248266Z * [new branch] fix-run-condition-upload-results -> origin/fix-run-condition-upload-results 2025-09-07T07:55:55.6249961Z * [new branch] fix-torchbench -> origin/fix-torchbench 2025-09-07T07:55:55.6251660Z * [new branch] fix_153389 -> origin/fix_153389 2025-09-07T07:55:55.6253426Z * [new branch] fix_fsdp_rs_bucket2 -> origin/fix_fsdp_rs_bucket2 2025-09-07T07:55:55.6255557Z * [new branch] fix_inductor_peridic_tests -> origin/fix_inductor_peridic_tests 2025-09-07T07:55:55.6257205Z * [new branch] fix_ubn_159469 -> origin/fix_ubn_159469 2025-09-07T07:55:55.6259055Z * [new branch] fixes-triage -> origin/fixes-triage 2025-09-07T07:55:55.6260784Z * [new branch] fixflashinfer -> origin/fixflashinfer 2025-09-07T07:55:55.6262688Z * [new branch] flash_decoding_cpu -> origin/flash_decoding_cpu 2025-09-07T07:55:55.6264651Z * [new branch] flex-flash -> origin/flex-flash 2025-09-07T07:55:55.6266478Z * [new branch] flex-lowering -> origin/flex-lowering 2025-09-07T07:55:55.6268255Z * [new branch] flex-warning -> origin/flex-warning 2025-09-07T07:55:55.6270106Z * [new branch] flex_attention_functorch_grad -> origin/flex_attention_functorch_grad 
2025-09-07T07:55:55.6272008Z * [new branch] flex_flash -> origin/flex_flash 2025-09-07T07:55:55.6273942Z * [new branch] flexdecode-gqa-groups -> origin/flexdecode-gqa-groups 2025-09-07T07:55:55.6276752Z * [new branch] fmassa/fix_memeff_sharding_rule -> origin/fmassa/fix_memeff_sharding_rule 2025-09-07T07:55:55.6278575Z * [new branch] fsdp2_trace_rules -> origin/fsdp2_trace_rules 2025-09-07T07:55:55.6280358Z * [new branch] fsdpv2_3d -> origin/fsdpv2_3d 2025-09-07T07:55:55.6282273Z * [new branch] fsdpv2_3d_m1 -> origin/fsdpv2_3d_m1 2025-09-07T07:55:55.6284282Z * [new branch] fx_cpp -> origin/fx_cpp 2025-09-07T07:55:55.6287030Z * [new branch] fy/fix-win -> origin/fy/fix-win 2025-09-07T07:55:55.6290533Z * [new branch] gh/AlnisM/1/base -> origin/gh/AlnisM/1/base 2025-09-07T07:55:55.6292280Z * [new branch] gh/AlnisM/1/head -> origin/gh/AlnisM/1/head 2025-09-07T07:55:55.6295094Z * [new branch] gh/CaoE/2/base -> origin/gh/CaoE/2/base 2025-09-07T07:55:55.6296577Z * [new branch] gh/CaoE/2/head -> origin/gh/CaoE/2/head 2025-09-07T07:55:55.6298159Z * [new branch] gh/CaoE/2/orig -> origin/gh/CaoE/2/orig 2025-09-07T07:55:55.6301048Z * [new branch] gh/ColinPeppler/79/base -> origin/gh/ColinPeppler/79/base 2025-09-07T07:55:55.6302761Z * [new branch] gh/ColinPeppler/79/head -> origin/gh/ColinPeppler/79/head 2025-09-07T07:55:55.6304349Z * [new branch] gh/ColinPeppler/79/orig -> origin/gh/ColinPeppler/79/orig 2025-09-07T07:55:55.6307070Z * [new branch] gh/ColinPeppler/80/base -> origin/gh/ColinPeppler/80/base 2025-09-07T07:55:55.6308798Z * [new branch] gh/ColinPeppler/80/head -> origin/gh/ColinPeppler/80/head 2025-09-07T07:55:55.6310374Z * [new branch] gh/ColinPeppler/80/orig -> origin/gh/ColinPeppler/80/orig 2025-09-07T07:55:55.6313113Z * [new branch] gh/EikanWang/67/base -> origin/gh/EikanWang/67/base 2025-09-07T07:55:55.6315027Z * [new branch] gh/EikanWang/67/head -> origin/gh/EikanWang/67/head 2025-09-07T07:55:55.6317530Z * [new branch] gh/EikanWang/80/base -> 
origin/gh/EikanWang/80/base 2025-09-07T07:55:55.6319054Z * [new branch] gh/EikanWang/80/head -> origin/gh/EikanWang/80/head 2025-09-07T07:55:55.6320609Z * [new branch] gh/EikanWang/80/orig -> origin/gh/EikanWang/80/orig 2025-09-07T07:55:55.6322877Z * [new branch] gh/EikanWang/81/base -> origin/gh/EikanWang/81/base 2025-09-07T07:55:55.6324719Z * [new branch] gh/EikanWang/81/head -> origin/gh/EikanWang/81/head 2025-09-07T07:55:55.6326271Z * [new branch] gh/EikanWang/81/orig -> origin/gh/EikanWang/81/orig 2025-09-07T07:55:55.6328476Z * [new branch] gh/EikanWang/82/base -> origin/gh/EikanWang/82/base 2025-09-07T07:55:55.6330050Z * [new branch] gh/EikanWang/82/head -> origin/gh/EikanWang/82/head 2025-09-07T07:55:55.6331606Z * [new branch] gh/EikanWang/82/orig -> origin/gh/EikanWang/82/orig 2025-09-07T07:55:55.6334957Z * [new branch] gh/Gasoonjia/1/base -> origin/gh/Gasoonjia/1/base 2025-09-07T07:55:55.6336509Z * [new branch] gh/Gasoonjia/1/head -> origin/gh/Gasoonjia/1/head 2025-09-07T07:55:55.6339325Z * [new branch] gh/H-Huang/131/base -> origin/gh/H-Huang/131/base 2025-09-07T07:55:55.6340988Z * [new branch] gh/H-Huang/131/head -> origin/gh/H-Huang/131/head 2025-09-07T07:55:55.6342491Z * [new branch] gh/H-Huang/131/orig -> origin/gh/H-Huang/131/orig 2025-09-07T07:55:55.6345052Z * [new branch] gh/H-Huang/132/base -> origin/gh/H-Huang/132/base 2025-09-07T07:55:55.6346608Z * [new branch] gh/H-Huang/132/head -> origin/gh/H-Huang/132/head 2025-09-07T07:55:55.6348130Z * [new branch] gh/H-Huang/132/orig -> origin/gh/H-Huang/132/orig 2025-09-07T07:55:55.6350446Z * [new branch] gh/H-Huang/180/base -> origin/gh/H-Huang/180/base 2025-09-07T07:55:55.6351953Z * [new branch] gh/H-Huang/180/head -> origin/gh/H-Huang/180/head 2025-09-07T07:55:55.6353522Z * [new branch] gh/H-Huang/180/orig -> origin/gh/H-Huang/180/orig 2025-09-07T07:55:55.6356153Z * [new branch] gh/H-Huang/182/base -> origin/gh/H-Huang/182/base 2025-09-07T07:55:55.6357713Z * [new branch] gh/H-Huang/182/head -> 
origin/gh/H-Huang/182/head 2025-09-07T07:55:55.6359250Z * [new branch] gh/H-Huang/182/orig -> origin/gh/H-Huang/182/orig 2025-09-07T07:55:55.6361751Z * [new branch] gh/H-Huang/187/base -> origin/gh/H-Huang/187/base 2025-09-07T07:55:55.6363142Z * [new branch] gh/H-Huang/187/head -> origin/gh/H-Huang/187/head 2025-09-07T07:55:55.6365015Z * [new branch] gh/H-Huang/187/orig -> origin/gh/H-Huang/187/orig 2025-09-07T07:55:55.6367156Z * [new branch] gh/H-Huang/202/base -> origin/gh/H-Huang/202/base 2025-09-07T07:55:55.6368826Z * [new branch] gh/H-Huang/202/head -> origin/gh/H-Huang/202/head 2025-09-07T07:55:55.6370340Z * [new branch] gh/H-Huang/202/orig -> origin/gh/H-Huang/202/orig 2025-09-07T07:55:55.6372566Z * [new branch] gh/H-Huang/203/base -> origin/gh/H-Huang/203/base 2025-09-07T07:55:55.6374383Z * [new branch] gh/H-Huang/203/head -> origin/gh/H-Huang/203/head 2025-09-07T07:55:55.6376183Z * [new branch] gh/H-Huang/203/orig -> origin/gh/H-Huang/203/orig 2025-09-07T07:55:55.6378484Z * [new branch] gh/H-Huang/204/base -> origin/gh/H-Huang/204/base 2025-09-07T07:55:55.6380002Z * [new branch] gh/H-Huang/204/head -> origin/gh/H-Huang/204/head 2025-09-07T07:55:55.6381450Z * [new branch] gh/H-Huang/204/orig -> origin/gh/H-Huang/204/orig 2025-09-07T07:55:55.6383922Z * [new branch] gh/H-Huang/205/base -> origin/gh/H-Huang/205/base 2025-09-07T07:55:55.6385779Z * [new branch] gh/H-Huang/205/head -> origin/gh/H-Huang/205/head 2025-09-07T07:55:55.6387250Z * [new branch] gh/H-Huang/205/orig -> origin/gh/H-Huang/205/orig 2025-09-07T07:55:55.6389535Z * [new branch] gh/H-Huang/206/base -> origin/gh/H-Huang/206/base 2025-09-07T07:55:55.6391026Z * [new branch] gh/H-Huang/206/head -> origin/gh/H-Huang/206/head 2025-09-07T07:55:55.6392515Z * [new branch] gh/H-Huang/206/orig -> origin/gh/H-Huang/206/orig 2025-09-07T07:55:55.6395245Z * [new branch] gh/H-Huang/207/base -> origin/gh/H-Huang/207/base 2025-09-07T07:55:55.6396735Z * [new branch] gh/H-Huang/207/head -> 
origin/gh/H-Huang/207/head 2025-09-07T07:55:55.6398383Z * [new branch] gh/H-Huang/207/orig -> origin/gh/H-Huang/207/orig 2025-09-07T07:55:55.6400680Z * [new branch] gh/H-Huang/208/base -> origin/gh/H-Huang/208/base 2025-09-07T07:55:55.6402234Z * [new branch] gh/H-Huang/208/head -> origin/gh/H-Huang/208/head 2025-09-07T07:55:55.6403837Z * [new branch] gh/H-Huang/208/orig -> origin/gh/H-Huang/208/orig 2025-09-07T07:55:55.6406307Z * [new branch] gh/H-Huang/209/base -> origin/gh/H-Huang/209/base 2025-09-07T07:55:55.6407753Z * [new branch] gh/H-Huang/209/head -> origin/gh/H-Huang/209/head 2025-09-07T07:55:55.6409279Z * [new branch] gh/H-Huang/209/orig -> origin/gh/H-Huang/209/orig 2025-09-07T07:55:55.6411636Z * [new branch] gh/H-Huang/210/base -> origin/gh/H-Huang/210/base 2025-09-07T07:55:55.6413147Z * [new branch] gh/H-Huang/210/head -> origin/gh/H-Huang/210/head 2025-09-07T07:55:55.6414970Z * [new branch] gh/H-Huang/210/orig -> origin/gh/H-Huang/210/orig 2025-09-07T07:55:55.6417196Z * [new branch] gh/H-Huang/211/base -> origin/gh/H-Huang/211/base 2025-09-07T07:55:55.6418713Z * [new branch] gh/H-Huang/211/head -> origin/gh/H-Huang/211/head 2025-09-07T07:55:55.6420331Z * [new branch] gh/H-Huang/211/orig -> origin/gh/H-Huang/211/orig 2025-09-07T07:55:55.6422519Z * [new branch] gh/H-Huang/212/base -> origin/gh/H-Huang/212/base 2025-09-07T07:55:55.6424280Z * [new branch] gh/H-Huang/212/head -> origin/gh/H-Huang/212/head 2025-09-07T07:55:55.6426047Z * [new branch] gh/H-Huang/212/orig -> origin/gh/H-Huang/212/orig 2025-09-07T07:55:55.6428314Z * [new branch] gh/H-Huang/213/base -> origin/gh/H-Huang/213/base 2025-09-07T07:55:55.6429865Z * [new branch] gh/H-Huang/213/head -> origin/gh/H-Huang/213/head 2025-09-07T07:55:55.6431382Z * [new branch] gh/H-Huang/213/orig -> origin/gh/H-Huang/213/orig 2025-09-07T07:55:55.6433591Z * [new branch] gh/H-Huang/214/base -> origin/gh/H-Huang/214/base 2025-09-07T07:55:55.6435536Z * [new branch] gh/H-Huang/214/head -> 
origin/gh/H-Huang/214/head 2025-09-07T07:55:55.6437183Z * [new branch] gh/H-Huang/214/orig -> origin/gh/H-Huang/214/orig 2025-09-07T07:55:55.6440053Z * [new branch] gh/IvanKobzarev/112/base -> origin/gh/IvanKobzarev/112/base 2025-09-07T07:55:55.6441602Z * [new branch] gh/IvanKobzarev/112/head -> origin/gh/IvanKobzarev/112/head 2025-09-07T07:55:55.6443165Z * [new branch] gh/IvanKobzarev/112/orig -> origin/gh/IvanKobzarev/112/orig 2025-09-07T07:55:55.6446057Z * [new branch] gh/IvanKobzarev/115/base -> origin/gh/IvanKobzarev/115/base 2025-09-07T07:55:55.6447550Z * [new branch] gh/IvanKobzarev/115/head -> origin/gh/IvanKobzarev/115/head 2025-09-07T07:55:55.6449190Z * [new branch] gh/IvanKobzarev/115/orig -> origin/gh/IvanKobzarev/115/orig 2025-09-07T07:55:55.6451834Z * [new branch] gh/IvanKobzarev/116/base -> origin/gh/IvanKobzarev/116/base 2025-09-07T07:55:55.6453412Z * [new branch] gh/IvanKobzarev/116/head -> origin/gh/IvanKobzarev/116/head 2025-09-07T07:55:55.6455425Z * [new branch] gh/IvanKobzarev/116/orig -> origin/gh/IvanKobzarev/116/orig 2025-09-07T07:55:55.6457729Z * [new branch] gh/IvanKobzarev/118/base -> origin/gh/IvanKobzarev/118/base 2025-09-07T07:55:55.6459218Z * [new branch] gh/IvanKobzarev/118/head -> origin/gh/IvanKobzarev/118/head 2025-09-07T07:55:55.6460810Z * [new branch] gh/IvanKobzarev/118/orig -> origin/gh/IvanKobzarev/118/orig 2025-09-07T07:55:55.6463189Z * [new branch] gh/IvanKobzarev/126/base -> origin/gh/IvanKobzarev/126/base 2025-09-07T07:55:55.6465207Z * [new branch] gh/IvanKobzarev/126/head -> origin/gh/IvanKobzarev/126/head 2025-09-07T07:55:55.6466653Z * [new branch] gh/IvanKobzarev/126/orig -> origin/gh/IvanKobzarev/126/orig 2025-09-07T07:55:55.6469119Z * [new branch] gh/IvanKobzarev/127/base -> origin/gh/IvanKobzarev/127/base 2025-09-07T07:55:55.6470849Z * [new branch] gh/IvanKobzarev/127/head -> origin/gh/IvanKobzarev/127/head 2025-09-07T07:55:55.6472384Z * [new branch] gh/IvanKobzarev/127/orig -> origin/gh/IvanKobzarev/127/orig 
2025-09-07T07:55:55.6475419Z * [new branch] gh/IvanKobzarev/128/base -> origin/gh/IvanKobzarev/128/base 2025-09-07T07:55:55.6476938Z * [new branch] gh/IvanKobzarev/128/head -> origin/gh/IvanKobzarev/128/head 2025-09-07T07:55:55.6478644Z * [new branch] gh/IvanKobzarev/128/orig -> origin/gh/IvanKobzarev/128/orig 2025-09-07T07:55:55.6480894Z * [new branch] gh/IvanKobzarev/132/base -> origin/gh/IvanKobzarev/132/base 2025-09-07T07:55:55.6482468Z * [new branch] gh/IvanKobzarev/132/head -> origin/gh/IvanKobzarev/132/head 2025-09-07T07:55:55.6484161Z * [new branch] gh/IvanKobzarev/132/orig -> origin/gh/IvanKobzarev/132/orig 2025-09-07T07:55:55.6487049Z * [new branch] gh/IvanKobzarev/133/base -> origin/gh/IvanKobzarev/133/base 2025-09-07T07:55:55.6488846Z * [new branch] gh/IvanKobzarev/133/head -> origin/gh/IvanKobzarev/133/head 2025-09-07T07:55:55.6490368Z * [new branch] gh/IvanKobzarev/133/orig -> origin/gh/IvanKobzarev/133/orig 2025-09-07T07:55:55.6492758Z * [new branch] gh/IvanKobzarev/134/base -> origin/gh/IvanKobzarev/134/base 2025-09-07T07:55:55.6494413Z * [new branch] gh/IvanKobzarev/134/head -> origin/gh/IvanKobzarev/134/head 2025-09-07T07:55:55.6495997Z * [new branch] gh/IvanKobzarev/134/orig -> origin/gh/IvanKobzarev/134/orig 2025-09-07T07:55:55.6498510Z * [new branch] gh/IvanKobzarev/135/base -> origin/gh/IvanKobzarev/135/base 2025-09-07T07:55:55.6500139Z * [new branch] gh/IvanKobzarev/135/head -> origin/gh/IvanKobzarev/135/head 2025-09-07T07:55:55.6501712Z * [new branch] gh/IvanKobzarev/135/orig -> origin/gh/IvanKobzarev/135/orig 2025-09-07T07:55:55.6504178Z * [new branch] gh/IvanKobzarev/136/base -> origin/gh/IvanKobzarev/136/base 2025-09-07T07:55:55.6505865Z * [new branch] gh/IvanKobzarev/136/head -> origin/gh/IvanKobzarev/136/head 2025-09-07T07:55:55.6507431Z * [new branch] gh/IvanKobzarev/136/orig -> origin/gh/IvanKobzarev/136/orig 2025-09-07T07:55:55.6509641Z * [new branch] gh/IvanKobzarev/137/base -> origin/gh/IvanKobzarev/137/base 
2025-09-07T07:55:55.6511221Z * [new branch] gh/IvanKobzarev/137/head -> origin/gh/IvanKobzarev/137/head 2025-09-07T07:55:55.6512775Z * [new branch] gh/IvanKobzarev/137/orig -> origin/gh/IvanKobzarev/137/orig 2025-09-07T07:55:55.6515498Z * [new branch] gh/IvanKobzarev/138/base -> origin/gh/IvanKobzarev/138/base 2025-09-07T07:55:55.6516987Z * [new branch] gh/IvanKobzarev/138/head -> origin/gh/IvanKobzarev/138/head 2025-09-07T07:55:55.6518788Z * [new branch] gh/IvanKobzarev/138/orig -> origin/gh/IvanKobzarev/138/orig 2025-09-07T07:55:55.6521148Z * [new branch] gh/IvanKobzarev/139/base -> origin/gh/IvanKobzarev/139/base 2025-09-07T07:55:55.6522690Z * [new branch] gh/IvanKobzarev/139/head -> origin/gh/IvanKobzarev/139/head 2025-09-07T07:55:55.6524414Z * [new branch] gh/IvanKobzarev/139/orig -> origin/gh/IvanKobzarev/139/orig 2025-09-07T07:55:55.6527097Z * [new branch] gh/IvanKobzarev/140/base -> origin/gh/IvanKobzarev/140/base 2025-09-07T07:55:55.6528695Z * [new branch] gh/IvanKobzarev/140/head -> origin/gh/IvanKobzarev/140/head 2025-09-07T07:55:55.6530287Z * [new branch] gh/IvanKobzarev/140/orig -> origin/gh/IvanKobzarev/140/orig 2025-09-07T07:55:55.6532717Z * [new branch] gh/IvanKobzarev/141/base -> origin/gh/IvanKobzarev/141/base 2025-09-07T07:55:55.6535081Z * [new branch] gh/IvanKobzarev/141/head -> origin/gh/IvanKobzarev/141/head 2025-09-07T07:55:55.6536920Z * [new branch] gh/IvanKobzarev/141/orig -> origin/gh/IvanKobzarev/141/orig 2025-09-07T07:55:55.6538744Z * [new branch] gh/IvanKobzarev/142/base -> origin/gh/IvanKobzarev/142/base 2025-09-07T07:55:55.6540281Z * [new branch] gh/IvanKobzarev/142/head -> origin/gh/IvanKobzarev/142/head 2025-09-07T07:55:55.6541814Z * [new branch] gh/IvanKobzarev/142/orig -> origin/gh/IvanKobzarev/142/orig 2025-09-07T07:55:55.6544392Z * [new branch] gh/IvanKobzarev/143/base -> origin/gh/IvanKobzarev/143/base 2025-09-07T07:55:55.6546001Z * [new branch] gh/IvanKobzarev/143/head -> origin/gh/IvanKobzarev/143/head 
2025-09-07T07:55:55.6547570Z * [new branch] gh/IvanKobzarev/143/orig -> origin/gh/IvanKobzarev/143/orig 2025-09-07T07:55:55.6549946Z * [new branch] gh/IvanKobzarev/144/base -> origin/gh/IvanKobzarev/144/base 2025-09-07T07:55:55.6551493Z * [new branch] gh/IvanKobzarev/144/head -> origin/gh/IvanKobzarev/144/head 2025-09-07T07:55:55.6553069Z * [new branch] gh/IvanKobzarev/144/orig -> origin/gh/IvanKobzarev/144/orig 2025-09-07T07:55:55.6555780Z * [new branch] gh/IvanKobzarev/145/base -> origin/gh/IvanKobzarev/145/base 2025-09-07T07:55:55.6557579Z * [new branch] gh/IvanKobzarev/145/head -> origin/gh/IvanKobzarev/145/head 2025-09-07T07:55:55.6558992Z * [new branch] gh/IvanKobzarev/145/orig -> origin/gh/IvanKobzarev/145/orig 2025-09-07T07:55:55.6561209Z * [new branch] gh/IvanKobzarev/146/base -> origin/gh/IvanKobzarev/146/base 2025-09-07T07:55:55.6562902Z * [new branch] gh/IvanKobzarev/146/head -> origin/gh/IvanKobzarev/146/head 2025-09-07T07:55:55.6564770Z * [new branch] gh/IvanKobzarev/146/orig -> origin/gh/IvanKobzarev/146/orig 2025-09-07T07:55:55.6567801Z * [new branch] gh/NikhilAPatel/1/base -> origin/gh/NikhilAPatel/1/base 2025-09-07T07:55:55.6569537Z * [new branch] gh/NikhilAPatel/1/head -> origin/gh/NikhilAPatel/1/head 2025-09-07T07:55:55.6571659Z * [new branch] gh/NikhilAPatel/2/base -> origin/gh/NikhilAPatel/2/base 2025-09-07T07:55:55.6573260Z * [new branch] gh/NikhilAPatel/2/head -> origin/gh/NikhilAPatel/2/head 2025-09-07T07:55:55.6575992Z * [new branch] gh/NikhilAPatel/4/base -> origin/gh/NikhilAPatel/4/base 2025-09-07T07:55:55.6577546Z * [new branch] gh/NikhilAPatel/4/head -> origin/gh/NikhilAPatel/4/head 2025-09-07T07:55:55.6580260Z * [new branch] gh/PaliC/1/base -> origin/gh/PaliC/1/base 2025-09-07T07:55:55.6581840Z * [new branch] gh/PaliC/1/head -> origin/gh/PaliC/1/head 2025-09-07T07:55:55.6583408Z * [new branch] gh/PaliC/1/orig -> origin/gh/PaliC/1/orig 2025-09-07T07:55:55.6586135Z * [new branch] gh/PaliC/17/base -> origin/gh/PaliC/17/base 
2025-09-07T07:55:55.6587640Z * [new branch] gh/PaliC/17/head -> origin/gh/PaliC/17/head 2025-09-07T07:55:55.6589234Z * [new branch] gh/PaliC/17/orig -> origin/gh/PaliC/17/orig 2025-09-07T07:55:55.6591453Z * [new branch] gh/PaliC/18/base -> origin/gh/PaliC/18/base 2025-09-07T07:55:55.6592982Z * [new branch] gh/PaliC/18/head -> origin/gh/PaliC/18/head 2025-09-07T07:55:55.6594783Z * [new branch] gh/PaliC/18/orig -> origin/gh/PaliC/18/orig 2025-09-07T07:55:55.6597025Z * [new branch] gh/PaliC/2/base -> origin/gh/PaliC/2/base 2025-09-07T07:55:55.6598654Z * [new branch] gh/PaliC/2/head -> origin/gh/PaliC/2/head 2025-09-07T07:55:55.6600195Z * [new branch] gh/PaliC/2/orig -> origin/gh/PaliC/2/orig 2025-09-07T07:55:55.6602645Z * [new branch] gh/PaliC/20/base -> origin/gh/PaliC/20/base 2025-09-07T07:55:55.6604501Z * [new branch] gh/PaliC/20/head -> origin/gh/PaliC/20/head 2025-09-07T07:55:55.6606169Z * [new branch] gh/PaliC/20/orig -> origin/gh/PaliC/20/orig 2025-09-07T07:55:55.6608397Z * [new branch] gh/PaliC/21/base -> origin/gh/PaliC/21/base 2025-09-07T07:55:55.6609949Z * [new branch] gh/PaliC/21/head -> origin/gh/PaliC/21/head 2025-09-07T07:55:55.6611396Z * [new branch] gh/PaliC/21/orig -> origin/gh/PaliC/21/orig 2025-09-07T07:55:55.6613604Z * [new branch] gh/PaliC/22/base -> origin/gh/PaliC/22/base 2025-09-07T07:55:55.6615519Z * [new branch] gh/PaliC/22/head -> origin/gh/PaliC/22/head 2025-09-07T07:55:55.6616896Z * [new branch] gh/PaliC/22/orig -> origin/gh/PaliC/22/orig 2025-09-07T07:55:55.6619074Z * [new branch] gh/PaliC/23/base -> origin/gh/PaliC/23/base 2025-09-07T07:55:55.6620703Z * [new branch] gh/PaliC/23/head -> origin/gh/PaliC/23/head 2025-09-07T07:55:55.6622265Z * [new branch] gh/PaliC/23/orig -> origin/gh/PaliC/23/orig 2025-09-07T07:55:55.6624892Z * [new branch] gh/PaliC/24/base -> origin/gh/PaliC/24/base 2025-09-07T07:55:55.6626536Z * [new branch] gh/PaliC/24/head -> origin/gh/PaliC/24/head 2025-09-07T07:55:55.6627911Z * [new branch] gh/PaliC/24/orig -> 
origin/gh/PaliC/24/orig 2025-09-07T07:55:55.6630739Z * [new branch] gh/PaulZhang12/17/base -> origin/gh/PaulZhang12/17/base 2025-09-07T07:55:55.6632234Z * [new branch] gh/PaulZhang12/17/head -> origin/gh/PaulZhang12/17/head 2025-09-07T07:55:55.6635116Z * [new branch] gh/PaulZhang12/20/base -> origin/gh/PaulZhang12/20/base 2025-09-07T07:55:55.6636655Z * [new branch] gh/PaulZhang12/20/head -> origin/gh/PaulZhang12/20/head 2025-09-07T07:55:55.6638367Z * [new branch] gh/PaulZhang12/20/orig -> origin/gh/PaulZhang12/20/orig 2025-09-07T07:55:55.6640644Z * [new branch] gh/PaulZhang12/21/base -> origin/gh/PaulZhang12/21/base 2025-09-07T07:55:55.6642265Z * [new branch] gh/PaulZhang12/21/head -> origin/gh/PaulZhang12/21/head 2025-09-07T07:55:55.6643991Z * [new branch] gh/PaulZhang12/21/orig -> origin/gh/PaulZhang12/21/orig 2025-09-07T07:55:55.6646513Z * [new branch] gh/PaulZhang12/22/base -> origin/gh/PaulZhang12/22/base 2025-09-07T07:55:55.6647941Z * [new branch] gh/PaulZhang12/22/head -> origin/gh/PaulZhang12/22/head 2025-09-07T07:55:55.6649536Z * [new branch] gh/PaulZhang12/22/orig -> origin/gh/PaulZhang12/22/orig 2025-09-07T07:55:55.6651724Z * [new branch] gh/PaulZhang12/23/base -> origin/gh/PaulZhang12/23/base 2025-09-07T07:55:55.6653274Z * [new branch] gh/PaulZhang12/23/head -> origin/gh/PaulZhang12/23/head 2025-09-07T07:55:55.6655288Z * [new branch] gh/PaulZhang12/23/orig -> origin/gh/PaulZhang12/23/orig 2025-09-07T07:55:55.6657413Z * [new branch] gh/PaulZhang12/24/base -> origin/gh/PaulZhang12/24/base 2025-09-07T07:55:55.6659028Z * [new branch] gh/PaulZhang12/24/head -> origin/gh/PaulZhang12/24/head 2025-09-07T07:55:55.6660584Z * [new branch] gh/PaulZhang12/24/orig -> origin/gh/PaulZhang12/24/orig 2025-09-07T07:55:55.6663057Z * [new branch] gh/PaulZhang12/25/base -> origin/gh/PaulZhang12/25/base 2025-09-07T07:55:55.6665047Z * [new branch] gh/PaulZhang12/25/head -> origin/gh/PaulZhang12/25/head 2025-09-07T07:55:55.6666596Z * [new branch] gh/PaulZhang12/25/orig -> 
origin/gh/PaulZhang12/25/orig 2025-09-07T07:55:55.6669418Z * [new branch] gh/SamGinzburg/11/base -> origin/gh/SamGinzburg/11/base 2025-09-07T07:55:55.6671025Z * [new branch] gh/SamGinzburg/11/head -> origin/gh/SamGinzburg/11/head 2025-09-07T07:55:55.6674127Z * [new branch] gh/Sidharth123-cpu/24/base -> origin/gh/Sidharth123-cpu/24/base 2025-09-07T07:55:55.6676359Z * [new branch] gh/Sidharth123-cpu/25/base -> origin/gh/Sidharth123-cpu/25/base 2025-09-07T07:55:55.6678518Z * [new branch] gh/Sidharth123-cpu/26/base -> origin/gh/Sidharth123-cpu/26/base 2025-09-07T07:55:55.6680754Z * [new branch] gh/Sidharth123-cpu/27/base -> origin/gh/Sidharth123-cpu/27/base 2025-09-07T07:55:55.6683474Z * [new branch] gh/StrongerXi/1/base -> origin/gh/StrongerXi/1/base 2025-09-07T07:55:55.6685484Z * [new branch] gh/StrongerXi/1/head -> origin/gh/StrongerXi/1/head 2025-09-07T07:55:55.6687662Z * [new branch] gh/StrongerXi/133/base -> origin/gh/StrongerXi/133/base 2025-09-07T07:55:55.6689236Z * [new branch] gh/StrongerXi/133/head -> origin/gh/StrongerXi/133/head 2025-09-07T07:55:55.6690837Z * [new branch] gh/StrongerXi/133/orig -> origin/gh/StrongerXi/133/orig 2025-09-07T07:55:55.6693095Z * [new branch] gh/StrongerXi/134/base -> origin/gh/StrongerXi/134/base 2025-09-07T07:55:55.6695194Z * [new branch] gh/StrongerXi/134/head -> origin/gh/StrongerXi/134/head 2025-09-07T07:55:55.6696562Z * [new branch] gh/StrongerXi/134/orig -> origin/gh/StrongerXi/134/orig 2025-09-07T07:55:55.6698841Z * [new branch] gh/StrongerXi/136/base -> origin/gh/StrongerXi/136/base 2025-09-07T07:55:55.6700304Z * [new branch] gh/StrongerXi/136/head -> origin/gh/StrongerXi/136/head 2025-09-07T07:55:55.6701871Z * [new branch] gh/StrongerXi/136/orig -> origin/gh/StrongerXi/136/orig 2025-09-07T07:55:55.6704226Z * [new branch] gh/StrongerXi/137/base -> origin/gh/StrongerXi/137/base 2025-09-07T07:55:55.6705946Z * [new branch] gh/StrongerXi/137/head -> origin/gh/StrongerXi/137/head 2025-09-07T07:55:55.6707400Z * [new branch] 
gh/StrongerXi/137/orig -> origin/gh/StrongerXi/137/orig 2025-09-07T07:55:55.6709611Z * [new branch] gh/StrongerXi/138/base -> origin/gh/StrongerXi/138/base 2025-09-07T07:55:55.6711174Z * [new branch] gh/StrongerXi/138/head -> origin/gh/StrongerXi/138/head 2025-09-07T07:55:55.6712754Z * [new branch] gh/StrongerXi/138/orig -> origin/gh/StrongerXi/138/orig 2025-09-07T07:55:55.6715339Z * [new branch] gh/StrongerXi/139/base -> origin/gh/StrongerXi/139/base 2025-09-07T07:55:55.6716949Z * [new branch] gh/StrongerXi/139/head -> origin/gh/StrongerXi/139/head 2025-09-07T07:55:55.6718754Z * [new branch] gh/StrongerXi/139/orig -> origin/gh/StrongerXi/139/orig 2025-09-07T07:55:55.6720928Z * [new branch] gh/StrongerXi/140/base -> origin/gh/StrongerXi/140/base 2025-09-07T07:55:55.6722433Z * [new branch] gh/StrongerXi/140/head -> origin/gh/StrongerXi/140/head 2025-09-07T07:55:55.6724320Z * [new branch] gh/StrongerXi/140/orig -> origin/gh/StrongerXi/140/orig 2025-09-07T07:55:55.6726586Z * [new branch] gh/StrongerXi/71/base -> origin/gh/StrongerXi/71/base 2025-09-07T07:55:55.6728276Z * [new branch] gh/StrongerXi/71/head -> origin/gh/StrongerXi/71/head 2025-09-07T07:55:55.6730498Z * [new branch] gh/StrongerXi/72/base -> origin/gh/StrongerXi/72/base 2025-09-07T07:55:55.6732086Z * [new branch] gh/StrongerXi/72/head -> origin/gh/StrongerXi/72/head 2025-09-07T07:55:55.6735203Z * [new branch] gh/XilunWu/133/base -> origin/gh/XilunWu/133/base 2025-09-07T07:55:55.6736661Z * [new branch] gh/XilunWu/133/head -> origin/gh/XilunWu/133/head 2025-09-07T07:55:55.6738294Z * [new branch] gh/XilunWu/133/orig -> origin/gh/XilunWu/133/orig 2025-09-07T07:55:55.6740508Z * [new branch] gh/XilunWu/139/base -> origin/gh/XilunWu/139/base 2025-09-07T07:55:55.6742034Z * [new branch] gh/XilunWu/139/head -> origin/gh/XilunWu/139/head 2025-09-07T07:55:55.6743601Z * [new branch] gh/XilunWu/139/orig -> origin/gh/XilunWu/139/orig 2025-09-07T07:55:55.6746352Z * [new branch] gh/XilunWu/143/base -> 
origin/gh/XilunWu/143/base 2025-09-07T07:55:55.6747909Z * [new branch] gh/XilunWu/143/head -> origin/gh/XilunWu/143/head 2025-09-07T07:55:55.6749440Z * [new branch] gh/XilunWu/143/orig -> origin/gh/XilunWu/143/orig 2025-09-07T07:55:55.6751867Z * [new branch] gh/XilunWu/144/base -> origin/gh/XilunWu/144/base 2025-09-07T07:55:55.6753406Z * [new branch] gh/XilunWu/144/head -> origin/gh/XilunWu/144/head 2025-09-07T07:55:55.6755236Z * [new branch] gh/XilunWu/144/orig -> origin/gh/XilunWu/144/orig 2025-09-07T07:55:55.6757563Z * [new branch] gh/XilunWu/145/base -> origin/gh/XilunWu/145/base 2025-09-07T07:55:55.6759099Z * [new branch] gh/XilunWu/145/head -> origin/gh/XilunWu/145/head 2025-09-07T07:55:55.6761046Z * [new branch] gh/XilunWu/145/orig -> origin/gh/XilunWu/145/orig 2025-09-07T07:55:55.6763006Z * [new branch] gh/XilunWu/146/base -> origin/gh/XilunWu/146/base 2025-09-07T07:55:55.6764937Z * [new branch] gh/XilunWu/146/head -> origin/gh/XilunWu/146/head 2025-09-07T07:55:55.6766479Z * [new branch] gh/XilunWu/146/orig -> origin/gh/XilunWu/146/orig 2025-09-07T07:55:55.6768730Z * [new branch] gh/XilunWu/147/base -> origin/gh/XilunWu/147/base 2025-09-07T07:55:55.6770265Z * [new branch] gh/XilunWu/147/head -> origin/gh/XilunWu/147/head 2025-09-07T07:55:55.6771849Z * [new branch] gh/XilunWu/147/orig -> origin/gh/XilunWu/147/orig 2025-09-07T07:55:55.6774184Z * [new branch] gh/XilunWu/148/base -> origin/gh/XilunWu/148/base 2025-09-07T07:55:55.6776075Z * [new branch] gh/XilunWu/148/head -> origin/gh/XilunWu/148/head 2025-09-07T07:55:55.6777624Z * [new branch] gh/XilunWu/148/orig -> origin/gh/XilunWu/148/orig 2025-09-07T07:55:55.6779763Z * [new branch] gh/XilunWu/149/base -> origin/gh/XilunWu/149/base 2025-09-07T07:55:55.6781321Z * [new branch] gh/XilunWu/149/head -> origin/gh/XilunWu/149/head 2025-09-07T07:55:55.6782937Z * [new branch] gh/XilunWu/149/orig -> origin/gh/XilunWu/149/orig 2025-09-07T07:55:55.6785427Z * [new branch] gh/XilunWu/150/base -> 
origin/gh/XilunWu/150/base 2025-09-07T07:55:55.6786973Z * [new branch] gh/XilunWu/150/head -> origin/gh/XilunWu/150/head 2025-09-07T07:55:55.6788429Z * [new branch] gh/XilunWu/150/orig -> origin/gh/XilunWu/150/orig 2025-09-07T07:55:55.6790685Z * [new branch] gh/XilunWu/151/base -> origin/gh/XilunWu/151/base 2025-09-07T07:55:55.6792317Z * [new branch] gh/XilunWu/151/head -> origin/gh/XilunWu/151/head 2025-09-07T07:55:55.6794046Z * [new branch] gh/XilunWu/151/orig -> origin/gh/XilunWu/151/orig 2025-09-07T07:55:55.6796356Z * [new branch] gh/XilunWu/152/base -> origin/gh/XilunWu/152/base 2025-09-07T07:55:55.6797984Z * [new branch] gh/XilunWu/152/head -> origin/gh/XilunWu/152/head 2025-09-07T07:55:55.6799498Z * [new branch] gh/XilunWu/152/orig -> origin/gh/XilunWu/152/orig 2025-09-07T07:55:55.6801922Z * [new branch] gh/XilunWu/153/base -> origin/gh/XilunWu/153/base 2025-09-07T07:55:55.6803536Z * [new branch] gh/XilunWu/153/head -> origin/gh/XilunWu/153/head 2025-09-07T07:55:55.6805486Z * [new branch] gh/XilunWu/153/orig -> origin/gh/XilunWu/153/orig 2025-09-07T07:55:55.6807741Z * [new branch] gh/XilunWu/160/base -> origin/gh/XilunWu/160/base 2025-09-07T07:55:55.6809301Z * [new branch] gh/XilunWu/160/head -> origin/gh/XilunWu/160/head 2025-09-07T07:55:55.6810843Z * [new branch] gh/XilunWu/160/orig -> origin/gh/XilunWu/160/orig 2025-09-07T07:55:55.6813201Z * [new branch] gh/XilunWu/161/base -> origin/gh/XilunWu/161/base 2025-09-07T07:55:55.6815062Z * [new branch] gh/XilunWu/161/head -> origin/gh/XilunWu/161/head 2025-09-07T07:55:55.6816508Z * [new branch] gh/XilunWu/161/orig -> origin/gh/XilunWu/161/orig 2025-09-07T07:55:55.6818890Z * [new branch] gh/XilunWu/163/base -> origin/gh/XilunWu/163/base 2025-09-07T07:55:55.6820524Z * [new branch] gh/XilunWu/163/head -> origin/gh/XilunWu/163/head 2025-09-07T07:55:55.6822070Z * [new branch] gh/XilunWu/163/orig -> origin/gh/XilunWu/163/orig 2025-09-07T07:55:55.6824745Z * [new branch] gh/XilunWu/164/base -> 
origin/gh/XilunWu/164/base 2025-09-07T07:55:55.6826515Z * [new branch] gh/XilunWu/164/head -> origin/gh/XilunWu/164/head 2025-09-07T07:55:55.6827900Z * [new branch] gh/XilunWu/164/orig -> origin/gh/XilunWu/164/orig 2025-09-07T07:55:55.6830207Z * [new branch] gh/XilunWu/165/base -> origin/gh/XilunWu/165/base 2025-09-07T07:55:55.6831910Z * [new branch] gh/XilunWu/165/head -> origin/gh/XilunWu/165/head 2025-09-07T07:55:55.6833460Z * [new branch] gh/XilunWu/165/orig -> origin/gh/XilunWu/165/orig 2025-09-07T07:55:55.6836300Z * [new branch] gh/XilunWu/166/base -> origin/gh/XilunWu/166/base 2025-09-07T07:55:55.6837912Z * [new branch] gh/XilunWu/166/head -> origin/gh/XilunWu/166/head 2025-09-07T07:55:55.6839457Z * [new branch] gh/XilunWu/166/orig -> origin/gh/XilunWu/166/orig 2025-09-07T07:55:55.6841878Z * [new branch] gh/XilunWu/167/base -> origin/gh/XilunWu/167/base 2025-09-07T07:55:55.6843457Z * [new branch] gh/XilunWu/167/head -> origin/gh/XilunWu/167/head 2025-09-07T07:55:55.6845465Z * [new branch] gh/XilunWu/167/orig -> origin/gh/XilunWu/167/orig 2025-09-07T07:55:55.6847641Z * [new branch] gh/XilunWu/168/base -> origin/gh/XilunWu/168/base 2025-09-07T07:55:55.6849205Z * [new branch] gh/XilunWu/168/head -> origin/gh/XilunWu/168/head 2025-09-07T07:55:55.6850780Z * [new branch] gh/XilunWu/168/orig -> origin/gh/XilunWu/168/orig 2025-09-07T07:55:55.6853009Z * [new branch] gh/XilunWu/169/base -> origin/gh/XilunWu/169/base 2025-09-07T07:55:55.6855225Z * [new branch] gh/XilunWu/169/head -> origin/gh/XilunWu/169/head 2025-09-07T07:55:55.6856793Z * [new branch] gh/XilunWu/169/orig -> origin/gh/XilunWu/169/orig 2025-09-07T07:55:55.6858982Z * [new branch] gh/XilunWu/170/base -> origin/gh/XilunWu/170/base 2025-09-07T07:55:55.6860523Z * [new branch] gh/XilunWu/170/head -> origin/gh/XilunWu/170/head 2025-09-07T07:55:55.6862031Z * [new branch] gh/XilunWu/170/orig -> origin/gh/XilunWu/170/orig 2025-09-07T07:55:55.6865258Z * [new branch] gh/XuehaiPan/14/base -> 
origin/gh/XuehaiPan/14/base 2025-09-07T07:55:55.6866840Z * [new branch] gh/XuehaiPan/14/head -> origin/gh/XuehaiPan/14/head 2025-09-07T07:55:55.6868332Z * [new branch] gh/XuehaiPan/14/orig -> origin/gh/XuehaiPan/14/orig 2025-09-07T07:55:55.6870514Z * [new branch] gh/XuehaiPan/179/base -> origin/gh/XuehaiPan/179/base 2025-09-07T07:55:55.6872156Z * [new branch] gh/XuehaiPan/179/head -> origin/gh/XuehaiPan/179/head 2025-09-07T07:55:55.6873693Z * [new branch] gh/XuehaiPan/179/orig -> origin/gh/XuehaiPan/179/orig 2025-09-07T07:55:55.6876453Z * [new branch] gh/XuehaiPan/189/base -> origin/gh/XuehaiPan/189/base 2025-09-07T07:55:55.6878173Z * [new branch] gh/XuehaiPan/189/head -> origin/gh/XuehaiPan/189/head 2025-09-07T07:55:55.6879701Z * [new branch] gh/XuehaiPan/189/orig -> origin/gh/XuehaiPan/189/orig 2025-09-07T07:55:55.6881920Z * [new branch] gh/XuehaiPan/232/base -> origin/gh/XuehaiPan/232/base 2025-09-07T07:55:55.6883476Z * [new branch] gh/XuehaiPan/232/head -> origin/gh/XuehaiPan/232/head 2025-09-07T07:55:55.6885443Z * [new branch] gh/XuehaiPan/232/orig -> origin/gh/XuehaiPan/232/orig 2025-09-07T07:55:55.6887530Z * [new branch] gh/XuehaiPan/249/base -> origin/gh/XuehaiPan/249/base 2025-09-07T07:55:55.6889210Z * [new branch] gh/XuehaiPan/249/head -> origin/gh/XuehaiPan/249/head 2025-09-07T07:55:55.6890725Z * [new branch] gh/XuehaiPan/249/orig -> origin/gh/XuehaiPan/249/orig 2025-09-07T07:55:55.6893384Z * [new branch] gh/XuehaiPan/253/base -> origin/gh/XuehaiPan/253/base 2025-09-07T07:55:55.6895093Z * [new branch] gh/XuehaiPan/253/head -> origin/gh/XuehaiPan/253/head 2025-09-07T07:55:55.6896689Z * [new branch] gh/XuehaiPan/253/orig -> origin/gh/XuehaiPan/253/orig 2025-09-07T07:55:55.6898923Z * [new branch] gh/XuehaiPan/254/base -> origin/gh/XuehaiPan/254/base 2025-09-07T07:55:55.6900422Z * [new branch] gh/XuehaiPan/254/head -> origin/gh/XuehaiPan/254/head 2025-09-07T07:55:55.6902056Z * [new branch] gh/XuehaiPan/254/orig -> origin/gh/XuehaiPan/254/orig 
2025-09-07T07:55:55.6904563Z * [new branch] gh/XuehaiPan/255/base -> origin/gh/XuehaiPan/255/base 2025-09-07T07:55:55.6906118Z * [new branch] gh/XuehaiPan/255/head -> origin/gh/XuehaiPan/255/head 2025-09-07T07:55:55.6907576Z * [new branch] gh/XuehaiPan/255/orig -> origin/gh/XuehaiPan/255/orig 2025-09-07T07:55:55.6909855Z * [new branch] gh/XuehaiPan/257/base -> origin/gh/XuehaiPan/257/base 2025-09-07T07:55:55.6911425Z * [new branch] gh/XuehaiPan/257/head -> origin/gh/XuehaiPan/257/head 2025-09-07T07:55:55.6912953Z * [new branch] gh/XuehaiPan/257/orig -> origin/gh/XuehaiPan/257/orig 2025-09-07T07:55:55.6915563Z * [new branch] gh/XuehaiPan/271/base -> origin/gh/XuehaiPan/271/base 2025-09-07T07:55:55.6917161Z * [new branch] gh/XuehaiPan/271/head -> origin/gh/XuehaiPan/271/head 2025-09-07T07:55:55.6918766Z * [new branch] gh/XuehaiPan/271/orig -> origin/gh/XuehaiPan/271/orig 2025-09-07T07:55:55.6921070Z * [new branch] gh/XuehaiPan/290/base -> origin/gh/XuehaiPan/290/base 2025-09-07T07:55:55.6922670Z * [new branch] gh/XuehaiPan/290/head -> origin/gh/XuehaiPan/290/head 2025-09-07T07:55:55.6924374Z * [new branch] gh/XuehaiPan/290/orig -> origin/gh/XuehaiPan/290/orig 2025-09-07T07:55:55.6926959Z * [new branch] gh/XuehaiPan/343/base -> origin/gh/XuehaiPan/343/base 2025-09-07T07:55:55.6928284Z * [new branch] gh/XuehaiPan/343/head -> origin/gh/XuehaiPan/343/head 2025-09-07T07:55:55.6929814Z * [new branch] gh/XuehaiPan/343/orig -> origin/gh/XuehaiPan/343/orig 2025-09-07T07:55:55.6932305Z * [new branch] gh/XuehaiPan/347/base -> origin/gh/XuehaiPan/347/base 2025-09-07T07:55:55.6933854Z * [new branch] gh/XuehaiPan/347/head -> origin/gh/XuehaiPan/347/head 2025-09-07T07:55:55.6935694Z * [new branch] gh/XuehaiPan/347/orig -> origin/gh/XuehaiPan/347/orig 2025-09-07T07:55:55.6937794Z * [new branch] gh/XuehaiPan/348/base -> origin/gh/XuehaiPan/348/base 2025-09-07T07:55:55.6939403Z * [new branch] gh/XuehaiPan/348/head -> origin/gh/XuehaiPan/348/head 2025-09-07T07:55:55.6940928Z * [new 
branch] gh/XuehaiPan/348/orig -> origin/gh/XuehaiPan/348/orig 2025-09-07T07:55:55.6943167Z * [new branch] gh/XuehaiPan/350/base -> origin/gh/XuehaiPan/350/base 2025-09-07T07:55:55.6945042Z * [new branch] gh/XuehaiPan/350/head -> origin/gh/XuehaiPan/350/head 2025-09-07T07:55:55.6946621Z * [new branch] gh/XuehaiPan/350/orig -> origin/gh/XuehaiPan/350/orig 2025-09-07T07:55:55.6949002Z * [new branch] gh/XuehaiPan/356/base -> origin/gh/XuehaiPan/356/base 2025-09-07T07:55:55.6950603Z * [new branch] gh/XuehaiPan/356/head -> origin/gh/XuehaiPan/356/head 2025-09-07T07:55:55.6952184Z * [new branch] gh/XuehaiPan/356/orig -> origin/gh/XuehaiPan/356/orig 2025-09-07T07:55:55.6954930Z * [new branch] gh/XuehaiPan/357/base -> origin/gh/XuehaiPan/357/base 2025-09-07T07:55:55.6956429Z * [new branch] gh/XuehaiPan/357/head -> origin/gh/XuehaiPan/357/head 2025-09-07T07:55:55.6958296Z * [new branch] gh/XuehaiPan/357/orig -> origin/gh/XuehaiPan/357/orig 2025-09-07T07:55:55.6960430Z * [new branch] gh/XuehaiPan/358/base -> origin/gh/XuehaiPan/358/base 2025-09-07T07:55:55.6961988Z * [new branch] gh/XuehaiPan/358/head -> origin/gh/XuehaiPan/358/head 2025-09-07T07:55:55.6963489Z * [new branch] gh/XuehaiPan/358/orig -> origin/gh/XuehaiPan/358/orig 2025-09-07T07:55:55.6966174Z * [new branch] gh/XuehaiPan/359/base -> origin/gh/XuehaiPan/359/base 2025-09-07T07:55:55.6967692Z * [new branch] gh/XuehaiPan/359/head -> origin/gh/XuehaiPan/359/head 2025-09-07T07:55:55.6969234Z * [new branch] gh/XuehaiPan/359/orig -> origin/gh/XuehaiPan/359/orig 2025-09-07T07:55:55.6971475Z * [new branch] gh/XuehaiPan/360/base -> origin/gh/XuehaiPan/360/base 2025-09-07T07:55:55.6973121Z * [new branch] gh/XuehaiPan/360/head -> origin/gh/XuehaiPan/360/head 2025-09-07T07:55:55.6975032Z * [new branch] gh/XuehaiPan/360/orig -> origin/gh/XuehaiPan/360/orig 2025-09-07T07:55:55.6977302Z * [new branch] gh/XuehaiPan/365/base -> origin/gh/XuehaiPan/365/base 2025-09-07T07:55:55.6978833Z * [new branch] gh/XuehaiPan/365/head -> 
origin/gh/XuehaiPan/365/head 2025-09-07T07:55:55.6980393Z * [new branch] gh/XuehaiPan/365/orig -> origin/gh/XuehaiPan/365/orig 2025-09-07T07:55:55.6982712Z * [new branch] gh/XuehaiPan/366/base -> origin/gh/XuehaiPan/366/base 2025-09-07T07:55:55.6984428Z * [new branch] gh/XuehaiPan/366/head -> origin/gh/XuehaiPan/366/head 2025-09-07T07:55:55.6986708Z * [new branch] gh/XuehaiPan/369/base -> origin/gh/XuehaiPan/369/base 2025-09-07T07:55:55.6988275Z * [new branch] gh/XuehaiPan/369/head -> origin/gh/XuehaiPan/369/head 2025-09-07T07:55:55.6989870Z * [new branch] gh/XuehaiPan/369/orig -> origin/gh/XuehaiPan/369/orig 2025-09-07T07:55:55.6992172Z * [new branch] gh/XuehaiPan/370/base -> origin/gh/XuehaiPan/370/base 2025-09-07T07:55:55.6993925Z * [new branch] gh/XuehaiPan/370/head -> origin/gh/XuehaiPan/370/head 2025-09-07T07:55:55.6995732Z * [new branch] gh/XuehaiPan/370/orig -> origin/gh/XuehaiPan/370/orig 2025-09-07T07:55:55.6998004Z * [new branch] gh/XuehaiPan/380/base -> origin/gh/XuehaiPan/380/base 2025-09-07T07:55:55.6999539Z * [new branch] gh/XuehaiPan/380/head -> origin/gh/XuehaiPan/380/head 2025-09-07T07:55:55.7001090Z * [new branch] gh/XuehaiPan/380/orig -> origin/gh/XuehaiPan/380/orig 2025-09-07T07:55:55.7003363Z * [new branch] gh/XuehaiPan/381/base -> origin/gh/XuehaiPan/381/base 2025-09-07T07:55:55.7005305Z * [new branch] gh/XuehaiPan/381/head -> origin/gh/XuehaiPan/381/head 2025-09-07T07:55:55.7007543Z * [new branch] gh/XuehaiPan/382/base -> origin/gh/XuehaiPan/382/base 2025-09-07T07:55:55.7009255Z * [new branch] gh/XuehaiPan/382/head -> origin/gh/XuehaiPan/382/head 2025-09-07T07:55:55.7010812Z * [new branch] gh/XuehaiPan/382/orig -> origin/gh/XuehaiPan/382/orig 2025-09-07T07:55:55.7013173Z * [new branch] gh/XuehaiPan/383/base -> origin/gh/XuehaiPan/383/base 2025-09-07T07:55:55.7015072Z * [new branch] gh/XuehaiPan/383/head -> origin/gh/XuehaiPan/383/head 2025-09-07T07:55:55.7016597Z * [new branch] gh/XuehaiPan/383/orig -> origin/gh/XuehaiPan/383/orig 
2025-09-07T07:55:55.7018847Z * [new branch] gh/XuehaiPan/384/base -> origin/gh/XuehaiPan/384/base 2025-09-07T07:55:55.7020298Z * [new branch] gh/XuehaiPan/384/head -> origin/gh/XuehaiPan/384/head 2025-09-07T07:55:55.7021929Z * [new branch] gh/XuehaiPan/384/orig -> origin/gh/XuehaiPan/384/orig 2025-09-07T07:55:55.7024808Z * [new branch] gh/XuehaiPan/385/base -> origin/gh/XuehaiPan/385/base 2025-09-07T07:55:55.7026299Z * [new branch] gh/XuehaiPan/385/head -> origin/gh/XuehaiPan/385/head 2025-09-07T07:55:55.7027677Z * [new branch] gh/XuehaiPan/385/orig -> origin/gh/XuehaiPan/385/orig 2025-09-07T07:55:55.7029985Z * [new branch] gh/XuehaiPan/386/base -> origin/gh/XuehaiPan/386/base 2025-09-07T07:55:55.7031515Z * [new branch] gh/XuehaiPan/386/head -> origin/gh/XuehaiPan/386/head 2025-09-07T07:55:55.7033066Z * [new branch] gh/XuehaiPan/386/orig -> origin/gh/XuehaiPan/386/orig 2025-09-07T07:55:55.7035687Z * [new branch] gh/XuehaiPan/387/base -> origin/gh/XuehaiPan/387/base 2025-09-07T07:55:55.7037241Z * [new branch] gh/XuehaiPan/387/head -> origin/gh/XuehaiPan/387/head 2025-09-07T07:55:55.7038853Z * [new branch] gh/XuehaiPan/387/orig -> origin/gh/XuehaiPan/387/orig 2025-09-07T07:55:55.7041541Z * [new branch] gh/ZainRizvi/1/base -> origin/gh/ZainRizvi/1/base 2025-09-07T07:55:55.7043325Z * [new branch] gh/ZainRizvi/1/head -> origin/gh/ZainRizvi/1/head 2025-09-07T07:55:55.7046029Z * [new branch] gh/ZainRizvi/2/base -> origin/gh/ZainRizvi/2/base 2025-09-07T07:55:55.7047281Z * [new branch] gh/ZainRizvi/2/head -> origin/gh/ZainRizvi/2/head 2025-09-07T07:55:55.7049529Z * [new branch] gh/ZainRizvi/3/base -> origin/gh/ZainRizvi/3/base 2025-09-07T07:55:55.7051047Z * [new branch] gh/ZainRizvi/3/head -> origin/gh/ZainRizvi/3/head 2025-09-07T07:55:55.7053268Z * [new branch] gh/ZainRizvi/4/base -> origin/gh/ZainRizvi/4/base 2025-09-07T07:55:55.7055165Z * [new branch] gh/ZainRizvi/4/head -> origin/gh/ZainRizvi/4/head 2025-09-07T07:55:55.7057261Z * [new branch] gh/ZainRizvi/5/base -> 
origin/gh/ZainRizvi/5/base 2025-09-07T07:55:55.7058693Z * [new branch] gh/ZainRizvi/5/head -> origin/gh/ZainRizvi/5/head 2025-09-07T07:55:55.7060966Z * [new branch] gh/ZainRizvi/6/base -> origin/gh/ZainRizvi/6/base 2025-09-07T07:55:55.7062476Z * [new branch] gh/ZainRizvi/6/head -> origin/gh/ZainRizvi/6/head 2025-09-07T07:55:55.7064443Z * [new branch] gh/ZainRizvi/6/orig -> origin/gh/ZainRizvi/6/orig 2025-09-07T07:55:55.7066693Z * [new branch] gh/ZainRizvi/7/base -> origin/gh/ZainRizvi/7/base 2025-09-07T07:55:55.7068223Z * [new branch] gh/ZainRizvi/7/head -> origin/gh/ZainRizvi/7/head 2025-09-07T07:55:55.7069770Z * [new branch] gh/ZainRizvi/7/orig -> origin/gh/ZainRizvi/7/orig 2025-09-07T07:55:55.7072106Z * [new branch] gh/ZainRizvi/8/base -> origin/gh/ZainRizvi/8/base 2025-09-07T07:55:55.7073630Z * [new branch] gh/ZainRizvi/8/head -> origin/gh/ZainRizvi/8/head 2025-09-07T07:55:55.7076252Z * [new branch] gh/ZainRizvi/9/base -> origin/gh/ZainRizvi/9/base 2025-09-07T07:55:55.7077761Z * [new branch] gh/ZainRizvi/9/head -> origin/gh/ZainRizvi/9/head 2025-09-07T07:55:55.7079328Z * [new branch] gh/ZainRizvi/9/orig -> origin/gh/ZainRizvi/9/orig 2025-09-07T07:55:55.7082165Z * [new branch] gh/ZhiweiYan-96/39/base -> origin/gh/ZhiweiYan-96/39/base 2025-09-07T07:55:55.7083828Z * [new branch] gh/ZhiweiYan-96/39/head -> origin/gh/ZhiweiYan-96/39/head 2025-09-07T07:55:55.7085767Z * [new branch] gh/ZhiweiYan-96/39/orig -> origin/gh/ZhiweiYan-96/39/orig 2025-09-07T07:55:55.7087956Z * [new branch] gh/ZhiweiYan-96/44/base -> origin/gh/ZhiweiYan-96/44/base 2025-09-07T07:55:55.7089488Z * [new branch] gh/ZhiweiYan-96/44/head -> origin/gh/ZhiweiYan-96/44/head 2025-09-07T07:55:55.7091813Z * [new branch] gh/ZhiweiYan-96/45/base -> origin/gh/ZhiweiYan-96/45/base 2025-09-07T07:55:55.7093006Z * [new branch] gh/ZhiweiYan-96/45/head -> origin/gh/ZhiweiYan-96/45/head 2025-09-07T07:55:55.7095981Z * [new branch] gh/ZhiweiYan-96/49/base -> origin/gh/ZhiweiYan-96/49/base 2025-09-07T07:55:55.7097410Z 
* [new branch] gh/ZhiweiYan-96/49/head -> origin/gh/ZhiweiYan-96/49/head 2025-09-07T07:55:55.7099664Z * [new branch] gh/ZhiweiYan-96/62/base -> origin/gh/ZhiweiYan-96/62/base 2025-09-07T07:55:55.7101187Z * [new branch] gh/ZhiweiYan-96/62/head -> origin/gh/ZhiweiYan-96/62/head 2025-09-07T07:55:55.7103427Z * [new branch] gh/ZhiweiYan-96/64/base -> origin/gh/ZhiweiYan-96/64/base 2025-09-07T07:55:55.7105393Z * [new branch] gh/ZhiweiYan-96/64/head -> origin/gh/ZhiweiYan-96/64/head 2025-09-07T07:55:55.7106912Z * [new branch] gh/ZhiweiYan-96/64/orig -> origin/gh/ZhiweiYan-96/64/orig 2025-09-07T07:55:55.7109178Z * [new branch] gh/ZhiweiYan-96/65/base -> origin/gh/ZhiweiYan-96/65/base 2025-09-07T07:55:55.7110683Z * [new branch] gh/ZhiweiYan-96/65/head -> origin/gh/ZhiweiYan-96/65/head 2025-09-07T07:55:55.7112245Z * [new branch] gh/ZhiweiYan-96/65/orig -> origin/gh/ZhiweiYan-96/65/orig 2025-09-07T07:55:55.7115245Z * [new branch] gh/ZhiweiYan-96/66/base -> origin/gh/ZhiweiYan-96/66/base 2025-09-07T07:55:55.7116325Z * [new branch] gh/ZhiweiYan-96/66/head -> origin/gh/ZhiweiYan-96/66/head 2025-09-07T07:55:55.7118888Z * [new branch] gh/ZhiweiYan-96/67/base -> origin/gh/ZhiweiYan-96/67/base 2025-09-07T07:55:55.7120302Z * [new branch] gh/ZhiweiYan-96/67/head -> origin/gh/ZhiweiYan-96/67/head 2025-09-07T07:55:55.7122475Z * [new branch] gh/ZhiweiYan-96/68/base -> origin/gh/ZhiweiYan-96/68/base 2025-09-07T07:55:55.7124097Z * [new branch] gh/ZhiweiYan-96/68/head -> origin/gh/ZhiweiYan-96/68/head 2025-09-07T07:55:55.7125849Z * [new branch] gh/ZhiweiYan-96/68/orig -> origin/gh/ZhiweiYan-96/68/orig 2025-09-07T07:55:55.7129236Z * [new branch] gh/aakhundov/1/base -> origin/gh/aakhundov/1/base 2025-09-07T07:55:55.7130843Z * [new branch] gh/aakhundov/1/head -> origin/gh/aakhundov/1/head 2025-09-07T07:55:55.7132978Z * [new branch] gh/aakhundov/2/base -> origin/gh/aakhundov/2/base 2025-09-07T07:55:55.7134809Z * [new branch] gh/aakhundov/2/head -> origin/gh/aakhundov/2/head 
2025-09-07T07:55:55.7137142Z * [new branch] gh/aditew01/openblas -> origin/gh/aditew01/openblas 2025-09-07T07:55:55.7138654Z * [new branch] gh/aditew01/sbgemm -> origin/gh/aditew01/sbgemm 2025-09-07T07:55:55.7140336Z * [new branch] gh/aditew01/vecbf16 -> origin/gh/aditew01/vecbf16 2025-09-07T07:55:55.7142739Z * [new branch] gh/alexbrauckmann/paddedtensor_faketensor_init -> origin/gh/alexbrauckmann/paddedtensor_faketensor_init 2025-09-07T07:55:55.7145653Z * [new branch] gh/alexsamardzic/9/base -> origin/gh/alexsamardzic/9/base 2025-09-07T07:55:55.7147165Z * [new branch] gh/alexsamardzic/9/head -> origin/gh/alexsamardzic/9/head 2025-09-07T07:55:55.7148794Z * [new branch] gh/alexsamardzic/9/orig -> origin/gh/alexsamardzic/9/orig 2025-09-07T07:55:55.7151728Z * [new branch] gh/amjames/18/base -> origin/gh/amjames/18/base 2025-09-07T07:55:55.7153207Z * [new branch] gh/amjames/18/head -> origin/gh/amjames/18/head 2025-09-07T07:55:55.7155096Z * [new branch] gh/amjames/18/orig -> origin/gh/amjames/18/orig 2025-09-07T07:55:55.7158114Z * [new branch] gh/andrewor14/35/base -> origin/gh/andrewor14/35/base 2025-09-07T07:55:55.7159963Z * [new branch] gh/andrewor14/35/head -> origin/gh/andrewor14/35/head 2025-09-07T07:55:55.7161399Z * [new branch] gh/andrewor14/35/orig -> origin/gh/andrewor14/35/orig 2025-09-07T07:55:55.7163932Z * [new branch] gh/andrewor14/50/base -> origin/gh/andrewor14/50/base 2025-09-07T07:55:55.7165752Z * [new branch] gh/andrewor14/50/head -> origin/gh/andrewor14/50/head 2025-09-07T07:55:55.7167300Z * [new branch] gh/andrewor14/50/orig -> origin/gh/andrewor14/50/orig 2025-09-07T07:55:55.7169573Z * [new branch] gh/andrewor14/51/base -> origin/gh/andrewor14/51/base 2025-09-07T07:55:55.7171244Z * [new branch] gh/andrewor14/51/orig -> origin/gh/andrewor14/51/orig 2025-09-07T07:55:55.7174417Z * [new branch] gh/andyanwang/1/base -> origin/gh/andyanwang/1/base 2025-09-07T07:55:55.7176171Z * [new branch] gh/andyanwang/1/head -> origin/gh/andyanwang/1/head 
2025-09-07T07:55:55.7177669Z * [new branch] gh/andyanwang/1/orig -> origin/gh/andyanwang/1/orig 2025-09-07T07:55:55.7180007Z * [new branch] gh/andyanwang/13/base -> origin/gh/andyanwang/13/base 2025-09-07T07:55:55.7181626Z * [new branch] gh/andyanwang/13/head -> origin/gh/andyanwang/13/head 2025-09-07T07:55:55.7183860Z * [new branch] gh/andyanwang/13/orig -> origin/gh/andyanwang/13/orig 2025-09-07T07:55:55.7186289Z * [new branch] gh/andyanwang/2/base -> origin/gh/andyanwang/2/base 2025-09-07T07:55:55.7187839Z * [new branch] gh/andyanwang/2/head -> origin/gh/andyanwang/2/head 2025-09-07T07:55:55.7189481Z * [new branch] gh/andyanwang/2/orig -> origin/gh/andyanwang/2/orig 2025-09-07T07:55:55.7191743Z * [new branch] gh/andyanwang/28/base -> origin/gh/andyanwang/28/base 2025-09-07T07:55:55.7193412Z * [new branch] gh/andyanwang/28/head -> origin/gh/andyanwang/28/head 2025-09-07T07:55:55.7195309Z * [new branch] gh/andyanwang/28/orig -> origin/gh/andyanwang/28/orig 2025-09-07T07:55:55.7197435Z * [new branch] gh/andyanwang/3/base -> origin/gh/andyanwang/3/base 2025-09-07T07:55:55.7199102Z * [new branch] gh/andyanwang/3/head -> origin/gh/andyanwang/3/head 2025-09-07T07:55:55.7200616Z * [new branch] gh/andyanwang/3/orig -> origin/gh/andyanwang/3/orig 2025-09-07T07:55:55.7203114Z * [new branch] gh/andyanwang/30/base -> origin/gh/andyanwang/30/base 2025-09-07T07:55:55.7205707Z * [new branch] gh/andyanwang/30/orig -> origin/gh/andyanwang/30/orig 2025-09-07T07:55:55.7208768Z * [new branch] gh/andyanwang/31/base -> origin/gh/andyanwang/31/base 2025-09-07T07:55:55.7211429Z * [new branch] gh/andyanwang/31/orig -> origin/gh/andyanwang/31/orig 2025-09-07T07:55:55.7215457Z * [new branch] gh/andyanwang/32/base -> origin/gh/andyanwang/32/base 2025-09-07T07:55:55.7225169Z * [new branch] gh/andyanwang/32/head -> origin/gh/andyanwang/32/head 2025-09-07T07:55:55.7225660Z * [new branch] gh/andyanwang/32/orig -> origin/gh/andyanwang/32/orig 2025-09-07T07:55:55.7226110Z * [new branch] 
gh/andyanwang/39/base -> origin/gh/andyanwang/39/base 2025-09-07T07:55:55.7226562Z * [new branch] gh/andyanwang/39/head -> origin/gh/andyanwang/39/head 2025-09-07T07:55:55.7226990Z * [new branch] gh/andyanwang/39/orig -> origin/gh/andyanwang/39/orig 2025-09-07T07:55:55.7227434Z * [new branch] gh/andyanwang/4/base -> origin/gh/andyanwang/4/base 2025-09-07T07:55:55.7228230Z * [new branch] gh/andyanwang/4/head -> origin/gh/andyanwang/4/head 2025-09-07T07:55:55.7230031Z * [new branch] gh/andyanwang/4/orig -> origin/gh/andyanwang/4/orig 2025-09-07T07:55:55.7232990Z * [new branch] gh/angelayi/107/base -> origin/gh/angelayi/107/base 2025-09-07T07:55:55.7234735Z * [new branch] gh/angelayi/107/head -> origin/gh/angelayi/107/head 2025-09-07T07:55:55.7236890Z * [new branch] gh/angelayi/111/base -> origin/gh/angelayi/111/base 2025-09-07T07:55:55.7238545Z * [new branch] gh/angelayi/111/head -> origin/gh/angelayi/111/head 2025-09-07T07:55:55.7240107Z * [new branch] gh/angelayi/111/orig -> origin/gh/angelayi/111/orig 2025-09-07T07:55:55.7242367Z * [new branch] gh/angelayi/112/base -> origin/gh/angelayi/112/base 2025-09-07T07:55:55.7244449Z * [new branch] gh/angelayi/112/head -> origin/gh/angelayi/112/head 2025-09-07T07:55:55.7246206Z * [new branch] gh/angelayi/112/orig -> origin/gh/angelayi/112/orig 2025-09-07T07:55:55.7248477Z * [new branch] gh/angelayi/113/base -> origin/gh/angelayi/113/base 2025-09-07T07:55:55.7250098Z * [new branch] gh/angelayi/113/head -> origin/gh/angelayi/113/head 2025-09-07T07:55:55.7251643Z * [new branch] gh/angelayi/113/orig -> origin/gh/angelayi/113/orig 2025-09-07T07:55:55.7254024Z * [new branch] gh/angelayi/114/base -> origin/gh/angelayi/114/base 2025-09-07T07:55:55.7255840Z * [new branch] gh/angelayi/114/head -> origin/gh/angelayi/114/head 2025-09-07T07:55:55.7257540Z * [new branch] gh/angelayi/114/orig -> origin/gh/angelayi/114/orig 2025-09-07T07:55:55.7259550Z * [new branch] gh/angelayi/115/base -> origin/gh/angelayi/115/base 
2025-09-07T07:55:55.7261121Z * [new branch] gh/angelayi/115/head -> origin/gh/angelayi/115/head 2025-09-07T07:55:55.7262664Z * [new branch] gh/angelayi/115/orig -> origin/gh/angelayi/115/orig 2025-09-07T07:55:55.7265855Z * [new branch] gh/anijain2305/753/base -> origin/gh/anijain2305/753/base 2025-09-07T07:55:55.7267462Z * [new branch] gh/anijain2305/753/head -> origin/gh/anijain2305/753/head 2025-09-07T07:55:55.7268971Z * [new branch] gh/anijain2305/753/orig -> origin/gh/anijain2305/753/orig 2025-09-07T07:55:55.7271313Z * [new branch] gh/anijain2305/766/base -> origin/gh/anijain2305/766/base 2025-09-07T07:55:55.7272891Z * [new branch] gh/anijain2305/766/head -> origin/gh/anijain2305/766/head 2025-09-07T07:55:55.7274761Z * [new branch] gh/anijain2305/766/orig -> origin/gh/anijain2305/766/orig 2025-09-07T07:55:55.7277125Z * [new branch] gh/anijain2305/790/base -> origin/gh/anijain2305/790/base 2025-09-07T07:55:55.7278746Z * [new branch] gh/anijain2305/790/head -> origin/gh/anijain2305/790/head 2025-09-07T07:55:55.7280283Z * [new branch] gh/anijain2305/790/orig -> origin/gh/anijain2305/790/orig 2025-09-07T07:55:55.7282696Z * [new branch] gh/anijain2305/792/base -> origin/gh/anijain2305/792/base 2025-09-07T07:55:55.7284336Z * [new branch] gh/anijain2305/792/head -> origin/gh/anijain2305/792/head 2025-09-07T07:55:55.7286106Z * [new branch] gh/anijain2305/792/orig -> origin/gh/anijain2305/792/orig 2025-09-07T07:55:55.7288299Z * [new branch] gh/anijain2305/803/base -> origin/gh/anijain2305/803/base 2025-09-07T07:55:55.7289847Z * [new branch] gh/anijain2305/803/head -> origin/gh/anijain2305/803/head 2025-09-07T07:55:55.7291410Z * [new branch] gh/anijain2305/803/orig -> origin/gh/anijain2305/803/orig 2025-09-07T07:55:55.7293608Z * [new branch] gh/anijain2305/804/base -> origin/gh/anijain2305/804/base 2025-09-07T07:55:55.7295557Z * [new branch] gh/anijain2305/804/head -> origin/gh/anijain2305/804/head 2025-09-07T07:55:55.7297295Z * [new branch] gh/anijain2305/804/orig -> 
origin/gh/anijain2305/804/orig 2025-09-07T07:55:55.7299315Z * [new branch] gh/anijain2305/805/base -> origin/gh/anijain2305/805/base 2025-09-07T07:55:55.7300842Z * [new branch] gh/anijain2305/805/head -> origin/gh/anijain2305/805/head 2025-09-07T07:55:55.7302341Z * [new branch] gh/anijain2305/805/orig -> origin/gh/anijain2305/805/orig 2025-09-07T07:55:55.7305140Z * [new branch] gh/anijain2305/810/base -> origin/gh/anijain2305/810/base 2025-09-07T07:55:55.7306758Z * [new branch] gh/anijain2305/810/head -> origin/gh/anijain2305/810/head 2025-09-07T07:55:55.7308240Z * [new branch] gh/anijain2305/810/orig -> origin/gh/anijain2305/810/orig 2025-09-07T07:55:55.7310577Z * [new branch] gh/anijain2305/812/base -> origin/gh/anijain2305/812/base 2025-09-07T07:55:55.7312220Z * [new branch] gh/anijain2305/812/head -> origin/gh/anijain2305/812/head 2025-09-07T07:55:55.7313891Z * [new branch] gh/anijain2305/812/orig -> origin/gh/anijain2305/812/orig 2025-09-07T07:55:55.7316458Z * [new branch] gh/anijain2305/838/base -> origin/gh/anijain2305/838/base 2025-09-07T07:55:55.7318060Z * [new branch] gh/anijain2305/838/head -> origin/gh/anijain2305/838/head 2025-09-07T07:55:55.7319607Z * [new branch] gh/anijain2305/838/orig -> origin/gh/anijain2305/838/orig 2025-09-07T07:55:55.7321848Z * [new branch] gh/anijain2305/839/base -> origin/gh/anijain2305/839/base 2025-09-07T07:55:55.7323399Z * [new branch] gh/anijain2305/839/head -> origin/gh/anijain2305/839/head 2025-09-07T07:55:55.7325371Z * [new branch] gh/anijain2305/839/orig -> origin/gh/anijain2305/839/orig 2025-09-07T07:55:55.7327506Z * [new branch] gh/anijain2305/843/base -> origin/gh/anijain2305/843/base 2025-09-07T07:55:55.7329153Z * [new branch] gh/anijain2305/843/head -> origin/gh/anijain2305/843/head 2025-09-07T07:55:55.7330651Z * [new branch] gh/anijain2305/843/orig -> origin/gh/anijain2305/843/orig 2025-09-07T07:55:55.7332933Z * [new branch] gh/anijain2305/844/base -> origin/gh/anijain2305/844/base 2025-09-07T07:55:55.7334775Z * 
[new branch] gh/anijain2305/844/head -> origin/gh/anijain2305/844/head 2025-09-07T07:55:55.7336328Z * [new branch] gh/anijain2305/844/orig -> origin/gh/anijain2305/844/orig 2025-09-07T07:55:55.7338788Z * [new branch] gh/anijain2305/846/base -> origin/gh/anijain2305/846/base 2025-09-07T07:55:55.7340327Z * [new branch] gh/anijain2305/846/head -> origin/gh/anijain2305/846/head 2025-09-07T07:55:55.7342357Z * [new branch] gh/anijain2305/846/orig -> origin/gh/anijain2305/846/orig 2025-09-07T07:55:55.7344744Z * [new branch] gh/anijain2305/848/base -> origin/gh/anijain2305/848/base 2025-09-07T07:55:55.7346311Z * [new branch] gh/anijain2305/848/head -> origin/gh/anijain2305/848/head 2025-09-07T07:55:55.7347793Z * [new branch] gh/anijain2305/848/orig -> origin/gh/anijain2305/848/orig 2025-09-07T07:55:55.7350139Z * [new branch] gh/anijain2305/849/base -> origin/gh/anijain2305/849/base 2025-09-07T07:55:55.7351654Z * [new branch] gh/anijain2305/849/head -> origin/gh/anijain2305/849/head 2025-09-07T07:55:55.7353171Z * [new branch] gh/anijain2305/849/orig -> origin/gh/anijain2305/849/orig 2025-09-07T07:55:55.7355917Z * [new branch] gh/anijain2305/850/base -> origin/gh/anijain2305/850/base 2025-09-07T07:55:55.7357424Z * [new branch] gh/anijain2305/850/head -> origin/gh/anijain2305/850/head 2025-09-07T07:55:55.7358950Z * [new branch] gh/anijain2305/850/orig -> origin/gh/anijain2305/850/orig 2025-09-07T07:55:55.7361416Z * [new branch] gh/anijain2305/851/base -> origin/gh/anijain2305/851/base 2025-09-07T07:55:55.7362919Z * [new branch] gh/anijain2305/851/head -> origin/gh/anijain2305/851/head 2025-09-07T07:55:55.7364788Z * [new branch] gh/anijain2305/851/orig -> origin/gh/anijain2305/851/orig 2025-09-07T07:55:55.7367017Z * [new branch] gh/anijain2305/852/base -> origin/gh/anijain2305/852/base 2025-09-07T07:55:55.7368639Z * [new branch] gh/anijain2305/852/head -> origin/gh/anijain2305/852/head 2025-09-07T07:55:55.7370230Z * [new branch] gh/anijain2305/852/orig -> 
origin/gh/anijain2305/852/orig 2025-09-07T07:55:55.7372376Z * [new branch] gh/anijain2305/853/base -> origin/gh/anijain2305/853/base 2025-09-07T07:55:55.7374073Z * [new branch] gh/anijain2305/853/head -> origin/gh/anijain2305/853/head 2025-09-07T07:55:55.7375900Z * [new branch] gh/anijain2305/853/orig -> origin/gh/anijain2305/853/orig 2025-09-07T07:55:55.7378039Z * [new branch] gh/anijain2305/854/base -> origin/gh/anijain2305/854/base 2025-09-07T07:55:55.7379626Z * [new branch] gh/anijain2305/854/head -> origin/gh/anijain2305/854/head 2025-09-07T07:55:55.7381259Z * [new branch] gh/anijain2305/854/orig -> origin/gh/anijain2305/854/orig 2025-09-07T07:55:55.7383526Z * [new branch] gh/anijain2305/855/base -> origin/gh/anijain2305/855/base 2025-09-07T07:55:55.7385647Z * [new branch] gh/anijain2305/855/head -> origin/gh/anijain2305/855/head 2025-09-07T07:55:55.7387062Z * [new branch] gh/anijain2305/855/orig -> origin/gh/anijain2305/855/orig 2025-09-07T07:55:55.7389362Z * [new branch] gh/anijain2305/856/base -> origin/gh/anijain2305/856/base 2025-09-07T07:55:55.7390936Z * [new branch] gh/anijain2305/856/head -> origin/gh/anijain2305/856/head 2025-09-07T07:55:55.7392444Z * [new branch] gh/anijain2305/856/orig -> origin/gh/anijain2305/856/orig 2025-09-07T07:55:55.7395115Z * [new branch] gh/anijain2305/857/base -> origin/gh/anijain2305/857/base 2025-09-07T07:55:55.7396646Z * [new branch] gh/anijain2305/857/head -> origin/gh/anijain2305/857/head 2025-09-07T07:55:55.7398280Z * [new branch] gh/anijain2305/857/orig -> origin/gh/anijain2305/857/orig 2025-09-07T07:55:55.7400643Z * [new branch] gh/anijain2305/858/base -> origin/gh/anijain2305/858/base 2025-09-07T07:55:55.7402160Z * [new branch] gh/anijain2305/858/head -> origin/gh/anijain2305/858/head 2025-09-07T07:55:55.7403679Z * [new branch] gh/anijain2305/858/orig -> origin/gh/anijain2305/858/orig 2025-09-07T07:55:55.7406322Z * [new branch] gh/anijain2305/859/base -> origin/gh/anijain2305/859/base 2025-09-07T07:55:55.7407808Z * 
[new branch] gh/anijain2305/859/head -> origin/gh/anijain2305/859/head 2025-09-07T07:55:55.7409386Z * [new branch] gh/anijain2305/859/orig -> origin/gh/anijain2305/859/orig 2025-09-07T07:55:55.7411619Z * [new branch] gh/anijain2305/860/base -> origin/gh/anijain2305/860/base 2025-09-07T07:55:55.7413138Z * [new branch] gh/anijain2305/860/head -> origin/gh/anijain2305/860/head 2025-09-07T07:55:55.7415131Z * [new branch] gh/anijain2305/860/orig -> origin/gh/anijain2305/860/orig 2025-09-07T07:55:55.7417280Z * [new branch] gh/anijain2305/861/base -> origin/gh/anijain2305/861/base 2025-09-07T07:55:55.7418897Z * [new branch] gh/anijain2305/861/head -> origin/gh/anijain2305/861/head 2025-09-07T07:55:55.7420494Z * [new branch] gh/anijain2305/861/orig -> origin/gh/anijain2305/861/orig 2025-09-07T07:55:55.7422704Z * [new branch] gh/anijain2305/862/base -> origin/gh/anijain2305/862/base 2025-09-07T07:55:55.7424729Z * [new branch] gh/anijain2305/862/head -> origin/gh/anijain2305/862/head 2025-09-07T07:55:55.7426388Z * [new branch] gh/anijain2305/862/orig -> origin/gh/anijain2305/862/orig 2025-09-07T07:55:55.7428564Z * [new branch] gh/anijain2305/863/base -> origin/gh/anijain2305/863/base 2025-09-07T07:55:55.7430152Z * [new branch] gh/anijain2305/863/head -> origin/gh/anijain2305/863/head 2025-09-07T07:55:55.7431902Z * [new branch] gh/anijain2305/863/orig -> origin/gh/anijain2305/863/orig 2025-09-07T07:55:55.7434799Z * [new branch] gh/anijain2305/864/base -> origin/gh/anijain2305/864/base 2025-09-07T07:55:55.7436357Z * [new branch] gh/anijain2305/864/head -> origin/gh/anijain2305/864/head 2025-09-07T07:55:55.7438055Z * [new branch] gh/anijain2305/864/orig -> origin/gh/anijain2305/864/orig 2025-09-07T07:55:55.7440443Z * [new branch] gh/anijain2305/865/base -> origin/gh/anijain2305/865/base 2025-09-07T07:55:55.7442028Z * [new branch] gh/anijain2305/865/head -> origin/gh/anijain2305/865/head 2025-09-07T07:55:55.7443546Z * [new branch] gh/anijain2305/865/orig -> 
origin/gh/anijain2305/865/orig 2025-09-07T07:55:55.7446213Z * [new branch] gh/anijain2305/866/base -> origin/gh/anijain2305/866/base 2025-09-07T07:55:55.7447696Z * [new branch] gh/anijain2305/866/head -> origin/gh/anijain2305/866/head 2025-09-07T07:55:55.7449214Z * [new branch] gh/anijain2305/866/orig -> origin/gh/anijain2305/866/orig 2025-09-07T07:55:55.7452028Z * [new branch] gh/anjali411/216/base -> origin/gh/anjali411/216/base 2025-09-07T07:55:55.7453653Z * [new branch] gh/anjali411/216/head -> origin/gh/anjali411/216/head 2025-09-07T07:55:55.7455598Z * [new branch] gh/anjali411/216/orig -> origin/gh/anjali411/216/orig 2025-09-07T07:55:55.7458348Z * [new branch] gh/ankitageorge/13/base -> origin/gh/ankitageorge/13/base 2025-09-07T07:55:55.7459947Z * [new branch] gh/ankitageorge/13/head -> origin/gh/ankitageorge/13/head 2025-09-07T07:55:55.7461521Z * [new branch] gh/ankitageorge/13/orig -> origin/gh/ankitageorge/13/orig 2025-09-07T07:55:55.7463975Z * [new branch] gh/ankitageorge/14/base -> origin/gh/ankitageorge/14/base 2025-09-07T07:55:55.7465754Z * [new branch] gh/ankitageorge/14/head -> origin/gh/ankitageorge/14/head 2025-09-07T07:55:55.7467498Z * [new branch] gh/ankitageorge/14/orig -> origin/gh/ankitageorge/14/orig 2025-09-07T07:55:55.7469743Z * [new branch] gh/ankitageorge/15/base -> origin/gh/ankitageorge/15/base 2025-09-07T07:55:55.7471286Z * [new branch] gh/ankitageorge/15/head -> origin/gh/ankitageorge/15/head 2025-09-07T07:55:55.7472907Z * [new branch] gh/ankitageorge/15/orig -> origin/gh/ankitageorge/15/orig 2025-09-07T07:55:55.7475664Z * [new branch] gh/ankitageorge/16/base -> origin/gh/ankitageorge/16/base 2025-09-07T07:55:55.7477308Z * [new branch] gh/ankitageorge/16/head -> origin/gh/ankitageorge/16/head 2025-09-07T07:55:55.7478989Z * [new branch] gh/ankitageorge/16/orig -> origin/gh/ankitageorge/16/orig 2025-09-07T07:55:55.7481311Z * [new branch] gh/ankitageorge/17/base -> origin/gh/ankitageorge/17/base 2025-09-07T07:55:55.7482772Z * [new 
branch] gh/ankitageorge/17/head -> origin/gh/ankitageorge/17/head 2025-09-07T07:55:55.7484668Z * [new branch] gh/ankitageorge/17/orig -> origin/gh/ankitageorge/17/orig 2025-09-07T07:55:55.7487036Z * [new branch] gh/ankitageorge/21/base -> origin/gh/ankitageorge/21/base 2025-09-07T07:55:55.7488511Z * [new branch] gh/ankitageorge/21/head -> origin/gh/ankitageorge/21/head 2025-09-07T07:55:55.7489987Z * [new branch] gh/ankitageorge/21/orig -> origin/gh/ankitageorge/21/orig 2025-09-07T07:55:55.7493195Z * [new branch] gh/anshul-si/1/base -> origin/gh/anshul-si/1/base 2025-09-07T07:55:55.7495058Z * [new branch] gh/anshul-si/1/head -> origin/gh/anshul-si/1/head 2025-09-07T07:55:55.7497214Z * [new branch] gh/anshul-si/15/base -> origin/gh/anshul-si/15/base 2025-09-07T07:55:55.7498713Z * [new branch] gh/anshul-si/15/head -> origin/gh/anshul-si/15/head 2025-09-07T07:55:55.7500319Z * [new branch] gh/anshul-si/15/orig -> origin/gh/anshul-si/15/orig 2025-09-07T07:55:55.7502666Z * [new branch] gh/anshul-si/16/base -> origin/gh/anshul-si/16/base 2025-09-07T07:55:55.7504861Z * [new branch] gh/anshul-si/16/head -> origin/gh/anshul-si/16/head 2025-09-07T07:55:55.7506014Z * [new branch] gh/anshul-si/16/orig -> origin/gh/anshul-si/16/orig 2025-09-07T07:55:55.7508457Z * [new branch] gh/anshul-si/17/base -> origin/gh/anshul-si/17/base 2025-09-07T07:55:55.7510252Z * [new branch] gh/anshul-si/17/head -> origin/gh/anshul-si/17/head 2025-09-07T07:55:55.7511832Z * [new branch] gh/anshul-si/17/orig -> origin/gh/anshul-si/17/orig 2025-09-07T07:55:55.7514609Z * [new branch] gh/anshul-si/18/base -> origin/gh/anshul-si/18/base 2025-09-07T07:55:55.7516208Z * [new branch] gh/anshul-si/18/head -> origin/gh/anshul-si/18/head 2025-09-07T07:55:55.7517960Z * [new branch] gh/anshul-si/18/orig -> origin/gh/anshul-si/18/orig 2025-09-07T07:55:55.7520239Z * [new branch] gh/anshul-si/19/base -> origin/gh/anshul-si/19/base 2025-09-07T07:55:55.7521886Z * [new branch] gh/anshul-si/19/head -> 
origin/gh/anshul-si/19/head 2025-09-07T07:55:55.7523402Z * [new branch] gh/anshul-si/19/orig -> origin/gh/anshul-si/19/orig 2025-09-07T07:55:55.7525902Z * [new branch] gh/anshul-si/2/base -> origin/gh/anshul-si/2/base 2025-09-07T07:55:55.7527402Z * [new branch] gh/anshul-si/2/head -> origin/gh/anshul-si/2/head 2025-09-07T07:55:55.7530004Z * [new branch] gh/anshul-si/20/base -> origin/gh/anshul-si/20/base 2025-09-07T07:55:55.7531705Z * [new branch] gh/anshul-si/20/head -> origin/gh/anshul-si/20/head 2025-09-07T07:55:55.7533223Z * [new branch] gh/anshul-si/20/orig -> origin/gh/anshul-si/20/orig 2025-09-07T07:55:55.7535847Z * [new branch] gh/anshul-si/21/base -> origin/gh/anshul-si/21/base 2025-09-07T07:55:55.7537400Z * [new branch] gh/anshul-si/21/head -> origin/gh/anshul-si/21/head 2025-09-07T07:55:55.7538928Z * [new branch] gh/anshul-si/21/orig -> origin/gh/anshul-si/21/orig 2025-09-07T07:55:55.7541204Z * [new branch] gh/anshul-si/22/base -> origin/gh/anshul-si/22/base 2025-09-07T07:55:55.7542804Z * [new branch] gh/anshul-si/22/head -> origin/gh/anshul-si/22/head 2025-09-07T07:55:55.7544696Z * [new branch] gh/anshul-si/22/orig -> origin/gh/anshul-si/22/orig 2025-09-07T07:55:55.7546906Z * [new branch] gh/anshul-si/23/base -> origin/gh/anshul-si/23/base 2025-09-07T07:55:55.7548556Z * [new branch] gh/anshul-si/23/head -> origin/gh/anshul-si/23/head 2025-09-07T07:55:55.7550062Z * [new branch] gh/anshul-si/23/orig -> origin/gh/anshul-si/23/orig 2025-09-07T07:55:55.7552497Z * [new branch] gh/anshul-si/24/base -> origin/gh/anshul-si/24/base 2025-09-07T07:55:55.7554385Z * [new branch] gh/anshul-si/24/head -> origin/gh/anshul-si/24/head 2025-09-07T07:55:55.7556034Z * [new branch] gh/anshul-si/24/orig -> origin/gh/anshul-si/24/orig 2025-09-07T07:55:55.7558483Z * [new branch] gh/anshul-si/25/base -> origin/gh/anshul-si/25/base 2025-09-07T07:55:55.7560209Z * [new branch] gh/anshul-si/25/head -> origin/gh/anshul-si/25/head 2025-09-07T07:55:55.7561645Z * [new branch] 
gh/anshul-si/25/orig -> origin/gh/anshul-si/25/orig 2025-09-07T07:55:55.7564054Z * [new branch] gh/anshul-si/26/base -> origin/gh/anshul-si/26/base 2025-09-07T07:55:55.7565841Z * [new branch] gh/anshul-si/26/head -> origin/gh/anshul-si/26/head 2025-09-07T07:55:55.7567342Z * [new branch] gh/anshul-si/26/orig -> origin/gh/anshul-si/26/orig 2025-09-07T07:55:55.7569738Z * [new branch] gh/anshul-si/27/base -> origin/gh/anshul-si/27/base 2025-09-07T07:55:55.7571307Z * [new branch] gh/anshul-si/27/head -> origin/gh/anshul-si/27/head 2025-09-07T07:55:55.7572912Z * [new branch] gh/anshul-si/27/orig -> origin/gh/anshul-si/27/orig 2025-09-07T07:55:55.7575435Z * [new branch] gh/anshul-si/28/base -> origin/gh/anshul-si/28/base 2025-09-07T07:55:55.7576925Z * [new branch] gh/anshul-si/28/head -> origin/gh/anshul-si/28/head 2025-09-07T07:55:55.7578534Z * [new branch] gh/anshul-si/28/orig -> origin/gh/anshul-si/28/orig 2025-09-07T07:55:55.7581080Z * [new branch] gh/anshul-si/29/base -> origin/gh/anshul-si/29/base 2025-09-07T07:55:55.7582899Z * [new branch] gh/anshul-si/29/head -> origin/gh/anshul-si/29/head 2025-09-07T07:55:55.7584850Z * [new branch] gh/anshul-si/29/orig -> origin/gh/anshul-si/29/orig 2025-09-07T07:55:55.7586870Z * [new branch] gh/anshul-si/3/base -> origin/gh/anshul-si/3/base 2025-09-07T07:55:55.7588396Z * [new branch] gh/anshul-si/3/head -> origin/gh/anshul-si/3/head 2025-09-07T07:55:55.7590616Z * [new branch] gh/anshul-si/4/base -> origin/gh/anshul-si/4/base 2025-09-07T07:55:55.7592092Z * [new branch] gh/anshul-si/4/head -> origin/gh/anshul-si/4/head 2025-09-07T07:55:55.7594627Z * [new branch] gh/anshul-si/5/base -> origin/gh/anshul-si/5/base 2025-09-07T07:55:55.7596089Z * [new branch] gh/anshul-si/5/head -> origin/gh/anshul-si/5/head 2025-09-07T07:55:55.7599074Z * [new branch] gh/aorenste/132/base -> origin/gh/aorenste/132/base 2025-09-07T07:55:55.7600670Z * [new branch] gh/aorenste/132/head -> origin/gh/aorenste/132/head 2025-09-07T07:55:55.7603426Z * [new 
branch] gh/bdhirsh/650/base -> origin/gh/bdhirsh/650/base 2025-09-07T07:55:55.7605638Z * [new branch] gh/bdhirsh/650/head -> origin/gh/bdhirsh/650/head 2025-09-07T07:55:55.7607139Z * [new branch] gh/bdhirsh/650/orig -> origin/gh/bdhirsh/650/orig 2025-09-07T07:55:55.7609367Z * [new branch] gh/bdhirsh/663/base -> origin/gh/bdhirsh/663/base 2025-09-07T07:55:55.7611138Z * [new branch] gh/bdhirsh/663/head -> origin/gh/bdhirsh/663/head 2025-09-07T07:55:55.7614049Z * [new branch] gh/bdhirsh/663/orig -> origin/gh/bdhirsh/663/orig 2025-09-07T07:55:55.7616636Z * [new branch] gh/bdhirsh/665/base -> origin/gh/bdhirsh/665/base 2025-09-07T07:55:55.7618096Z * [new branch] gh/bdhirsh/665/head -> origin/gh/bdhirsh/665/head 2025-09-07T07:55:55.7619599Z * [new branch] gh/bdhirsh/665/orig -> origin/gh/bdhirsh/665/orig 2025-09-07T07:55:55.7622099Z * [new branch] gh/bdhirsh/666/base -> origin/gh/bdhirsh/666/base 2025-09-07T07:55:55.7623891Z * [new branch] gh/bdhirsh/666/head -> origin/gh/bdhirsh/666/head 2025-09-07T07:55:55.7625850Z * [new branch] gh/bdhirsh/666/orig -> origin/gh/bdhirsh/666/orig 2025-09-07T07:55:55.7628571Z * [new branch] gh/bdhirsh/667/base -> origin/gh/bdhirsh/667/base 2025-09-07T07:55:55.7630295Z * [new branch] gh/bdhirsh/667/head -> origin/gh/bdhirsh/667/head 2025-09-07T07:55:55.7631738Z * [new branch] gh/bdhirsh/667/orig -> origin/gh/bdhirsh/667/orig 2025-09-07T07:55:55.7634258Z * [new branch] gh/bdhirsh/668/base -> origin/gh/bdhirsh/668/base 2025-09-07T07:55:55.7636019Z * [new branch] gh/bdhirsh/668/head -> origin/gh/bdhirsh/668/head 2025-09-07T07:55:55.7637574Z * [new branch] gh/bdhirsh/668/orig -> origin/gh/bdhirsh/668/orig 2025-09-07T07:55:55.7640006Z * [new branch] gh/bdhirsh/669/base -> origin/gh/bdhirsh/669/base 2025-09-07T07:55:55.7641535Z * [new branch] gh/bdhirsh/669/head -> origin/gh/bdhirsh/669/head 2025-09-07T07:55:55.7643132Z * [new branch] gh/bdhirsh/669/orig -> origin/gh/bdhirsh/669/orig 2025-09-07T07:55:55.7646038Z * [new branch] 
gh/bdhirsh/670/base -> origin/gh/bdhirsh/670/base 2025-09-07T07:55:55.7647550Z * [new branch] gh/bdhirsh/670/head -> origin/gh/bdhirsh/670/head 2025-09-07T07:55:55.7649156Z * [new branch] gh/bdhirsh/670/orig -> origin/gh/bdhirsh/670/orig 2025-09-07T07:55:55.7652043Z * [new branch] gh/benjaminglass1/100/base -> origin/gh/benjaminglass1/100/base 2025-09-07T07:55:55.7653604Z * [new branch] gh/benjaminglass1/100/head -> origin/gh/benjaminglass1/100/head 2025-09-07T07:55:55.7655642Z * [new branch] gh/benjaminglass1/100/orig -> origin/gh/benjaminglass1/100/orig 2025-09-07T07:55:55.7657799Z * [new branch] gh/benjaminglass1/101/base -> origin/gh/benjaminglass1/101/base 2025-09-07T07:55:55.7659518Z * [new branch] gh/benjaminglass1/101/head -> origin/gh/benjaminglass1/101/head 2025-09-07T07:55:55.7661110Z * [new branch] gh/benjaminglass1/101/orig -> origin/gh/benjaminglass1/101/orig 2025-09-07T07:55:55.7663425Z * [new branch] gh/benjaminglass1/102/base -> origin/gh/benjaminglass1/102/base 2025-09-07T07:55:55.7665385Z * [new branch] gh/benjaminglass1/102/head -> origin/gh/benjaminglass1/102/head 2025-09-07T07:55:55.7666904Z * [new branch] gh/benjaminglass1/102/orig -> origin/gh/benjaminglass1/102/orig 2025-09-07T07:55:55.7669191Z * [new branch] gh/benjaminglass1/103/base -> origin/gh/benjaminglass1/103/base 2025-09-07T07:55:55.7670763Z * [new branch] gh/benjaminglass1/103/head -> origin/gh/benjaminglass1/103/head 2025-09-07T07:55:55.7672407Z * [new branch] gh/benjaminglass1/103/orig -> origin/gh/benjaminglass1/103/orig 2025-09-07T07:55:55.7675067Z * [new branch] gh/benjaminglass1/104/base -> origin/gh/benjaminglass1/104/base 2025-09-07T07:55:55.7676519Z * [new branch] gh/benjaminglass1/104/head -> origin/gh/benjaminglass1/104/head 2025-09-07T07:55:55.7678141Z * [new branch] gh/benjaminglass1/104/orig -> origin/gh/benjaminglass1/104/orig 2025-09-07T07:55:55.7680381Z * [new branch] gh/benjaminglass1/105/base -> origin/gh/benjaminglass1/105/base 2025-09-07T07:55:55.7681926Z * 
[new branch] gh/benjaminglass1/105/head -> origin/gh/benjaminglass1/105/head 2025-09-07T07:55:55.7683649Z * [new branch] gh/benjaminglass1/105/orig -> origin/gh/benjaminglass1/105/orig 2025-09-07T07:55:55.7686326Z * [new branch] gh/benjaminglass1/106/base -> origin/gh/benjaminglass1/106/base 2025-09-07T07:55:55.7687850Z * [new branch] gh/benjaminglass1/106/head -> origin/gh/benjaminglass1/106/head 2025-09-07T07:55:55.7689392Z * [new branch] gh/benjaminglass1/106/orig -> origin/gh/benjaminglass1/106/orig 2025-09-07T07:55:55.7691691Z * [new branch] gh/benjaminglass1/79/base -> origin/gh/benjaminglass1/79/base 2025-09-07T07:55:55.7693178Z * [new branch] gh/benjaminglass1/79/head -> origin/gh/benjaminglass1/79/head 2025-09-07T07:55:55.7695327Z * [new branch] gh/benjaminglass1/79/orig -> origin/gh/benjaminglass1/79/orig 2025-09-07T07:55:55.7697339Z * [new branch] gh/benjaminglass1/86/base -> origin/gh/benjaminglass1/86/base 2025-09-07T07:55:55.7698933Z * [new branch] gh/benjaminglass1/86/head -> origin/gh/benjaminglass1/86/head 2025-09-07T07:55:55.7700481Z * [new branch] gh/benjaminglass1/86/orig -> origin/gh/benjaminglass1/86/orig 2025-09-07T07:55:55.7702776Z * [new branch] gh/benjaminglass1/89/base -> origin/gh/benjaminglass1/89/base 2025-09-07T07:55:55.7704681Z * [new branch] gh/benjaminglass1/89/head -> origin/gh/benjaminglass1/89/head 2025-09-07T07:55:55.7706267Z * [new branch] gh/benjaminglass1/89/orig -> origin/gh/benjaminglass1/89/orig 2025-09-07T07:55:55.7708455Z * [new branch] gh/benjaminglass1/91/base -> origin/gh/benjaminglass1/91/base 2025-09-07T07:55:55.7710063Z * [new branch] gh/benjaminglass1/91/head -> origin/gh/benjaminglass1/91/head 2025-09-07T07:55:55.7711679Z * [new branch] gh/benjaminglass1/91/orig -> origin/gh/benjaminglass1/91/orig 2025-09-07T07:55:55.7714046Z * [new branch] gh/benjaminglass1/93/base -> origin/gh/benjaminglass1/93/base 2025-09-07T07:55:55.7715870Z * [new branch] gh/benjaminglass1/93/head -> origin/gh/benjaminglass1/93/head 
2025-09-07T07:55:55.7717431Z * [new branch] gh/benjaminglass1/93/orig -> origin/gh/benjaminglass1/93/orig 2025-09-07T07:55:55.7719718Z * [new branch] gh/benjaminglass1/95/base -> origin/gh/benjaminglass1/95/base 2025-09-07T07:55:55.7721318Z * [new branch] gh/benjaminglass1/95/head -> origin/gh/benjaminglass1/95/head 2025-09-07T07:55:55.7723108Z * [new branch] gh/benjaminglass1/95/orig -> origin/gh/benjaminglass1/95/orig 2025-09-07T07:55:55.7725888Z * [new branch] gh/benjaminglass1/97/base -> origin/gh/benjaminglass1/97/base 2025-09-07T07:55:55.7729561Z * [new branch] gh/benjaminglass1/97/head -> origin/gh/benjaminglass1/97/head 2025-09-07T07:55:55.7730758Z * [new branch] gh/benjaminglass1/97/orig -> origin/gh/benjaminglass1/97/orig 2025-09-07T07:55:55.7731538Z * [new branch] gh/benjaminglass1/99/base -> origin/gh/benjaminglass1/99/base 2025-09-07T07:55:55.7732748Z * [new branch] gh/benjaminglass1/99/head -> origin/gh/benjaminglass1/99/head 2025-09-07T07:55:55.7734676Z * [new branch] gh/benjaminglass1/99/orig -> origin/gh/benjaminglass1/99/orig 2025-09-07T07:55:55.7737462Z * [new branch] gh/bobrenjc93/514/base -> origin/gh/bobrenjc93/514/base 2025-09-07T07:55:55.7738974Z * [new branch] gh/bobrenjc93/514/head -> origin/gh/bobrenjc93/514/head 2025-09-07T07:55:55.7740626Z * [new branch] gh/bobrenjc93/514/orig -> origin/gh/bobrenjc93/514/orig 2025-09-07T07:55:55.7742788Z * [new branch] gh/bobrenjc93/521/base -> origin/gh/bobrenjc93/521/base 2025-09-07T07:55:55.7744818Z * [new branch] gh/bobrenjc93/521/head -> origin/gh/bobrenjc93/521/head 2025-09-07T07:55:55.7746303Z * [new branch] gh/bobrenjc93/521/orig -> origin/gh/bobrenjc93/521/orig 2025-09-07T07:55:55.7748545Z * [new branch] gh/bobrenjc93/522/base -> origin/gh/bobrenjc93/522/base 2025-09-07T07:55:55.7750108Z * [new branch] gh/bobrenjc93/522/head -> origin/gh/bobrenjc93/522/head 2025-09-07T07:55:55.7751647Z * [new branch] gh/bobrenjc93/522/orig -> origin/gh/bobrenjc93/522/orig 2025-09-07T07:55:55.7754029Z * [new 
branch] gh/bobrenjc93/525/base -> origin/gh/bobrenjc93/525/base 2025-09-07T07:55:55.7755821Z * [new branch] gh/bobrenjc93/525/head -> origin/gh/bobrenjc93/525/head 2025-09-07T07:55:55.7757306Z * [new branch] gh/bobrenjc93/525/orig -> origin/gh/bobrenjc93/525/orig 2025-09-07T07:55:55.7759873Z * [new branch] gh/bobrenjc93/526/base -> origin/gh/bobrenjc93/526/base 2025-09-07T07:55:55.7761373Z * [new branch] gh/bobrenjc93/526/head -> origin/gh/bobrenjc93/526/head 2025-09-07T07:55:55.7762869Z * [new branch] gh/bobrenjc93/526/orig -> origin/gh/bobrenjc93/526/orig 2025-09-07T07:55:55.7765483Z * [new branch] gh/bobrenjc93/527/base -> origin/gh/bobrenjc93/527/base 2025-09-07T07:55:55.7766890Z * [new branch] gh/bobrenjc93/527/head -> origin/gh/bobrenjc93/527/head 2025-09-07T07:55:55.7768489Z * [new branch] gh/bobrenjc93/527/orig -> origin/gh/bobrenjc93/527/orig 2025-09-07T07:55:55.7770671Z * [new branch] gh/bobrenjc93/528/base -> origin/gh/bobrenjc93/528/base 2025-09-07T07:55:55.7772219Z * [new branch] gh/bobrenjc93/528/head -> origin/gh/bobrenjc93/528/head 2025-09-07T07:55:55.7773890Z * [new branch] gh/bobrenjc93/528/orig -> origin/gh/bobrenjc93/528/orig 2025-09-07T07:55:55.7776347Z * [new branch] gh/bobrenjc93/529/base -> origin/gh/bobrenjc93/529/base 2025-09-07T07:55:55.7777928Z * [new branch] gh/bobrenjc93/529/head -> origin/gh/bobrenjc93/529/head 2025-09-07T07:55:55.7779483Z * [new branch] gh/bobrenjc93/529/orig -> origin/gh/bobrenjc93/529/orig 2025-09-07T07:55:55.7781697Z * [new branch] gh/bobrenjc93/535/base -> origin/gh/bobrenjc93/535/base 2025-09-07T07:55:55.7783240Z * [new branch] gh/bobrenjc93/535/head -> origin/gh/bobrenjc93/535/head 2025-09-07T07:55:55.7785154Z * [new branch] gh/bobrenjc93/535/orig -> origin/gh/bobrenjc93/535/orig 2025-09-07T07:55:55.7787366Z * [new branch] gh/bobrenjc93/537/base -> origin/gh/bobrenjc93/537/base 2025-09-07T07:55:55.7788989Z * [new branch] gh/bobrenjc93/537/head -> origin/gh/bobrenjc93/537/head 2025-09-07T07:55:55.7790551Z * [new 
branch] gh/bobrenjc93/537/orig -> origin/gh/bobrenjc93/537/orig 2025-09-07T07:55:55.7793078Z * [new branch] gh/bobrenjc93/539/base -> origin/gh/bobrenjc93/539/base 2025-09-07T07:55:55.7795120Z * [new branch] gh/bobrenjc93/539/head -> origin/gh/bobrenjc93/539/head 2025-09-07T07:55:55.7796716Z * [new branch] gh/bobrenjc93/539/orig -> origin/gh/bobrenjc93/539/orig 2025-09-07T07:55:55.7799126Z * [new branch] gh/bobrenjc93/540/base -> origin/gh/bobrenjc93/540/base 2025-09-07T07:55:55.7800725Z * [new branch] gh/bobrenjc93/540/head -> origin/gh/bobrenjc93/540/head 2025-09-07T07:55:55.7802262Z * [new branch] gh/bobrenjc93/540/orig -> origin/gh/bobrenjc93/540/orig 2025-09-07T07:55:55.7805017Z * [new branch] gh/bobrenjc93/541/base -> origin/gh/bobrenjc93/541/base 2025-09-07T07:55:55.7806518Z * [new branch] gh/bobrenjc93/541/head -> origin/gh/bobrenjc93/541/head 2025-09-07T07:55:55.7807988Z * [new branch] gh/bobrenjc93/541/orig -> origin/gh/bobrenjc93/541/orig 2025-09-07T07:55:55.7810177Z * [new branch] gh/bobrenjc93/542/base -> origin/gh/bobrenjc93/542/base 2025-09-07T07:55:55.7811749Z * [new branch] gh/bobrenjc93/542/head -> origin/gh/bobrenjc93/542/head 2025-09-07T07:55:55.7813243Z * [new branch] gh/bobrenjc93/542/orig -> origin/gh/bobrenjc93/542/orig 2025-09-07T07:55:55.7815955Z * [new branch] gh/bobrenjc93/543/base -> origin/gh/bobrenjc93/543/base 2025-09-07T07:55:55.7817605Z * [new branch] gh/bobrenjc93/543/head -> origin/gh/bobrenjc93/543/head 2025-09-07T07:55:55.7819126Z * [new branch] gh/bobrenjc93/543/orig -> origin/gh/bobrenjc93/543/orig 2025-09-07T07:55:55.7821294Z * [new branch] gh/bobrenjc93/544/base -> origin/gh/bobrenjc93/544/base 2025-09-07T07:55:55.7823045Z * [new branch] gh/bobrenjc93/544/head -> origin/gh/bobrenjc93/544/head 2025-09-07T07:55:55.7824850Z * [new branch] gh/bobrenjc93/544/orig -> origin/gh/bobrenjc93/544/orig 2025-09-07T07:55:55.7826947Z * [new branch] gh/bobrenjc93/545/base -> origin/gh/bobrenjc93/545/base 2025-09-07T07:55:55.7828726Z * [new 
branch] gh/bobrenjc93/545/head -> origin/gh/bobrenjc93/545/head 2025-09-07T07:55:55.7830283Z * [new branch] gh/bobrenjc93/545/orig -> origin/gh/bobrenjc93/545/orig 2025-09-07T07:55:55.7832515Z * [new branch] gh/bobrenjc93/546/base -> origin/gh/bobrenjc93/546/base 2025-09-07T07:55:55.7834460Z * [new branch] gh/bobrenjc93/546/head -> origin/gh/bobrenjc93/546/head 2025-09-07T07:55:55.7835985Z * [new branch] gh/bobrenjc93/546/orig -> origin/gh/bobrenjc93/546/orig 2025-09-07T07:55:55.7838856Z * [new branch] gh/bobrenjc93/547/base -> origin/gh/bobrenjc93/547/base 2025-09-07T07:55:55.7840457Z * [new branch] gh/bobrenjc93/547/head -> origin/gh/bobrenjc93/547/head 2025-09-07T07:55:55.7842032Z * [new branch] gh/bobrenjc93/547/orig -> origin/gh/bobrenjc93/547/orig 2025-09-07T07:55:55.7844437Z * [new branch] gh/bobrenjc93/548/base -> origin/gh/bobrenjc93/548/base 2025-09-07T07:55:55.7845942Z * [new branch] gh/bobrenjc93/548/head -> origin/gh/bobrenjc93/548/head 2025-09-07T07:55:55.7847479Z * [new branch] gh/bobrenjc93/548/orig -> origin/gh/bobrenjc93/548/orig 2025-09-07T07:55:55.7849594Z * [new branch] gh/bobrenjc93/549/base -> origin/gh/bobrenjc93/549/base 2025-09-07T07:55:55.7851384Z * [new branch] gh/bobrenjc93/549/head -> origin/gh/bobrenjc93/549/head 2025-09-07T07:55:55.7852950Z * [new branch] gh/bobrenjc93/549/orig -> origin/gh/bobrenjc93/549/orig 2025-09-07T07:55:55.7855935Z * [new branch] gh/bobrenjc93/550/base -> origin/gh/bobrenjc93/550/base 2025-09-07T07:55:55.7857371Z * [new branch] gh/bobrenjc93/550/head -> origin/gh/bobrenjc93/550/head 2025-09-07T07:55:55.7858887Z * [new branch] gh/bobrenjc93/550/orig -> origin/gh/bobrenjc93/550/orig 2025-09-07T07:55:55.7861388Z * [new branch] gh/bobrenjc93/551/base -> origin/gh/bobrenjc93/551/base 2025-09-07T07:55:55.7863348Z * [new branch] gh/bobrenjc93/551/head -> origin/gh/bobrenjc93/551/head 2025-09-07T07:55:55.7865176Z * [new branch] gh/bobrenjc93/551/orig -> origin/gh/bobrenjc93/551/orig 2025-09-07T07:55:55.7867436Z * [new 
branch] gh/bobrenjc93/552/base -> origin/gh/bobrenjc93/552/base 2025-09-07T07:55:55.7869050Z * [new branch] gh/bobrenjc93/552/head -> origin/gh/bobrenjc93/552/head 2025-09-07T07:55:55.7870575Z * [new branch] gh/bobrenjc93/552/orig -> origin/gh/bobrenjc93/552/orig 2025-09-07T07:55:55.7872783Z * [new branch] gh/bobrenjc93/553/base -> origin/gh/bobrenjc93/553/base 2025-09-07T07:55:55.7874759Z * [new branch] gh/bobrenjc93/553/head -> origin/gh/bobrenjc93/553/head 2025-09-07T07:55:55.7876258Z * [new branch] gh/bobrenjc93/553/orig -> origin/gh/bobrenjc93/553/orig 2025-09-07T07:55:55.7878535Z * [new branch] gh/bobrenjc93/554/base -> origin/gh/bobrenjc93/554/base 2025-09-07T07:55:55.7880070Z * [new branch] gh/bobrenjc93/554/head -> origin/gh/bobrenjc93/554/head 2025-09-07T07:55:55.7881900Z * [new branch] gh/bobrenjc93/554/orig -> origin/gh/bobrenjc93/554/orig 2025-09-07T07:55:55.7884023Z * [new branch] gh/bobrenjc93/555/base -> origin/gh/bobrenjc93/555/base 2025-09-07T07:55:55.7885718Z * [new branch] gh/bobrenjc93/555/head -> origin/gh/bobrenjc93/555/head 2025-09-07T07:55:55.7887239Z * [new branch] gh/bobrenjc93/555/orig -> origin/gh/bobrenjc93/555/orig 2025-09-07T07:55:55.7889704Z * [new branch] gh/bobrenjc93/556/base -> origin/gh/bobrenjc93/556/base 2025-09-07T07:55:55.7891204Z * [new branch] gh/bobrenjc93/556/head -> origin/gh/bobrenjc93/556/head 2025-09-07T07:55:55.7892757Z * [new branch] gh/bobrenjc93/556/orig -> origin/gh/bobrenjc93/556/orig 2025-09-07T07:55:55.7895911Z * [new branch] gh/briancoutinho/2/base -> origin/gh/briancoutinho/2/base 2025-09-07T07:55:55.7897457Z * [new branch] gh/briancoutinho/2/head -> origin/gh/briancoutinho/2/head 2025-09-07T07:55:55.7900152Z * [new branch] gh/c00w/23/base -> origin/gh/c00w/23/base 2025-09-07T07:55:55.7901729Z * [new branch] gh/c00w/23/head -> origin/gh/c00w/23/head 2025-09-07T07:55:55.7903999Z * [new branch] gh/c00w/48/base -> origin/gh/c00w/48/base 2025-09-07T07:55:55.7905918Z * [new branch] gh/c00w/48/head -> 
origin/gh/c00w/48/head 2025-09-07T07:55:55.7907370Z * [new branch] gh/c00w/48/orig -> origin/gh/c00w/48/orig 2025-09-07T07:55:55.7909829Z * [new branch] gh/c00w/53/base -> origin/gh/c00w/53/base 2025-09-07T07:55:55.7911321Z * [new branch] gh/c00w/53/head -> origin/gh/c00w/53/head 2025-09-07T07:55:55.7913115Z * [new branch] gh/c00w/53/orig -> origin/gh/c00w/53/orig 2025-09-07T07:55:55.7915723Z * [new branch] gh/c00w/54/base -> origin/gh/c00w/54/base 2025-09-07T07:55:55.7917183Z * [new branch] gh/c00w/54/head -> origin/gh/c00w/54/head 2025-09-07T07:55:55.7918759Z * [new branch] gh/c00w/54/orig -> origin/gh/c00w/54/orig 2025-09-07T07:55:55.7920930Z * [new branch] gh/c00w/55/base -> origin/gh/c00w/55/base 2025-09-07T07:55:55.7922665Z * [new branch] gh/c00w/55/head -> origin/gh/c00w/55/head 2025-09-07T07:55:55.7924511Z * [new branch] gh/c00w/55/orig -> origin/gh/c00w/55/orig 2025-09-07T07:55:55.7926727Z * [new branch] gh/c00w/56/base -> origin/gh/c00w/56/base 2025-09-07T07:55:55.7928339Z * [new branch] gh/c00w/56/head -> origin/gh/c00w/56/head 2025-09-07T07:55:55.7929870Z * [new branch] gh/c00w/56/orig -> origin/gh/c00w/56/orig 2025-09-07T07:55:55.7932645Z * [new branch] gh/clee2000/1/base -> origin/gh/clee2000/1/base 2025-09-07T07:55:55.7934597Z * [new branch] gh/clee2000/1/head -> origin/gh/clee2000/1/head 2025-09-07T07:55:55.7936166Z * [new branch] gh/clee2000/1/orig -> origin/gh/clee2000/1/orig 2025-09-07T07:55:55.7939017Z * [new branch] gh/coconutruben/1/base -> origin/gh/coconutruben/1/base 2025-09-07T07:55:55.7940644Z * [new branch] gh/coconutruben/1/head -> origin/gh/coconutruben/1/head 2025-09-07T07:55:55.7942971Z * [new branch] gh/coconutruben/11/base -> origin/gh/coconutruben/11/base 2025-09-07T07:55:55.7945005Z * [new branch] gh/coconutruben/11/head -> origin/gh/coconutruben/11/head 2025-09-07T07:55:55.7946631Z * [new branch] gh/coconutruben/11/orig -> origin/gh/coconutruben/11/orig 2025-09-07T07:55:55.7949385Z * [new branch] gh/coconutruben/12/base -> 
origin/gh/coconutruben/12/base 2025-09-07T07:55:55.7951232Z * [new branch] gh/coconutruben/12/head -> origin/gh/coconutruben/12/head 2025-09-07T07:55:55.7953048Z * [new branch] gh/coconutruben/12/orig -> origin/gh/coconutruben/12/orig 2025-09-07T07:55:55.7955747Z * [new branch] gh/coconutruben/13/base -> origin/gh/coconutruben/13/base 2025-09-07T07:55:55.7957373Z * [new branch] gh/coconutruben/13/head -> origin/gh/coconutruben/13/head 2025-09-07T07:55:55.7959211Z * [new branch] gh/coconutruben/13/orig -> origin/gh/coconutruben/13/orig 2025-09-07T07:55:55.7961357Z * [new branch] gh/coconutruben/14/base -> origin/gh/coconutruben/14/base 2025-09-07T07:55:55.7962950Z * [new branch] gh/coconutruben/14/head -> origin/gh/coconutruben/14/head 2025-09-07T07:55:55.7964919Z * [new branch] gh/coconutruben/14/orig -> origin/gh/coconutruben/14/orig 2025-09-07T07:55:55.7967441Z * [new branch] gh/coconutruben/15/base -> origin/gh/coconutruben/15/base 2025-09-07T07:55:55.7969108Z * [new branch] gh/coconutruben/15/head -> origin/gh/coconutruben/15/head 2025-09-07T07:55:55.7970759Z * [new branch] gh/coconutruben/15/orig -> origin/gh/coconutruben/15/orig 2025-09-07T07:55:55.7973007Z * [new branch] gh/coconutruben/16/base -> origin/gh/coconutruben/16/base 2025-09-07T07:55:55.7974955Z * [new branch] gh/coconutruben/16/head -> origin/gh/coconutruben/16/head 2025-09-07T07:55:55.7976486Z * [new branch] gh/coconutruben/16/orig -> origin/gh/coconutruben/16/orig 2025-09-07T07:55:55.7978923Z * [new branch] gh/coconutruben/17/base -> origin/gh/coconutruben/17/base 2025-09-07T07:55:55.7980687Z * [new branch] gh/coconutruben/17/head -> origin/gh/coconutruben/17/head 2025-09-07T07:55:55.7982342Z * [new branch] gh/coconutruben/17/orig -> origin/gh/coconutruben/17/orig 2025-09-07T07:55:55.7985299Z * [new branch] gh/coconutruben/18/base -> origin/gh/coconutruben/18/base 2025-09-07T07:55:55.7986847Z * [new branch] gh/coconutruben/18/head -> origin/gh/coconutruben/18/head 2025-09-07T07:55:55.7988389Z * 
[new branch] gh/coconutruben/18/orig -> origin/gh/coconutruben/18/orig 2025-09-07T07:55:55.7990914Z * [new branch] gh/coconutruben/19/base -> origin/gh/coconutruben/19/base 2025-09-07T07:55:55.7992577Z * [new branch] gh/coconutruben/19/head -> origin/gh/coconutruben/19/head 2025-09-07T07:55:55.7994415Z * [new branch] gh/coconutruben/19/orig -> origin/gh/coconutruben/19/orig 2025-09-07T07:55:55.7996887Z * [new branch] gh/coconutruben/20/base -> origin/gh/coconutruben/20/base 2025-09-07T07:55:55.7998694Z * [new branch] gh/coconutruben/20/head -> origin/gh/coconutruben/20/head 2025-09-07T07:55:55.8000312Z * [new branch] gh/coconutruben/20/orig -> origin/gh/coconutruben/20/orig 2025-09-07T07:55:55.8002614Z * [new branch] gh/coconutruben/21/base -> origin/gh/coconutruben/21/base 2025-09-07T07:55:55.8004351Z * [new branch] gh/coconutruben/21/head -> origin/gh/coconutruben/21/head 2025-09-07T07:55:55.8006140Z * [new branch] gh/coconutruben/21/orig -> origin/gh/coconutruben/21/orig 2025-09-07T07:55:55.8008501Z * [new branch] gh/coconutruben/22/base -> origin/gh/coconutruben/22/base 2025-09-07T07:55:55.8010115Z * [new branch] gh/coconutruben/22/head -> origin/gh/coconutruben/22/head 2025-09-07T07:55:55.8011995Z * [new branch] gh/coconutruben/22/orig -> origin/gh/coconutruben/22/orig 2025-09-07T07:55:55.8014738Z * [new branch] gh/coconutruben/24/base -> origin/gh/coconutruben/24/base 2025-09-07T07:55:55.8016327Z * [new branch] gh/coconutruben/24/head -> origin/gh/coconutruben/24/head 2025-09-07T07:55:55.8017914Z * [new branch] gh/coconutruben/24/orig -> origin/gh/coconutruben/24/orig 2025-09-07T07:55:55.8020575Z * [new branch] gh/coconutruben/25/base -> origin/gh/coconutruben/25/base 2025-09-07T07:55:55.8022479Z * [new branch] gh/coconutruben/25/head -> origin/gh/coconutruben/25/head 2025-09-07T07:55:55.8024738Z * [new branch] gh/coconutruben/25/orig -> origin/gh/coconutruben/25/orig 2025-09-07T07:55:55.8027414Z * [new branch] gh/coconutruben/28/base -> 
origin/gh/coconutruben/28/base 2025-09-07T07:55:55.8028808Z * [new branch] gh/coconutruben/28/head -> origin/gh/coconutruben/28/head 2025-09-07T07:55:55.8030355Z * [new branch] gh/coconutruben/28/orig -> origin/gh/coconutruben/28/orig 2025-09-07T07:55:55.8032801Z * [new branch] gh/coconutruben/29/base -> origin/gh/coconutruben/29/base 2025-09-07T07:55:55.8034759Z * [new branch] gh/coconutruben/29/head -> origin/gh/coconutruben/29/head 2025-09-07T07:55:55.8036318Z * [new branch] gh/coconutruben/29/orig -> origin/gh/coconutruben/29/orig 2025-09-07T07:55:55.8038947Z * [new branch] gh/coconutruben/30/base -> origin/gh/coconutruben/30/base 2025-09-07T07:55:55.8040646Z * [new branch] gh/coconutruben/30/head -> origin/gh/coconutruben/30/head 2025-09-07T07:55:55.8042179Z * [new branch] gh/coconutruben/30/orig -> origin/gh/coconutruben/30/orig 2025-09-07T07:55:55.8045094Z * [new branch] gh/coconutruben/31/base -> origin/gh/coconutruben/31/base 2025-09-07T07:55:55.8046670Z * [new branch] gh/coconutruben/31/head -> origin/gh/coconutruben/31/head 2025-09-07T07:55:55.8048331Z * [new branch] gh/coconutruben/31/orig -> origin/gh/coconutruben/31/orig 2025-09-07T07:55:55.8050868Z * [new branch] gh/coconutruben/32/base -> origin/gh/coconutruben/32/base 2025-09-07T07:55:55.8052522Z * [new branch] gh/coconutruben/32/head -> origin/gh/coconutruben/32/head 2025-09-07T07:55:55.8054311Z * [new branch] gh/coconutruben/32/orig -> origin/gh/coconutruben/32/orig 2025-09-07T07:55:55.8056918Z * [new branch] gh/coconutruben/33/base -> origin/gh/coconutruben/33/base 2025-09-07T07:55:55.8058557Z * [new branch] gh/coconutruben/33/head -> origin/gh/coconutruben/33/head 2025-09-07T07:55:55.8060143Z * [new branch] gh/coconutruben/33/orig -> origin/gh/coconutruben/33/orig 2025-09-07T07:55:55.8062296Z * [new branch] gh/coconutruben/34/base -> origin/gh/coconutruben/34/base 2025-09-07T07:55:55.8064021Z * [new branch] gh/coconutruben/34/head -> origin/gh/coconutruben/34/head 2025-09-07T07:55:55.8065844Z * 
[new branch] gh/coconutruben/34/orig -> origin/gh/coconutruben/34/orig 2025-09-07T07:55:55.8068083Z * [new branch] gh/coconutruben/35/base -> origin/gh/coconutruben/35/base 2025-09-07T07:55:55.8069714Z * [new branch] gh/coconutruben/35/head -> origin/gh/coconutruben/35/head 2025-09-07T07:55:55.8071282Z * [new branch] gh/coconutruben/35/orig -> origin/gh/coconutruben/35/orig 2025-09-07T07:55:55.8075263Z * [new branch] gh/coconutruben/36/base -> origin/gh/coconutruben/36/base 2025-09-07T07:55:55.8077252Z * [new branch] gh/coconutruben/36/head -> origin/gh/coconutruben/36/head 2025-09-07T07:55:55.8079621Z * [new branch] gh/coconutruben/36/orig -> origin/gh/coconutruben/36/orig 2025-09-07T07:55:55.8082357Z * [new branch] gh/coconutruben/37/base -> origin/gh/coconutruben/37/base 2025-09-07T07:55:55.8083997Z * [new branch] gh/coconutruben/37/head -> origin/gh/coconutruben/37/head 2025-09-07T07:55:55.8085824Z * [new branch] gh/coconutruben/37/orig -> origin/gh/coconutruben/37/orig 2025-09-07T07:55:55.8088269Z * [new branch] gh/coconutruben/38/base -> origin/gh/coconutruben/38/base 2025-09-07T07:55:55.8089963Z * [new branch] gh/coconutruben/38/head -> origin/gh/coconutruben/38/head 2025-09-07T07:55:55.8091534Z * [new branch] gh/coconutruben/38/orig -> origin/gh/coconutruben/38/orig 2025-09-07T07:55:55.8094152Z * [new branch] gh/coconutruben/39/base -> origin/gh/coconutruben/39/base 2025-09-07T07:55:55.8096013Z * [new branch] gh/coconutruben/39/head -> origin/gh/coconutruben/39/head 2025-09-07T07:55:55.8097447Z * [new branch] gh/coconutruben/39/orig -> origin/gh/coconutruben/39/orig 2025-09-07T07:55:55.8099931Z * [new branch] gh/coconutruben/40/base -> origin/gh/coconutruben/40/base 2025-09-07T07:55:55.8101506Z * [new branch] gh/coconutruben/40/head -> origin/gh/coconutruben/40/head 2025-09-07T07:55:55.8103016Z * [new branch] gh/coconutruben/40/orig -> origin/gh/coconutruben/40/orig 2025-09-07T07:55:55.8105935Z * [new branch] gh/coconutruben/41/base -> 
origin/gh/coconutruben/41/base 2025-09-07T07:55:55.8107624Z * [new branch] gh/coconutruben/41/head -> origin/gh/coconutruben/41/head 2025-09-07T07:55:55.8109153Z * [new branch] gh/coconutruben/41/orig -> origin/gh/coconutruben/41/orig 2025-09-07T07:55:55.8111609Z * [new branch] gh/coconutruben/42/base -> origin/gh/coconutruben/42/base 2025-09-07T07:55:55.8113460Z * [new branch] gh/coconutruben/42/head -> origin/gh/coconutruben/42/head 2025-09-07T07:55:55.8115538Z * [new branch] gh/coconutruben/42/orig -> origin/gh/coconutruben/42/orig 2025-09-07T07:55:55.8118115Z * [new branch] gh/coconutruben/43/base -> origin/gh/coconutruben/43/base 2025-09-07T07:55:55.8119736Z * [new branch] gh/coconutruben/43/head -> origin/gh/coconutruben/43/head 2025-09-07T07:55:55.8121340Z * [new branch] gh/coconutruben/43/orig -> origin/gh/coconutruben/43/orig 2025-09-07T07:55:55.8124017Z * [new branch] gh/coconutruben/44/base -> origin/gh/coconutruben/44/base 2025-09-07T07:55:55.8125878Z * [new branch] gh/coconutruben/44/head -> origin/gh/coconutruben/44/head 2025-09-07T07:55:55.8127401Z * [new branch] gh/coconutruben/44/orig -> origin/gh/coconutruben/44/orig 2025-09-07T07:55:55.8129939Z * [new branch] gh/coconutruben/45/base -> origin/gh/coconutruben/45/base 2025-09-07T07:55:55.8131631Z * [new branch] gh/coconutruben/45/head -> origin/gh/coconutruben/45/head 2025-09-07T07:55:55.8133269Z * [new branch] gh/coconutruben/45/orig -> origin/gh/coconutruben/45/orig 2025-09-07T07:55:55.8136005Z * [new branch] gh/coconutruben/46/base -> origin/gh/coconutruben/46/base 2025-09-07T07:55:55.8137496Z * [new branch] gh/coconutruben/46/head -> origin/gh/coconutruben/46/head 2025-09-07T07:55:55.8139121Z * [new branch] gh/coconutruben/46/orig -> origin/gh/coconutruben/46/orig 2025-09-07T07:55:55.8141610Z * [new branch] gh/coconutruben/47/base -> origin/gh/coconutruben/47/base 2025-09-07T07:55:55.8143202Z * [new branch] gh/coconutruben/47/head -> origin/gh/coconutruben/47/head 2025-09-07T07:55:55.8145280Z * 
[new branch] gh/coconutruben/47/orig -> origin/gh/coconutruben/47/orig 2025-09-07T07:55:55.8147833Z * [new branch] gh/coconutruben/48/base -> origin/gh/coconutruben/48/base 2025-09-07T07:55:55.8149445Z * [new branch] gh/coconutruben/48/head -> origin/gh/coconutruben/48/head 2025-09-07T07:55:55.8151025Z * [new branch] gh/coconutruben/48/orig -> origin/gh/coconutruben/48/orig 2025-09-07T07:55:55.8153649Z * [new branch] gh/coconutruben/49/base -> origin/gh/coconutruben/49/base 2025-09-07T07:55:55.8155656Z * [new branch] gh/coconutruben/49/head -> origin/gh/coconutruben/49/head 2025-09-07T07:55:55.8157253Z * [new branch] gh/coconutruben/49/orig -> origin/gh/coconutruben/49/orig 2025-09-07T07:55:55.8159774Z * [new branch] gh/coconutruben/50/base -> origin/gh/coconutruben/50/base 2025-09-07T07:55:55.8161416Z * [new branch] gh/coconutruben/50/head -> origin/gh/coconutruben/50/head 2025-09-07T07:55:55.8163215Z * [new branch] gh/coconutruben/50/orig -> origin/gh/coconutruben/50/orig 2025-09-07T07:55:55.8165808Z * [new branch] gh/coconutruben/51/base -> origin/gh/coconutruben/51/base 2025-09-07T07:55:55.8167335Z * [new branch] gh/coconutruben/51/head -> origin/gh/coconutruben/51/head 2025-09-07T07:55:55.8169037Z * [new branch] gh/coconutruben/51/orig -> origin/gh/coconutruben/51/orig 2025-09-07T07:55:55.8171522Z * [new branch] gh/coconutruben/52/base -> origin/gh/coconutruben/52/base 2025-09-07T07:55:55.8173187Z * [new branch] gh/coconutruben/52/head -> origin/gh/coconutruben/52/head 2025-09-07T07:55:55.8175254Z * [new branch] gh/coconutruben/52/orig -> origin/gh/coconutruben/52/orig 2025-09-07T07:55:55.8177622Z * [new branch] gh/coconutruben/53/base -> origin/gh/coconutruben/53/base 2025-09-07T07:55:55.8179144Z * [new branch] gh/coconutruben/53/head -> origin/gh/coconutruben/53/head 2025-09-07T07:55:55.8180748Z * [new branch] gh/coconutruben/53/orig -> origin/gh/coconutruben/53/orig 2025-09-07T07:55:55.8183126Z * [new branch] gh/coconutruben/54/base -> 
origin/gh/coconutruben/54/base 2025-09-07T07:55:55.8185146Z * [new branch] gh/coconutruben/54/head -> origin/gh/coconutruben/54/head 2025-09-07T07:55:55.8186715Z * [new branch] gh/coconutruben/54/orig -> origin/gh/coconutruben/54/orig 2025-09-07T07:55:55.8189294Z * [new branch] gh/coconutruben/55/base -> origin/gh/coconutruben/55/base 2025-09-07T07:55:55.8190873Z * [new branch] gh/coconutruben/55/head -> origin/gh/coconutruben/55/head 2025-09-07T07:55:55.8192474Z * [new branch] gh/coconutruben/55/orig -> origin/gh/coconutruben/55/orig 2025-09-07T07:55:55.8195357Z * [new branch] gh/coconutruben/56/base -> origin/gh/coconutruben/56/base 2025-09-07T07:55:55.8196952Z * [new branch] gh/coconutruben/56/head -> origin/gh/coconutruben/56/head 2025-09-07T07:55:55.8198674Z * [new branch] gh/coconutruben/56/orig -> origin/gh/coconutruben/56/orig 2025-09-07T07:55:55.8201174Z * [new branch] gh/coconutruben/57/base -> origin/gh/coconutruben/57/base 2025-09-07T07:55:55.8202829Z * [new branch] gh/coconutruben/57/head -> origin/gh/coconutruben/57/head 2025-09-07T07:55:55.8204826Z * [new branch] gh/coconutruben/57/orig -> origin/gh/coconutruben/57/orig 2025-09-07T07:55:55.8207397Z * [new branch] gh/coconutruben/58/base -> origin/gh/coconutruben/58/base 2025-09-07T07:55:55.8209185Z * [new branch] gh/coconutruben/58/head -> origin/gh/coconutruben/58/head 2025-09-07T07:55:55.8210781Z * [new branch] gh/coconutruben/58/orig -> origin/gh/coconutruben/58/orig 2025-09-07T07:55:55.8213326Z * [new branch] gh/coconutruben/59/base -> origin/gh/coconutruben/59/base 2025-09-07T07:55:55.8215253Z * [new branch] gh/coconutruben/59/head -> origin/gh/coconutruben/59/head 2025-09-07T07:55:55.8216751Z * [new branch] gh/coconutruben/59/orig -> origin/gh/coconutruben/59/orig 2025-09-07T07:55:55.8219203Z * [new branch] gh/coconutruben/60/base -> origin/gh/coconutruben/60/base 2025-09-07T07:55:55.8220885Z * [new branch] gh/coconutruben/60/head -> origin/gh/coconutruben/60/head 2025-09-07T07:55:55.8222525Z * 
[new branch] gh/coconutruben/60/orig -> origin/gh/coconutruben/60/orig 2025-09-07T07:55:55.8225395Z * [new branch] gh/coconutruben/61/base -> origin/gh/coconutruben/61/base 2025-09-07T07:55:55.8227067Z * [new branch] gh/coconutruben/61/head -> origin/gh/coconutruben/61/head 2025-09-07T07:55:55.8228704Z * [new branch] gh/coconutruben/61/orig -> origin/gh/coconutruben/61/orig 2025-09-07T07:55:55.8231185Z * [new branch] gh/coconutruben/62/base -> origin/gh/coconutruben/62/base 2025-09-07T07:55:55.8232941Z * [new branch] gh/coconutruben/62/head -> origin/gh/coconutruben/62/head 2025-09-07T07:55:55.8234886Z * [new branch] gh/coconutruben/62/orig -> origin/gh/coconutruben/62/orig 2025-09-07T07:55:55.8237356Z * [new branch] gh/coconutruben/63/base -> origin/gh/coconutruben/63/base 2025-09-07T07:55:55.8239256Z * [new branch] gh/coconutruben/63/head -> origin/gh/coconutruben/63/head 2025-09-07T07:55:55.8240638Z * [new branch] gh/coconutruben/63/orig -> origin/gh/coconutruben/63/orig 2025-09-07T07:55:55.8242996Z * [new branch] gh/coconutruben/64/base -> origin/gh/coconutruben/64/base 2025-09-07T07:55:55.8245029Z * [new branch] gh/coconutruben/64/head -> origin/gh/coconutruben/64/head 2025-09-07T07:55:55.8246669Z * [new branch] gh/coconutruben/64/orig -> origin/gh/coconutruben/64/orig 2025-09-07T07:55:55.8249017Z * [new branch] gh/coconutruben/65/base -> origin/gh/coconutruben/65/base 2025-09-07T07:55:55.8250710Z * [new branch] gh/coconutruben/65/head -> origin/gh/coconutruben/65/head 2025-09-07T07:55:55.8252219Z * [new branch] gh/coconutruben/65/orig -> origin/gh/coconutruben/65/orig 2025-09-07T07:55:55.8255091Z * [new branch] gh/coconutruben/66/base -> origin/gh/coconutruben/66/base 2025-09-07T07:55:55.8256524Z * [new branch] gh/coconutruben/66/head -> origin/gh/coconutruben/66/head 2025-09-07T07:55:55.8258020Z * [new branch] gh/coconutruben/66/orig -> origin/gh/coconutruben/66/orig 2025-09-07T07:55:55.8261337Z * [new branch] gh/codingwithsurya/12/base -> 
origin/gh/codingwithsurya/12/base 2025-09-07T07:55:55.8263081Z * [new branch] gh/codingwithsurya/12/head -> origin/gh/codingwithsurya/12/head 2025-09-07T07:55:55.8265245Z * [new branch] gh/codingwithsurya/12/orig -> origin/gh/codingwithsurya/12/orig 2025-09-07T07:55:55.8267432Z * [new branch] gh/codingwithsurya/14/base -> origin/gh/codingwithsurya/14/base 2025-09-07T07:55:55.8268959Z * [new branch] gh/codingwithsurya/14/head -> origin/gh/codingwithsurya/14/head 2025-09-07T07:55:55.8270551Z * [new branch] gh/codingwithsurya/14/orig -> origin/gh/codingwithsurya/14/orig 2025-09-07T07:55:55.8272998Z * [new branch] gh/codingwithsurya/15/base -> origin/gh/codingwithsurya/15/base 2025-09-07T07:55:55.8275019Z * [new branch] gh/codingwithsurya/15/head -> origin/gh/codingwithsurya/15/head 2025-09-07T07:55:55.8276539Z * [new branch] gh/codingwithsurya/15/orig -> origin/gh/codingwithsurya/15/orig 2025-09-07T07:55:55.8279112Z * [new branch] gh/codingwithsurya/16/base -> origin/gh/codingwithsurya/16/base 2025-09-07T07:55:55.8280753Z * [new branch] gh/codingwithsurya/16/head -> origin/gh/codingwithsurya/16/head 2025-09-07T07:55:55.8282358Z * [new branch] gh/codingwithsurya/16/orig -> origin/gh/codingwithsurya/16/orig 2025-09-07T07:55:55.8285296Z * [new branch] gh/codingwithsurya/17/base -> origin/gh/codingwithsurya/17/base 2025-09-07T07:55:55.8286856Z * [new branch] gh/codingwithsurya/17/head -> origin/gh/codingwithsurya/17/head 2025-09-07T07:55:55.8288416Z * [new branch] gh/codingwithsurya/17/orig -> origin/gh/codingwithsurya/17/orig 2025-09-07T07:55:55.8290757Z * [new branch] gh/codingwithsurya/18/base -> origin/gh/codingwithsurya/18/base 2025-09-07T07:55:55.8292353Z * [new branch] gh/codingwithsurya/18/head -> origin/gh/codingwithsurya/18/head 2025-09-07T07:55:55.8294061Z * [new branch] gh/codingwithsurya/18/orig -> origin/gh/codingwithsurya/18/orig 2025-09-07T07:55:55.8296784Z * [new branch] gh/codingwithsurya/19/base -> origin/gh/codingwithsurya/19/base 
2025-09-07T07:55:55.8298490Z * [new branch] gh/codingwithsurya/19/head -> origin/gh/codingwithsurya/19/head 2025-09-07T07:55:55.8299844Z * [new branch] gh/codingwithsurya/19/orig -> origin/gh/codingwithsurya/19/orig 2025-09-07T07:55:55.8302101Z * [new branch] gh/codingwithsurya/20/base -> origin/gh/codingwithsurya/20/base 2025-09-07T07:55:55.8303852Z * [new branch] gh/codingwithsurya/20/head -> origin/gh/codingwithsurya/20/head 2025-09-07T07:55:55.8305751Z * [new branch] gh/codingwithsurya/20/orig -> origin/gh/codingwithsurya/20/orig 2025-09-07T07:55:55.8308215Z * [new branch] gh/codingwithsurya/21/base -> origin/gh/codingwithsurya/21/base 2025-09-07T07:55:55.8309823Z * [new branch] gh/codingwithsurya/21/head -> origin/gh/codingwithsurya/21/head 2025-09-07T07:55:55.8311594Z * [new branch] gh/codingwithsurya/21/orig -> origin/gh/codingwithsurya/21/orig 2025-09-07T07:55:55.8314705Z * [new branch] gh/colinchan15/1/base -> origin/gh/colinchan15/1/base 2025-09-07T07:55:55.8316253Z * [new branch] gh/colinchan15/1/head -> origin/gh/colinchan15/1/head 2025-09-07T07:55:55.8318467Z * [new branch] gh/colinchan15/2/base -> origin/gh/colinchan15/2/base 2025-09-07T07:55:55.8319956Z * [new branch] gh/colinchan15/2/head -> origin/gh/colinchan15/2/head 2025-09-07T07:55:55.8322103Z * [new branch] gh/colinchan15/3/base -> origin/gh/colinchan15/3/base 2025-09-07T07:55:55.8323565Z * [new branch] gh/colinchan15/3/head -> origin/gh/colinchan15/3/head 2025-09-07T07:55:55.8326450Z * [new branch] gh/colinchan15/6/base -> origin/gh/colinchan15/6/base 2025-09-07T07:55:55.8327653Z * [new branch] gh/colinchan15/6/head -> origin/gh/colinchan15/6/head 2025-09-07T07:55:55.8330479Z * [new branch] gh/davidberard98/382/base -> origin/gh/davidberard98/382/base 2025-09-07T07:55:55.8332202Z * [new branch] gh/davidberard98/382/head -> origin/gh/davidberard98/382/head 2025-09-07T07:55:55.8333838Z * [new branch] gh/davidberard98/382/orig -> origin/gh/davidberard98/382/orig 2025-09-07T07:55:55.8336287Z * 
[new branch] gh/davidberard98/386/base -> origin/gh/davidberard98/386/base 2025-09-07T07:55:55.8337826Z * [new branch] gh/davidberard98/386/head -> origin/gh/davidberard98/386/head 2025-09-07T07:55:55.8339379Z * [new branch] gh/davidberard98/386/orig -> origin/gh/davidberard98/386/orig 2025-09-07T07:55:55.8341827Z * [new branch] gh/davidberard98/391/base -> origin/gh/davidberard98/391/base 2025-09-07T07:55:55.8343303Z * [new branch] gh/davidberard98/391/head -> origin/gh/davidberard98/391/head 2025-09-07T07:55:55.8345345Z * [new branch] gh/davidberard98/391/orig -> origin/gh/davidberard98/391/orig 2025-09-07T07:55:55.8347515Z * [new branch] gh/davidberard98/392/base -> origin/gh/davidberard98/392/base 2025-09-07T07:55:55.8349088Z * [new branch] gh/davidberard98/392/head -> origin/gh/davidberard98/392/head 2025-09-07T07:55:55.8350650Z * [new branch] gh/davidberard98/392/orig -> origin/gh/davidberard98/392/orig 2025-09-07T07:55:55.8352992Z * [new branch] gh/davidberard98/394/base -> origin/gh/davidberard98/394/base 2025-09-07T07:55:55.8355022Z * [new branch] gh/davidberard98/394/head -> origin/gh/davidberard98/394/head 2025-09-07T07:55:55.8356523Z * [new branch] gh/davidberard98/394/orig -> origin/gh/davidberard98/394/orig 2025-09-07T07:55:55.8358921Z * [new branch] gh/davidberard98/396/base -> origin/gh/davidberard98/396/base 2025-09-07T07:55:55.8360494Z * [new branch] gh/davidberard98/396/head -> origin/gh/davidberard98/396/head 2025-09-07T07:55:55.8362109Z * [new branch] gh/davidberard98/396/orig -> origin/gh/davidberard98/396/orig 2025-09-07T07:55:55.8365217Z * [new branch] gh/davidberard98/397/base -> origin/gh/davidberard98/397/base 2025-09-07T07:55:55.8366642Z * [new branch] gh/davidberard98/397/head -> origin/gh/davidberard98/397/head 2025-09-07T07:55:55.8368181Z * [new branch] gh/davidberard98/397/orig -> origin/gh/davidberard98/397/orig 2025-09-07T07:55:55.8370434Z * [new branch] gh/davidberard98/398/base -> origin/gh/davidberard98/398/base 
2025-09-07T07:55:55.8372001Z * [new branch] gh/davidberard98/398/head -> origin/gh/davidberard98/398/head 2025-09-07T07:55:55.8373507Z * [new branch] gh/davidberard98/398/orig -> origin/gh/davidberard98/398/orig 2025-09-07T07:55:55.8376208Z * [new branch] gh/davidberard98/399/base -> origin/gh/davidberard98/399/base 2025-09-07T07:55:55.8377818Z * [new branch] gh/davidberard98/399/head -> origin/gh/davidberard98/399/head 2025-09-07T07:55:55.8379359Z * [new branch] gh/davidberard98/399/orig -> origin/gh/davidberard98/399/orig 2025-09-07T07:55:55.8381733Z * [new branch] gh/davidberard98/400/base -> origin/gh/davidberard98/400/base 2025-09-07T07:55:55.8383443Z * [new branch] gh/davidberard98/400/head -> origin/gh/davidberard98/400/head 2025-09-07T07:55:55.8385400Z * [new branch] gh/davidberard98/400/orig -> origin/gh/davidberard98/400/orig 2025-09-07T07:55:55.8387600Z * [new branch] gh/davidberard98/401/base -> origin/gh/davidberard98/401/base 2025-09-07T07:55:55.8389091Z * [new branch] gh/davidberard98/401/head -> origin/gh/davidberard98/401/head 2025-09-07T07:55:55.8390642Z * [new branch] gh/davidberard98/401/orig -> origin/gh/davidberard98/401/orig 2025-09-07T07:55:55.8392908Z * [new branch] gh/davidberard98/402/base -> origin/gh/davidberard98/402/base 2025-09-07T07:55:55.8394891Z * [new branch] gh/davidberard98/402/head -> origin/gh/davidberard98/402/head 2025-09-07T07:55:55.8396368Z * [new branch] gh/davidberard98/402/orig -> origin/gh/davidberard98/402/orig 2025-09-07T07:55:55.8398821Z * [new branch] gh/davidberard98/403/base -> origin/gh/davidberard98/403/base 2025-09-07T07:55:55.8400420Z * [new branch] gh/davidberard98/403/head -> origin/gh/davidberard98/403/head 2025-09-07T07:55:55.8401934Z * [new branch] gh/davidberard98/403/orig -> origin/gh/davidberard98/403/orig 2025-09-07T07:55:55.8404629Z * [new branch] gh/davidberard98/404/base -> origin/gh/davidberard98/404/base 2025-09-07T07:55:55.8406141Z * [new branch] gh/davidberard98/404/head -> 
origin/gh/davidberard98/404/head 2025-09-07T07:55:55.8407970Z * [new branch] gh/davidberard98/404/orig -> origin/gh/davidberard98/404/orig 2025-09-07T07:55:55.8410288Z * [new branch] gh/davidberard98/405/base -> origin/gh/davidberard98/405/base 2025-09-07T07:55:55.8411823Z * [new branch] gh/davidberard98/405/head -> origin/gh/davidberard98/405/head 2025-09-07T07:55:55.8413392Z * [new branch] gh/davidberard98/405/orig -> origin/gh/davidberard98/405/orig 2025-09-07T07:55:55.8416094Z * [new branch] gh/davidberard98/406/base -> origin/gh/davidberard98/406/base 2025-09-07T07:55:55.8417776Z * [new branch] gh/davidberard98/406/head -> origin/gh/davidberard98/406/head 2025-09-07T07:55:55.8419482Z * [new branch] gh/davidberard98/406/orig -> origin/gh/davidberard98/406/orig 2025-09-07T07:55:55.8421873Z * [new branch] gh/davidberard98/407/base -> origin/gh/davidberard98/407/base 2025-09-07T07:55:55.8423371Z * [new branch] gh/davidberard98/407/head -> origin/gh/davidberard98/407/head 2025-09-07T07:55:55.8425334Z * [new branch] gh/davidberard98/407/orig -> origin/gh/davidberard98/407/orig 2025-09-07T07:55:55.8427510Z * [new branch] gh/davidberard98/408/base -> origin/gh/davidberard98/408/base 2025-09-07T07:55:55.8429289Z * [new branch] gh/davidberard98/408/head -> origin/gh/davidberard98/408/head 2025-09-07T07:55:55.8430659Z * [new branch] gh/davidberard98/408/orig -> origin/gh/davidberard98/408/orig 2025-09-07T07:55:55.8432757Z * [new branch] gh/davidberard98/409/base -> origin/gh/davidberard98/409/base 2025-09-07T07:55:55.8434853Z * [new branch] gh/davidberard98/409/head -> origin/gh/davidberard98/409/head 2025-09-07T07:55:55.8436427Z * [new branch] gh/davidberard98/409/orig -> origin/gh/davidberard98/409/orig 2025-09-07T07:55:55.8439396Z * [new branch] gh/desertfire/594/base -> origin/gh/desertfire/594/base 2025-09-07T07:55:55.8440923Z * [new branch] gh/desertfire/594/head -> origin/gh/desertfire/594/head 2025-09-07T07:55:55.8442464Z * [new branch] gh/desertfire/594/orig -> 
origin/gh/desertfire/594/orig 2025-09-07T07:55:55.8445119Z * [new branch] gh/desertfire/595/base -> origin/gh/desertfire/595/base 2025-09-07T07:55:55.8446512Z * [new branch] gh/desertfire/595/head -> origin/gh/desertfire/595/head 2025-09-07T07:55:55.8448106Z * [new branch] gh/desertfire/595/orig -> origin/gh/desertfire/595/orig 2025-09-07T07:55:55.8450381Z * [new branch] gh/desertfire/597/base -> origin/gh/desertfire/597/base 2025-09-07T07:55:55.8452001Z * [new branch] gh/desertfire/597/head -> origin/gh/desertfire/597/head 2025-09-07T07:55:55.8453515Z * [new branch] gh/desertfire/597/orig -> origin/gh/desertfire/597/orig 2025-09-07T07:55:55.8456768Z * [new branch] gh/dharakk/1/base -> origin/gh/dharakk/1/base 2025-09-07T07:55:55.8458281Z * [new branch] gh/dharakk/1/head -> origin/gh/dharakk/1/head 2025-09-07T07:55:55.8461081Z * [new branch] gh/drisspg/149/base -> origin/gh/drisspg/149/base 2025-09-07T07:55:55.8462631Z * [new branch] gh/drisspg/149/head -> origin/gh/drisspg/149/head 2025-09-07T07:55:55.8464366Z * [new branch] gh/drisspg/149/orig -> origin/gh/drisspg/149/orig 2025-09-07T07:55:55.8466777Z * [new branch] gh/drisspg/159/base -> origin/gh/drisspg/159/base 2025-09-07T07:55:55.8468245Z * [new branch] gh/drisspg/159/head -> origin/gh/drisspg/159/head 2025-09-07T07:55:55.8469814Z * [new branch] gh/drisspg/159/orig -> origin/gh/drisspg/159/orig 2025-09-07T07:55:55.8472094Z * [new branch] gh/drisspg/166/base -> origin/gh/drisspg/166/base 2025-09-07T07:55:55.8473639Z * [new branch] gh/drisspg/166/head -> origin/gh/drisspg/166/head 2025-09-07T07:55:55.8475560Z * [new branch] gh/drisspg/166/orig -> origin/gh/drisspg/166/orig 2025-09-07T07:55:55.8477853Z * [new branch] gh/drisspg/170/base -> origin/gh/drisspg/170/base 2025-09-07T07:55:55.8479351Z * [new branch] gh/drisspg/170/head -> origin/gh/drisspg/170/head 2025-09-07T07:55:55.8480861Z * [new branch] gh/drisspg/170/orig -> origin/gh/drisspg/170/orig 2025-09-07T07:55:55.8483115Z * [new branch] 
gh/drisspg/173/base -> origin/gh/drisspg/173/base 2025-09-07T07:55:55.8485042Z * [new branch] gh/drisspg/173/head -> origin/gh/drisspg/173/head 2025-09-07T07:55:55.8486516Z * [new branch] gh/drisspg/173/orig -> origin/gh/drisspg/173/orig 2025-09-07T07:55:55.8488920Z * [new branch] gh/drisspg/177/base -> origin/gh/drisspg/177/base 2025-09-07T07:55:55.8490336Z * [new branch] gh/drisspg/177/head -> origin/gh/drisspg/177/head 2025-09-07T07:55:55.8491970Z * [new branch] gh/drisspg/177/orig -> origin/gh/drisspg/177/orig 2025-09-07T07:55:55.8494452Z * [new branch] gh/drisspg/178/base -> origin/gh/drisspg/178/base 2025-09-07T07:55:55.8496104Z * [new branch] gh/drisspg/178/head -> origin/gh/drisspg/178/head 2025-09-07T07:55:55.8497508Z * [new branch] gh/drisspg/178/orig -> origin/gh/drisspg/178/orig 2025-09-07T07:55:55.8499770Z * [new branch] gh/drisspg/180/base -> origin/gh/drisspg/180/base 2025-09-07T07:55:55.8501397Z * [new branch] gh/drisspg/180/head -> origin/gh/drisspg/180/head 2025-09-07T07:55:55.8503140Z * [new branch] gh/drisspg/180/orig -> origin/gh/drisspg/180/orig 2025-09-07T07:55:55.8505782Z * [new branch] gh/drisspg/181/base -> origin/gh/drisspg/181/base 2025-09-07T07:55:55.8507268Z * [new branch] gh/drisspg/181/head -> origin/gh/drisspg/181/head 2025-09-07T07:55:55.8508707Z * [new branch] gh/drisspg/181/orig -> origin/gh/drisspg/181/orig 2025-09-07T07:55:55.8511037Z * [new branch] gh/drisspg/182/base -> origin/gh/drisspg/182/base 2025-09-07T07:55:55.8512668Z * [new branch] gh/drisspg/182/head -> origin/gh/drisspg/182/head 2025-09-07T07:55:55.8515256Z * [new branch] gh/drisspg/183/base -> origin/gh/drisspg/183/base 2025-09-07T07:55:55.8516676Z * [new branch] gh/drisspg/183/head -> origin/gh/drisspg/183/head 2025-09-07T07:55:55.8518968Z * [new branch] gh/drisspg/184/base -> origin/gh/drisspg/184/base 2025-09-07T07:55:55.8520418Z * [new branch] gh/drisspg/184/head -> origin/gh/drisspg/184/head 2025-09-07T07:55:55.8522706Z * [new branch] gh/drisspg/185/base -> 
origin/gh/drisspg/185/base 2025-09-07T07:55:55.8524309Z * [new branch] gh/drisspg/185/head -> origin/gh/drisspg/185/head 2025-09-07T07:55:55.8526830Z * [new branch] gh/drisspg/186/base -> origin/gh/drisspg/186/base 2025-09-07T07:55:55.8528375Z * [new branch] gh/drisspg/186/head -> origin/gh/drisspg/186/head 2025-09-07T07:55:55.8529924Z * [new branch] gh/drisspg/186/orig -> origin/gh/drisspg/186/orig 2025-09-07T07:55:55.8532182Z * [new branch] gh/drisspg/187/base -> origin/gh/drisspg/187/base 2025-09-07T07:55:55.8533944Z * [new branch] gh/drisspg/187/head -> origin/gh/drisspg/187/head 2025-09-07T07:55:55.8535691Z * [new branch] gh/drisspg/187/orig -> origin/gh/drisspg/187/orig 2025-09-07T07:55:55.8537871Z * [new branch] gh/drisspg/188/base -> origin/gh/drisspg/188/base 2025-09-07T07:55:55.8539365Z * [new branch] gh/drisspg/188/head -> origin/gh/drisspg/188/head 2025-09-07T07:55:55.8540907Z * [new branch] gh/drisspg/188/orig -> origin/gh/drisspg/188/orig 2025-09-07T07:55:55.8543135Z * [new branch] gh/drisspg/189/base -> origin/gh/drisspg/189/base 2025-09-07T07:55:55.8545148Z * [new branch] gh/drisspg/189/head -> origin/gh/drisspg/189/head 2025-09-07T07:55:55.8546687Z * [new branch] gh/drisspg/189/orig -> origin/gh/drisspg/189/orig 2025-09-07T07:55:55.8548988Z * [new branch] gh/drisspg/190/base -> origin/gh/drisspg/190/base 2025-09-07T07:55:55.8550489Z * [new branch] gh/drisspg/190/head -> origin/gh/drisspg/190/head 2025-09-07T07:55:55.8552047Z * [new branch] gh/drisspg/190/orig -> origin/gh/drisspg/190/orig 2025-09-07T07:55:55.8554606Z * [new branch] gh/drisspg/191/base -> origin/gh/drisspg/191/base 2025-09-07T07:55:55.8556110Z * [new branch] gh/drisspg/191/head -> origin/gh/drisspg/191/head 2025-09-07T07:55:55.8557766Z * [new branch] gh/drisspg/191/orig -> origin/gh/drisspg/191/orig 2025-09-07T07:55:55.8560035Z * [new branch] gh/drisspg/192/base -> origin/gh/drisspg/192/base 2025-09-07T07:55:55.8561811Z * [new branch] gh/drisspg/192/head -> 
origin/gh/drisspg/192/head 2025-09-07T07:55:55.8563143Z * [new branch] gh/drisspg/192/orig -> origin/gh/drisspg/192/orig 2025-09-07T07:55:55.8565801Z * [new branch] gh/drisspg/193/base -> origin/gh/drisspg/193/base 2025-09-07T07:55:55.8567365Z * [new branch] gh/drisspg/193/head -> origin/gh/drisspg/193/head 2025-09-07T07:55:55.8568909Z * [new branch] gh/drisspg/193/orig -> origin/gh/drisspg/193/orig 2025-09-07T07:55:55.8571145Z * [new branch] gh/drisspg/194/base -> origin/gh/drisspg/194/base 2025-09-07T07:55:55.8572673Z * [new branch] gh/drisspg/194/head -> origin/gh/drisspg/194/head 2025-09-07T07:55:55.8574637Z * [new branch] gh/drisspg/194/orig -> origin/gh/drisspg/194/orig 2025-09-07T07:55:55.8576952Z * [new branch] gh/drisspg/195/base -> origin/gh/drisspg/195/base 2025-09-07T07:55:55.8581129Z * [new branch] gh/drisspg/195/head -> origin/gh/drisspg/195/head 2025-09-07T07:55:55.8582626Z * [new branch] gh/drisspg/195/orig -> origin/gh/drisspg/195/orig 2025-09-07T07:55:55.8585355Z * [new branch] gh/drisspg/196/base -> origin/gh/drisspg/196/base 2025-09-07T07:55:55.8586809Z * [new branch] gh/drisspg/196/head -> origin/gh/drisspg/196/head 2025-09-07T07:55:55.8588419Z * [new branch] gh/drisspg/196/orig -> origin/gh/drisspg/196/orig 2025-09-07T07:55:55.8590717Z * [new branch] gh/drisspg/197/base -> origin/gh/drisspg/197/base 2025-09-07T07:55:55.8592273Z * [new branch] gh/drisspg/197/head -> origin/gh/drisspg/197/head 2025-09-07T07:55:55.8593984Z * [new branch] gh/drisspg/197/orig -> origin/gh/drisspg/197/orig 2025-09-07T07:55:55.8596422Z * [new branch] gh/drisspg/198/base -> origin/gh/drisspg/198/base 2025-09-07T07:55:55.8598061Z * [new branch] gh/drisspg/198/head -> origin/gh/drisspg/198/head 2025-09-07T07:55:55.8599812Z * [new branch] gh/drisspg/198/orig -> origin/gh/drisspg/198/orig 2025-09-07T07:55:55.8602112Z * [new branch] gh/drisspg/199/base -> origin/gh/drisspg/199/base 2025-09-07T07:55:55.8603653Z * [new branch] gh/drisspg/199/head -> 
origin/gh/drisspg/199/head 2025-09-07T07:55:55.8605560Z * [new branch] gh/drisspg/199/orig -> origin/gh/drisspg/199/orig 2025-09-07T07:55:55.8608448Z * [new branch] gh/dsjohns2/1/base -> origin/gh/dsjohns2/1/base 2025-09-07T07:55:55.8609913Z * [new branch] gh/dsjohns2/1/head -> origin/gh/dsjohns2/1/head 2025-09-07T07:55:55.8612640Z * [new branch] gh/eellison/784/base -> origin/gh/eellison/784/base 2025-09-07T07:55:55.8614301Z * [new branch] gh/eellison/784/head -> origin/gh/eellison/784/head 2025-09-07T07:55:55.8616162Z * [new branch] gh/eellison/784/orig -> origin/gh/eellison/784/orig 2025-09-07T07:55:55.8618537Z * [new branch] gh/eellison/785/base -> origin/gh/eellison/785/base 2025-09-07T07:55:55.8620145Z * [new branch] gh/eellison/785/head -> origin/gh/eellison/785/head 2025-09-07T07:55:55.8621734Z * [new branch] gh/eellison/785/orig -> origin/gh/eellison/785/orig 2025-09-07T07:55:55.8624153Z * [new branch] gh/eellison/789/base -> origin/gh/eellison/789/base 2025-09-07T07:55:55.8625927Z * [new branch] gh/eellison/789/head -> origin/gh/eellison/789/head 2025-09-07T07:55:55.8627428Z * [new branch] gh/eellison/789/orig -> origin/gh/eellison/789/orig 2025-09-07T07:55:55.8629615Z * [new branch] gh/eellison/800/base -> origin/gh/eellison/800/base 2025-09-07T07:55:55.8631359Z * [new branch] gh/eellison/800/head -> origin/gh/eellison/800/head 2025-09-07T07:55:55.8632744Z * [new branch] gh/eellison/800/orig -> origin/gh/eellison/800/orig 2025-09-07T07:55:55.8635462Z * [new branch] gh/eellison/801/base -> origin/gh/eellison/801/base 2025-09-07T07:55:55.8636972Z * [new branch] gh/eellison/801/head -> origin/gh/eellison/801/head 2025-09-07T07:55:55.8638763Z * [new branch] gh/eellison/801/orig -> origin/gh/eellison/801/orig 2025-09-07T07:55:55.8640929Z * [new branch] gh/eellison/802/base -> origin/gh/eellison/802/base 2025-09-07T07:55:55.8642562Z * [new branch] gh/eellison/802/head -> origin/gh/eellison/802/head 2025-09-07T07:55:55.8644302Z * [new branch] 
gh/eellison/802/orig -> origin/gh/eellison/802/orig 2025-09-07T07:55:55.8646661Z * [new branch] gh/eellison/805/base -> origin/gh/eellison/805/base 2025-09-07T07:55:55.8648158Z * [new branch] gh/eellison/805/head -> origin/gh/eellison/805/head 2025-09-07T07:55:55.8649830Z * [new branch] gh/eellison/805/orig -> origin/gh/eellison/805/orig 2025-09-07T07:55:55.8652132Z * [new branch] gh/eellison/808/base -> origin/gh/eellison/808/base 2025-09-07T07:55:55.8653924Z * [new branch] gh/eellison/808/head -> origin/gh/eellison/808/head 2025-09-07T07:55:55.8655711Z * [new branch] gh/eellison/808/orig -> origin/gh/eellison/808/orig 2025-09-07T07:55:55.8658176Z * [new branch] gh/eellison/809/base -> origin/gh/eellison/809/base 2025-09-07T07:55:55.8659538Z * [new branch] gh/eellison/809/head -> origin/gh/eellison/809/head 2025-09-07T07:55:55.8661084Z * [new branch] gh/eellison/809/orig -> origin/gh/eellison/809/orig 2025-09-07T07:55:55.8663407Z * [new branch] gh/eellison/813/base -> origin/gh/eellison/813/base 2025-09-07T07:55:55.8665308Z * [new branch] gh/eellison/813/head -> origin/gh/eellison/813/head 2025-09-07T07:55:55.8666781Z * [new branch] gh/eellison/813/orig -> origin/gh/eellison/813/orig 2025-09-07T07:55:55.8669046Z * [new branch] gh/eellison/814/base -> origin/gh/eellison/814/base 2025-09-07T07:55:55.8670626Z * [new branch] gh/eellison/814/head -> origin/gh/eellison/814/head 2025-09-07T07:55:55.8672224Z * [new branch] gh/eellison/814/orig -> origin/gh/eellison/814/orig 2025-09-07T07:55:55.8675208Z * [new branch] gh/eellison/815/base -> origin/gh/eellison/815/base 2025-09-07T07:55:55.8676761Z * [new branch] gh/eellison/815/head -> origin/gh/eellison/815/head 2025-09-07T07:55:55.8678425Z * [new branch] gh/eellison/815/orig -> origin/gh/eellison/815/orig 2025-09-07T07:55:55.8680738Z * [new branch] gh/eellison/816/base -> origin/gh/eellison/816/base 2025-09-07T07:55:55.8682250Z * [new branch] gh/eellison/816/head -> origin/gh/eellison/816/head 
2025-09-07T07:55:55.8683932Z * [new branch] gh/eellison/816/orig -> origin/gh/eellison/816/orig 2025-09-07T07:55:55.8686447Z * [new branch] gh/eellison/817/base -> origin/gh/eellison/817/base 2025-09-07T07:55:55.8687883Z * [new branch] gh/eellison/817/head -> origin/gh/eellison/817/head 2025-09-07T07:55:55.8689378Z * [new branch] gh/eellison/817/orig -> origin/gh/eellison/817/orig 2025-09-07T07:55:55.8691681Z * [new branch] gh/eellison/818/base -> origin/gh/eellison/818/base 2025-09-07T07:55:55.8693297Z * [new branch] gh/eellison/818/head -> origin/gh/eellison/818/head 2025-09-07T07:55:55.8695378Z * [new branch] gh/eellison/818/orig -> origin/gh/eellison/818/orig 2025-09-07T07:55:55.8697956Z * [new branch] gh/eellison/819/base -> origin/gh/eellison/819/base 2025-09-07T07:55:55.8699338Z * [new branch] gh/eellison/819/head -> origin/gh/eellison/819/head 2025-09-07T07:55:55.8700923Z * [new branch] gh/eellison/819/orig -> origin/gh/eellison/819/orig 2025-09-07T07:55:55.8703280Z * [new branch] gh/eellison/820/base -> origin/gh/eellison/820/base 2025-09-07T07:55:55.8705328Z * [new branch] gh/eellison/820/head -> origin/gh/eellison/820/head 2025-09-07T07:55:55.8706855Z * [new branch] gh/eellison/820/orig -> origin/gh/eellison/820/orig 2025-09-07T07:55:55.8709006Z * [new branch] gh/eellison/821/base -> origin/gh/eellison/821/base 2025-09-07T07:55:55.8710577Z * [new branch] gh/eellison/821/head -> origin/gh/eellison/821/head 2025-09-07T07:55:55.8712135Z * [new branch] gh/eellison/821/orig -> origin/gh/eellison/821/orig 2025-09-07T07:55:55.8714841Z * [new branch] gh/eellison/822/base -> origin/gh/eellison/822/base 2025-09-07T07:55:55.8716285Z * [new branch] gh/eellison/822/head -> origin/gh/eellison/822/head 2025-09-07T07:55:55.8717947Z * [new branch] gh/eellison/822/orig -> origin/gh/eellison/822/orig 2025-09-07T07:55:55.8720350Z * [new branch] gh/eellison/823/base -> origin/gh/eellison/823/base 2025-09-07T07:55:55.8721947Z * [new branch] gh/eellison/823/head -> 
origin/gh/eellison/823/head 2025-09-07T07:55:55.8723463Z * [new branch] gh/eellison/823/orig -> origin/gh/eellison/823/orig 2025-09-07T07:55:55.8726632Z * [new branch] gh/etaf/132/base -> origin/gh/etaf/132/base 2025-09-07T07:55:55.8728140Z * [new branch] gh/etaf/132/head -> origin/gh/etaf/132/head 2025-09-07T07:55:55.8729652Z * [new branch] gh/etaf/132/orig -> origin/gh/etaf/132/orig 2025-09-07T07:55:55.8731920Z * [new branch] gh/etaf/138/base -> origin/gh/etaf/138/base 2025-09-07T07:55:55.8733482Z * [new branch] gh/etaf/138/head -> origin/gh/etaf/138/head 2025-09-07T07:55:55.8735330Z * [new branch] gh/etaf/138/orig -> origin/gh/etaf/138/orig 2025-09-07T07:55:55.8737599Z * [new branch] gh/etaf/140/base -> origin/gh/etaf/140/base 2025-09-07T07:55:55.8739195Z * [new branch] gh/etaf/140/head -> origin/gh/etaf/140/head 2025-09-07T07:55:55.8740750Z * [new branch] gh/etaf/140/orig -> origin/gh/etaf/140/orig 2025-09-07T07:55:55.8742977Z * [new branch] gh/etaf/143/base -> origin/gh/etaf/143/base 2025-09-07T07:55:55.8744908Z * [new branch] gh/etaf/143/head -> origin/gh/etaf/143/head 2025-09-07T07:55:55.8746378Z * [new branch] gh/etaf/143/orig -> origin/gh/etaf/143/orig 2025-09-07T07:55:55.8748641Z * [new branch] gh/etaf/147/base -> origin/gh/etaf/147/base 2025-09-07T07:55:55.8750192Z * [new branch] gh/etaf/147/head -> origin/gh/etaf/147/head 2025-09-07T07:55:55.8752559Z * [new branch] gh/etaf/151/base -> origin/gh/etaf/151/base 2025-09-07T07:55:55.8754363Z * [new branch] gh/etaf/151/head -> origin/gh/etaf/151/head 2025-09-07T07:55:55.8756188Z * [new branch] gh/etaf/151/orig -> origin/gh/etaf/151/orig 2025-09-07T07:55:55.8758674Z * [new branch] gh/etaf/152/base -> origin/gh/etaf/152/base 2025-09-07T07:55:55.8760253Z * [new branch] gh/etaf/152/head -> origin/gh/etaf/152/head 2025-09-07T07:55:55.8761855Z * [new branch] gh/etaf/152/orig -> origin/gh/etaf/152/orig 2025-09-07T07:55:55.8764476Z * [new branch] gh/etaf/153/base -> origin/gh/etaf/153/base 
2025-09-07T07:55:55.8766251Z * [new branch] gh/etaf/153/head -> origin/gh/etaf/153/head 2025-09-07T07:55:55.8767647Z * [new branch] gh/etaf/153/orig -> origin/gh/etaf/153/orig 2025-09-07T07:55:55.8769999Z * [new branch] gh/etaf/154/base -> origin/gh/etaf/154/base 2025-09-07T07:55:55.8771625Z * [new branch] gh/etaf/154/head -> origin/gh/etaf/154/head 2025-09-07T07:55:55.8773189Z * [new branch] gh/etaf/154/orig -> origin/gh/etaf/154/orig 2025-09-07T07:55:55.8776024Z * [new branch] gh/etaf/155/base -> origin/gh/etaf/155/base 2025-09-07T07:55:55.8777578Z * [new branch] gh/etaf/155/head -> origin/gh/etaf/155/head 2025-09-07T07:55:55.8779079Z * [new branch] gh/etaf/155/orig -> origin/gh/etaf/155/orig 2025-09-07T07:55:55.8781230Z * [new branch] gh/etaf/156/base -> origin/gh/etaf/156/base 2025-09-07T07:55:55.8782849Z * [new branch] gh/etaf/156/head -> origin/gh/etaf/156/head 2025-09-07T07:55:55.8784821Z * [new branch] gh/etaf/156/orig -> origin/gh/etaf/156/orig 2025-09-07T07:55:55.8787154Z * [new branch] gh/etaf/157/base -> origin/gh/etaf/157/base 2025-09-07T07:55:55.8788735Z * [new branch] gh/etaf/157/head -> origin/gh/etaf/157/head 2025-09-07T07:55:55.8790523Z * [new branch] gh/etaf/157/orig -> origin/gh/etaf/157/orig 2025-09-07T07:55:55.8792682Z * [new branch] gh/etaf/158/base -> origin/gh/etaf/158/base 2025-09-07T07:55:55.8794793Z * [new branch] gh/etaf/158/head -> origin/gh/etaf/158/head 2025-09-07T07:55:55.8796353Z * [new branch] gh/etaf/158/orig -> origin/gh/etaf/158/orig 2025-09-07T07:55:55.8798822Z * [new branch] gh/etaf/159/base -> origin/gh/etaf/159/base 2025-09-07T07:55:55.8800408Z * [new branch] gh/etaf/159/head -> origin/gh/etaf/159/head 2025-09-07T07:55:55.8801936Z * [new branch] gh/etaf/159/orig -> origin/gh/etaf/159/orig 2025-09-07T07:55:55.8804556Z * [new branch] gh/etaf/160/base -> origin/gh/etaf/160/base 2025-09-07T07:55:55.8806147Z * [new branch] gh/etaf/160/head -> origin/gh/etaf/160/head 2025-09-07T07:55:55.8807712Z * [new branch] gh/etaf/160/orig -> 
origin/gh/etaf/160/orig 2025-09-07T07:55:55.8809964Z * [new branch] gh/etaf/161/base -> origin/gh/etaf/161/base 2025-09-07T07:55:55.8811578Z * [new branch] gh/etaf/161/head -> origin/gh/etaf/161/head 2025-09-07T07:55:55.8813142Z * [new branch] gh/etaf/161/orig -> origin/gh/etaf/161/orig 2025-09-07T07:55:55.8815834Z * [new branch] gh/etaf/162/base -> origin/gh/etaf/162/base 2025-09-07T07:55:55.8817406Z * [new branch] gh/etaf/162/head -> origin/gh/etaf/162/head 2025-09-07T07:55:55.8818875Z * [new branch] gh/etaf/162/orig -> origin/gh/etaf/162/orig 2025-09-07T07:55:55.8821133Z * [new branch] gh/etaf/163/base -> origin/gh/etaf/163/base 2025-09-07T07:55:55.8822705Z * [new branch] gh/etaf/163/head -> origin/gh/etaf/163/head 2025-09-07T07:55:55.8824518Z * [new branch] gh/etaf/163/orig -> origin/gh/etaf/163/orig 2025-09-07T07:55:55.8826903Z * [new branch] gh/etaf/164/base -> origin/gh/etaf/164/base 2025-09-07T07:55:55.8828524Z * [new branch] gh/etaf/164/head -> origin/gh/etaf/164/head 2025-09-07T07:55:55.8830110Z * [new branch] gh/etaf/164/orig -> origin/gh/etaf/164/orig 2025-09-07T07:55:55.8832469Z * [new branch] gh/etaf/165/base -> origin/gh/etaf/165/base 2025-09-07T07:55:55.8834285Z * [new branch] gh/etaf/165/orig -> origin/gh/etaf/165/orig 2025-09-07T07:55:55.8836684Z * [new branch] gh/etaf/166/base -> origin/gh/etaf/166/base 2025-09-07T07:55:55.8838347Z * [new branch] gh/etaf/166/head -> origin/gh/etaf/166/head 2025-09-07T07:55:55.8839845Z * [new branch] gh/etaf/166/orig -> origin/gh/etaf/166/orig 2025-09-07T07:55:55.8842155Z * [new branch] gh/etaf/167/base -> origin/gh/etaf/167/base 2025-09-07T07:55:55.8843947Z * [new branch] gh/etaf/167/head -> origin/gh/etaf/167/head 2025-09-07T07:55:55.8845861Z * [new branch] gh/etaf/167/orig -> origin/gh/etaf/167/orig 2025-09-07T07:55:55.8848044Z * [new branch] gh/etaf/168/base -> origin/gh/etaf/168/base 2025-09-07T07:55:55.8849820Z * [new branch] gh/etaf/168/head -> origin/gh/etaf/168/head 2025-09-07T07:55:55.8851265Z * [new 
branch] gh/etaf/168/orig -> origin/gh/etaf/168/orig 2025-09-07T07:55:55.8853821Z * [new branch] gh/etaf/169/base -> origin/gh/etaf/169/base 2025-09-07T07:55:55.8855546Z * [new branch] gh/etaf/169/head -> origin/gh/etaf/169/head 2025-09-07T07:55:55.8857130Z * [new branch] gh/etaf/169/orig -> origin/gh/etaf/169/orig 2025-09-07T07:55:55.8859882Z * [new branch] gh/exclamaforte/1/base -> origin/gh/exclamaforte/1/base 2025-09-07T07:55:55.8861356Z * [new branch] gh/exclamaforte/1/head -> origin/gh/exclamaforte/1/head 2025-09-07T07:55:55.8863526Z * [new branch] gh/exclamaforte/2/base -> origin/gh/exclamaforte/2/base 2025-09-07T07:55:55.8865369Z * [new branch] gh/exclamaforte/2/head -> origin/gh/exclamaforte/2/head 2025-09-07T07:55:55.8867645Z * [new branch] gh/exclamaforte/3/base -> origin/gh/exclamaforte/3/base 2025-09-07T07:55:55.8869207Z * [new branch] gh/exclamaforte/3/head -> origin/gh/exclamaforte/3/head 2025-09-07T07:55:55.8871454Z * [new branch] gh/exclamaforte/4/base -> origin/gh/exclamaforte/4/base 2025-09-07T07:55:55.8872980Z * [new branch] gh/exclamaforte/4/head -> origin/gh/exclamaforte/4/head 2025-09-07T07:55:55.8876362Z * [new branch] gh/ezyang/2374/base -> origin/gh/ezyang/2374/base 2025-09-07T07:55:55.8877947Z * [new branch] gh/ezyang/2374/head -> origin/gh/ezyang/2374/head 2025-09-07T07:55:55.8879485Z * [new branch] gh/ezyang/2374/orig -> origin/gh/ezyang/2374/orig 2025-09-07T07:55:55.8881740Z * [new branch] gh/ezyang/2973/base -> origin/gh/ezyang/2973/base 2025-09-07T07:55:55.8883288Z * [new branch] gh/ezyang/2973/head -> origin/gh/ezyang/2973/head 2025-09-07T07:55:55.8885203Z * [new branch] gh/ezyang/2973/orig -> origin/gh/ezyang/2973/orig 2025-09-07T07:55:55.8887658Z * [new branch] gh/ezyang/2974/base -> origin/gh/ezyang/2974/base 2025-09-07T07:55:55.8889235Z * [new branch] gh/ezyang/2974/head -> origin/gh/ezyang/2974/head 2025-09-07T07:55:55.8890882Z * [new branch] gh/ezyang/2974/orig -> origin/gh/ezyang/2974/orig 2025-09-07T07:55:55.8893063Z * [new 
branch] gh/ezyang/3074/base -> origin/gh/ezyang/3074/base 2025-09-07T07:55:55.8895001Z * [new branch] gh/ezyang/3074/head -> origin/gh/ezyang/3074/head 2025-09-07T07:55:55.8896505Z * [new branch] gh/ezyang/3074/orig -> origin/gh/ezyang/3074/orig 2025-09-07T07:55:55.8898782Z * [new branch] gh/ezyang/3088/base -> origin/gh/ezyang/3088/base 2025-09-07T07:55:55.8900344Z * [new branch] gh/ezyang/3088/head -> origin/gh/ezyang/3088/head 2025-09-07T07:55:55.8902049Z * [new branch] gh/ezyang/3088/orig -> origin/gh/ezyang/3088/orig 2025-09-07T07:55:55.8904348Z * [new branch] gh/ezyang/3092/base -> origin/gh/ezyang/3092/base 2025-09-07T07:55:55.8906148Z * [new branch] gh/ezyang/3092/head -> origin/gh/ezyang/3092/head 2025-09-07T07:55:55.8907663Z * [new branch] gh/ezyang/3092/orig -> origin/gh/ezyang/3092/orig 2025-09-07T07:55:55.8909882Z * [new branch] gh/ezyang/3103/base -> origin/gh/ezyang/3103/base 2025-09-07T07:55:55.8911451Z * [new branch] gh/ezyang/3103/head -> origin/gh/ezyang/3103/head 2025-09-07T07:55:55.8912966Z * [new branch] gh/ezyang/3103/orig -> origin/gh/ezyang/3103/orig 2025-09-07T07:55:55.8915554Z * [new branch] gh/ezyang/3105/base -> origin/gh/ezyang/3105/base 2025-09-07T07:55:55.8917169Z * [new branch] gh/ezyang/3105/head -> origin/gh/ezyang/3105/head 2025-09-07T07:55:55.8918730Z * [new branch] gh/ezyang/3105/orig -> origin/gh/ezyang/3105/orig 2025-09-07T07:55:55.8920999Z * [new branch] gh/ezyang/3114/base -> origin/gh/ezyang/3114/base 2025-09-07T07:55:55.8922678Z * [new branch] gh/ezyang/3114/head -> origin/gh/ezyang/3114/head 2025-09-07T07:55:55.8924386Z * [new branch] gh/ezyang/3114/orig -> origin/gh/ezyang/3114/orig 2025-09-07T07:55:55.8926771Z * [new branch] gh/ezyang/3116/base -> origin/gh/ezyang/3116/base 2025-09-07T07:55:55.8928312Z * [new branch] gh/ezyang/3116/head -> origin/gh/ezyang/3116/head 2025-09-07T07:55:55.8929895Z * [new branch] gh/ezyang/3116/orig -> origin/gh/ezyang/3116/orig 2025-09-07T07:55:55.8932125Z * [new branch] 
gh/ezyang/3120/base -> origin/gh/ezyang/3120/base 2025-09-07T07:55:55.8933842Z * [new branch] gh/ezyang/3120/head -> origin/gh/ezyang/3120/head 2025-09-07T07:55:55.8935522Z * [new branch] gh/ezyang/3120/orig -> origin/gh/ezyang/3120/orig 2025-09-07T07:55:55.8937722Z * [new branch] gh/ezyang/3122/base -> origin/gh/ezyang/3122/base 2025-09-07T07:55:55.8939284Z * [new branch] gh/ezyang/3122/head -> origin/gh/ezyang/3122/head 2025-09-07T07:55:55.8940791Z * [new branch] gh/ezyang/3122/orig -> origin/gh/ezyang/3122/orig 2025-09-07T07:55:55.8943122Z * [new branch] gh/ezyang/3123/base -> origin/gh/ezyang/3123/base 2025-09-07T07:55:55.8945085Z * [new branch] gh/ezyang/3123/head -> origin/gh/ezyang/3123/head 2025-09-07T07:55:55.8946539Z * [new branch] gh/ezyang/3123/orig -> origin/gh/ezyang/3123/orig 2025-09-07T07:55:55.8948747Z * [new branch] gh/ezyang/3125/base -> origin/gh/ezyang/3125/base 2025-09-07T07:55:55.8950282Z * [new branch] gh/ezyang/3125/head -> origin/gh/ezyang/3125/head 2025-09-07T07:55:55.8951853Z * [new branch] gh/ezyang/3125/orig -> origin/gh/ezyang/3125/orig 2025-09-07T07:55:55.8954332Z * [new branch] gh/ezyang/3126/base -> origin/gh/ezyang/3126/base 2025-09-07T07:55:55.8955977Z * [new branch] gh/ezyang/3126/head -> origin/gh/ezyang/3126/head 2025-09-07T07:55:55.8957638Z * [new branch] gh/ezyang/3126/orig -> origin/gh/ezyang/3126/orig 2025-09-07T07:55:55.8959851Z * [new branch] gh/ezyang/3127/base -> origin/gh/ezyang/3127/base 2025-09-07T07:55:55.8961478Z * [new branch] gh/ezyang/3127/head -> origin/gh/ezyang/3127/head 2025-09-07T07:55:55.8963124Z * [new branch] gh/ezyang/3127/orig -> origin/gh/ezyang/3127/orig 2025-09-07T07:55:55.8991187Z * [new branch] gh/ezyang/3128/base -> origin/gh/ezyang/3128/base 2025-09-07T07:55:55.8994240Z * [new branch] gh/ezyang/3128/head -> origin/gh/ezyang/3128/head 2025-09-07T07:55:55.8995714Z * [new branch] gh/ezyang/3128/orig -> origin/gh/ezyang/3128/orig 2025-09-07T07:55:55.8996392Z * [new branch] gh/ezyang/3129/base -> 
origin/gh/ezyang/3129/base 2025-09-07T07:55:55.8996884Z * [new branch] gh/ezyang/3129/head -> origin/gh/ezyang/3129/head 2025-09-07T07:55:55.8997521Z * [new branch] gh/ezyang/3129/orig -> origin/gh/ezyang/3129/orig 2025-09-07T07:55:55.8998026Z * [new branch] gh/ezyang/3130/base -> origin/gh/ezyang/3130/base 2025-09-07T07:55:55.8998507Z * [new branch] gh/ezyang/3130/head -> origin/gh/ezyang/3130/head 2025-09-07T07:55:55.8998977Z * [new branch] gh/ezyang/3130/orig -> origin/gh/ezyang/3130/orig 2025-09-07T07:55:55.8999437Z * [new branch] gh/ezyang/3131/base -> origin/gh/ezyang/3131/base 2025-09-07T07:55:55.8999904Z * [new branch] gh/ezyang/3131/head -> origin/gh/ezyang/3131/head 2025-09-07T07:55:55.9000395Z * [new branch] gh/ezyang/3131/orig -> origin/gh/ezyang/3131/orig 2025-09-07T07:55:55.9000893Z * [new branch] gh/ezyang/3132/base -> origin/gh/ezyang/3132/base 2025-09-07T07:55:55.9001285Z * [new branch] gh/ezyang/3132/head -> origin/gh/ezyang/3132/head 2025-09-07T07:55:55.9001850Z * [new branch] gh/ezyang/3132/orig -> origin/gh/ezyang/3132/orig 2025-09-07T07:55:55.9002249Z * [new branch] gh/ezyang/3133/base -> origin/gh/ezyang/3133/base 2025-09-07T07:55:55.9002647Z * [new branch] gh/ezyang/3133/head -> origin/gh/ezyang/3133/head 2025-09-07T07:55:55.9003065Z * [new branch] gh/ezyang/3133/orig -> origin/gh/ezyang/3133/orig 2025-09-07T07:55:55.9003468Z * [new branch] gh/ezyang/3134/base -> origin/gh/ezyang/3134/base 2025-09-07T07:55:55.9004104Z * [new branch] gh/ezyang/3134/head -> origin/gh/ezyang/3134/head 2025-09-07T07:55:55.9004513Z * [new branch] gh/ezyang/3134/orig -> origin/gh/ezyang/3134/orig 2025-09-07T07:55:55.9004921Z * [new branch] gh/ezyang/3135/base -> origin/gh/ezyang/3135/base 2025-09-07T07:55:55.9006465Z * [new branch] gh/ezyang/3135/head -> origin/gh/ezyang/3135/head 2025-09-07T07:55:55.9008011Z * [new branch] gh/ezyang/3135/orig -> origin/gh/ezyang/3135/orig 2025-09-07T07:55:55.9010243Z * [new branch] gh/ezyang/3136/base -> 
origin/gh/ezyang/3136/base 2025-09-07T07:55:55.9011767Z * [new branch] gh/ezyang/3136/head -> origin/gh/ezyang/3136/head 2025-09-07T07:55:55.9013282Z * [new branch] gh/ezyang/3136/orig -> origin/gh/ezyang/3136/orig 2025-09-07T07:55:55.9015973Z * [new branch] gh/ezyang/3137/base -> origin/gh/ezyang/3137/base 2025-09-07T07:55:55.9017479Z * [new branch] gh/ezyang/3137/head -> origin/gh/ezyang/3137/head 2025-09-07T07:55:55.9019046Z * [new branch] gh/ezyang/3137/orig -> origin/gh/ezyang/3137/orig 2025-09-07T07:55:55.9021249Z * [new branch] gh/ezyang/3138/base -> origin/gh/ezyang/3138/base 2025-09-07T07:55:55.9022838Z * [new branch] gh/ezyang/3138/head -> origin/gh/ezyang/3138/head 2025-09-07T07:55:55.9024726Z * [new branch] gh/ezyang/3138/orig -> origin/gh/ezyang/3138/orig 2025-09-07T07:55:55.9027073Z * [new branch] gh/ezyang/3139/base -> origin/gh/ezyang/3139/base 2025-09-07T07:55:55.9028570Z * [new branch] gh/ezyang/3139/head -> origin/gh/ezyang/3139/head 2025-09-07T07:55:55.9030109Z * [new branch] gh/ezyang/3139/orig -> origin/gh/ezyang/3139/orig 2025-09-07T07:55:55.9032367Z * [new branch] gh/ezyang/3140/base -> origin/gh/ezyang/3140/base 2025-09-07T07:55:55.9034255Z * [new branch] gh/ezyang/3140/head -> origin/gh/ezyang/3140/head 2025-09-07T07:55:55.9035803Z * [new branch] gh/ezyang/3140/orig -> origin/gh/ezyang/3140/orig 2025-09-07T07:55:55.9038169Z * [new branch] gh/ezyang/3141/base -> origin/gh/ezyang/3141/base 2025-09-07T07:55:55.9039700Z * [new branch] gh/ezyang/3141/head -> origin/gh/ezyang/3141/head 2025-09-07T07:55:55.9041236Z * [new branch] gh/ezyang/3141/orig -> origin/gh/ezyang/3141/orig 2025-09-07T07:55:55.9043481Z * [new branch] gh/ezyang/3142/base -> origin/gh/ezyang/3142/base 2025-09-07T07:55:55.9045470Z * [new branch] gh/ezyang/3142/head -> origin/gh/ezyang/3142/head 2025-09-07T07:55:55.9046981Z * [new branch] gh/ezyang/3142/orig -> origin/gh/ezyang/3142/orig 2025-09-07T07:55:55.9049248Z * [new branch] gh/ezyang/3143/base -> 
origin/gh/ezyang/3143/base 2025-09-07T07:55:55.9050776Z * [new branch] gh/ezyang/3143/head -> origin/gh/ezyang/3143/head 2025-09-07T07:55:55.9052304Z * [new branch] gh/ezyang/3143/orig -> origin/gh/ezyang/3143/orig 2025-09-07T07:55:55.9055526Z * [new branch] gh/fadara01/1/base -> origin/gh/fadara01/1/base 2025-09-07T07:55:55.9058131Z * [new branch] gh/fadara01/1/head -> origin/gh/fadara01/1/head 2025-09-07T07:55:55.9059671Z * [new branch] gh/fadara01/1/orig -> origin/gh/fadara01/1/orig 2025-09-07T07:55:55.9062673Z * [new branch] gh/fduwjj/171/base -> origin/gh/fduwjj/171/base 2025-09-07T07:55:55.9064639Z * [new branch] gh/fduwjj/171/head -> origin/gh/fduwjj/171/head 2025-09-07T07:55:55.9066112Z * [new branch] gh/fduwjj/171/orig -> origin/gh/fduwjj/171/orig 2025-09-07T07:55:55.9068499Z * [new branch] gh/fduwjj/175/base -> origin/gh/fduwjj/175/base 2025-09-07T07:55:55.9070234Z * [new branch] gh/fduwjj/175/head -> origin/gh/fduwjj/175/head 2025-09-07T07:55:55.9071773Z * [new branch] gh/fduwjj/175/orig -> origin/gh/fduwjj/175/orig 2025-09-07T07:55:55.9074091Z * [new branch] gh/fduwjj/176/base -> origin/gh/fduwjj/176/base 2025-09-07T07:55:55.9076095Z * [new branch] gh/fduwjj/176/head -> origin/gh/fduwjj/176/head 2025-09-07T07:55:55.9077694Z * [new branch] gh/fduwjj/176/orig -> origin/gh/fduwjj/176/orig 2025-09-07T07:55:55.9079959Z * [new branch] gh/fduwjj/177/base -> origin/gh/fduwjj/177/base 2025-09-07T07:55:55.9081595Z * [new branch] gh/fduwjj/177/head -> origin/gh/fduwjj/177/head 2025-09-07T07:55:55.9083159Z * [new branch] gh/fduwjj/177/orig -> origin/gh/fduwjj/177/orig 2025-09-07T07:55:55.9085841Z * [new branch] gh/fduwjj/178/base -> origin/gh/fduwjj/178/base 2025-09-07T07:55:55.9087401Z * [new branch] gh/fduwjj/178/head -> origin/gh/fduwjj/178/head 2025-09-07T07:55:55.9088958Z * [new branch] gh/fduwjj/178/orig -> origin/gh/fduwjj/178/orig 2025-09-07T07:55:55.9091138Z * [new branch] gh/fduwjj/179/base -> origin/gh/fduwjj/179/base 2025-09-07T07:55:55.9092632Z * [new 
branch] gh/fduwjj/179/head -> origin/gh/fduwjj/179/head 2025-09-07T07:55:55.9094595Z * [new branch] gh/fduwjj/179/orig -> origin/gh/fduwjj/179/orig 2025-09-07T07:55:55.9096881Z * [new branch] gh/fduwjj/180/base -> origin/gh/fduwjj/180/base 2025-09-07T07:55:55.9098429Z * [new branch] gh/fduwjj/180/head -> origin/gh/fduwjj/180/head 2025-09-07T07:55:55.9100000Z * [new branch] gh/fduwjj/180/orig -> origin/gh/fduwjj/180/orig 2025-09-07T07:55:55.9102253Z * [new branch] gh/fduwjj/181/base -> origin/gh/fduwjj/181/base 2025-09-07T07:55:55.9104049Z * [new branch] gh/fduwjj/181/head -> origin/gh/fduwjj/181/head 2025-09-07T07:55:55.9105707Z * [new branch] gh/fduwjj/181/orig -> origin/gh/fduwjj/181/orig 2025-09-07T07:55:55.9107879Z * [new branch] gh/fduwjj/182/base -> origin/gh/fduwjj/182/base 2025-09-07T07:55:55.9109415Z * [new branch] gh/fduwjj/182/head -> origin/gh/fduwjj/182/head 2025-09-07T07:55:55.9110961Z * [new branch] gh/fduwjj/182/orig -> origin/gh/fduwjj/182/orig 2025-09-07T07:55:55.9113305Z * [new branch] gh/fduwjj/183/base -> origin/gh/fduwjj/183/base 2025-09-07T07:55:55.9115393Z * [new branch] gh/fduwjj/183/head -> origin/gh/fduwjj/183/head 2025-09-07T07:55:55.9116902Z * [new branch] gh/fduwjj/183/orig -> origin/gh/fduwjj/183/orig 2025-09-07T07:55:55.9119509Z * [new branch] gh/fduwjj/184/base -> origin/gh/fduwjj/184/base 2025-09-07T07:55:55.9121161Z * [new branch] gh/fduwjj/184/head -> origin/gh/fduwjj/184/head 2025-09-07T07:55:55.9122603Z * [new branch] gh/fduwjj/184/orig -> origin/gh/fduwjj/184/orig 2025-09-07T07:55:55.9125310Z * [new branch] gh/fduwjj/185/base -> origin/gh/fduwjj/185/base 2025-09-07T07:55:55.9126833Z * [new branch] gh/fduwjj/185/head -> origin/gh/fduwjj/185/head 2025-09-07T07:55:55.9128338Z * [new branch] gh/fduwjj/185/orig -> origin/gh/fduwjj/185/orig 2025-09-07T07:55:55.9130445Z * [new branch] gh/fduwjj/186/base -> origin/gh/fduwjj/186/base 2025-09-07T07:55:55.9132092Z * [new branch] gh/fduwjj/186/head -> origin/gh/fduwjj/186/head 
2025-09-07T07:55:55.9133602Z * [new branch] gh/fduwjj/186/orig -> origin/gh/fduwjj/186/orig 2025-09-07T07:55:55.9136098Z * [new branch] gh/fduwjj/187/base -> origin/gh/fduwjj/187/base 2025-09-07T07:55:55.9137585Z * [new branch] gh/fduwjj/187/head -> origin/gh/fduwjj/187/head 2025-09-07T07:55:55.9139152Z * [new branch] gh/fduwjj/187/orig -> origin/gh/fduwjj/187/orig 2025-09-07T07:55:55.9141305Z * [new branch] gh/fduwjj/188/base -> origin/gh/fduwjj/188/base 2025-09-07T07:55:55.9142945Z * [new branch] gh/fduwjj/188/head -> origin/gh/fduwjj/188/head 2025-09-07T07:55:55.9144695Z * [new branch] gh/fduwjj/188/orig -> origin/gh/fduwjj/188/orig 2025-09-07T07:55:55.9146947Z * [new branch] gh/fduwjj/189/base -> origin/gh/fduwjj/189/base 2025-09-07T07:55:55.9148310Z * [new branch] gh/fduwjj/189/head -> origin/gh/fduwjj/189/head 2025-09-07T07:55:55.9149750Z * [new branch] gh/fduwjj/189/orig -> origin/gh/fduwjj/189/orig 2025-09-07T07:55:55.9152060Z * [new branch] gh/fduwjj/190/base -> origin/gh/fduwjj/190/base 2025-09-07T07:55:55.9153893Z * [new branch] gh/fduwjj/190/head -> origin/gh/fduwjj/190/head 2025-09-07T07:55:55.9155474Z * [new branch] gh/fduwjj/190/orig -> origin/gh/fduwjj/190/orig 2025-09-07T07:55:55.9157631Z * [new branch] gh/fduwjj/191/base -> origin/gh/fduwjj/191/base 2025-09-07T07:55:55.9159310Z * [new branch] gh/fduwjj/191/head -> origin/gh/fduwjj/191/head 2025-09-07T07:55:55.9160796Z * [new branch] gh/fduwjj/191/orig -> origin/gh/fduwjj/191/orig 2025-09-07T07:55:55.9163614Z * [new branch] gh/fegin/306/base -> origin/gh/fegin/306/base 2025-09-07T07:55:55.9165482Z * [new branch] gh/fegin/306/head -> origin/gh/fegin/306/head 2025-09-07T07:55:55.9167998Z * [new branch] gh/fegin/306/orig -> origin/gh/fegin/306/orig 2025-09-07T07:55:55.9169689Z * [new branch] gh/fegin/307/base -> origin/gh/fegin/307/base 2025-09-07T07:55:55.9171035Z * [new branch] gh/fegin/307/head -> origin/gh/fegin/307/head 2025-09-07T07:55:55.9172871Z * [new branch] gh/fegin/307/orig -> 
origin/gh/fegin/307/orig 2025-09-07T07:55:55.9175261Z * [new branch] gh/fegin/308/base -> origin/gh/fegin/308/base 2025-09-07T07:55:55.9176706Z * [new branch] gh/fegin/308/head -> origin/gh/fegin/308/head 2025-09-07T07:55:55.9178311Z * [new branch] gh/fegin/308/orig -> origin/gh/fegin/308/orig 2025-09-07T07:55:55.9180613Z * [new branch] gh/fegin/309/base -> origin/gh/fegin/309/base 2025-09-07T07:55:55.9182127Z * [new branch] gh/fegin/309/head -> origin/gh/fegin/309/head 2025-09-07T07:55:55.9183866Z * [new branch] gh/fegin/309/orig -> origin/gh/fegin/309/orig 2025-09-07T07:55:55.9186357Z * [new branch] gh/fegin/310/base -> origin/gh/fegin/310/base 2025-09-07T07:55:55.9187884Z * [new branch] gh/fegin/310/head -> origin/gh/fegin/310/head 2025-09-07T07:55:55.9189520Z * [new branch] gh/fegin/310/orig -> origin/gh/fegin/310/orig 2025-09-07T07:55:55.9191755Z * [new branch] gh/fegin/311/base -> origin/gh/fegin/311/base 2025-09-07T07:55:55.9193328Z * [new branch] gh/fegin/311/head -> origin/gh/fegin/311/head 2025-09-07T07:55:55.9195271Z * [new branch] gh/fegin/311/orig -> origin/gh/fegin/311/orig 2025-09-07T07:55:55.9197439Z * [new branch] gh/fegin/312/base -> origin/gh/fegin/312/base 2025-09-07T07:55:55.9199026Z * [new branch] gh/fegin/312/head -> origin/gh/fegin/312/head 2025-09-07T07:55:55.9200533Z * [new branch] gh/fegin/312/orig -> origin/gh/fegin/312/orig 2025-09-07T07:55:55.9202746Z * [new branch] gh/fegin/313/base -> origin/gh/fegin/313/base 2025-09-07T07:55:55.9204639Z * [new branch] gh/fegin/313/head -> origin/gh/fegin/313/head 2025-09-07T07:55:55.9206364Z * [new branch] gh/fegin/313/orig -> origin/gh/fegin/313/orig 2025-09-07T07:55:55.9209018Z * [new branch] gh/fffrog/124/base -> origin/gh/fffrog/124/base 2025-09-07T07:55:55.9210630Z * [new branch] gh/fffrog/124/head -> origin/gh/fffrog/124/head 2025-09-07T07:55:55.9212171Z * [new branch] gh/fffrog/124/orig -> origin/gh/fffrog/124/orig 2025-09-07T07:55:55.9214748Z * [new branch] gh/fffrog/129/base -> 
origin/gh/fffrog/129/base 2025-09-07T07:55:55.9216271Z * [new branch] gh/fffrog/129/head -> origin/gh/fffrog/129/head 2025-09-07T07:55:55.9217810Z * [new branch] gh/fffrog/129/orig -> origin/gh/fffrog/129/orig 2025-09-07T07:55:55.9220084Z * [new branch] gh/fffrog/130/base -> origin/gh/fffrog/130/base 2025-09-07T07:55:55.9221628Z * [new branch] gh/fffrog/130/head -> origin/gh/fffrog/130/head 2025-09-07T07:55:55.9223347Z * [new branch] gh/fffrog/130/orig -> origin/gh/fffrog/130/orig 2025-09-07T07:55:55.9226095Z * [new branch] gh/fffrog/131/base -> origin/gh/fffrog/131/base 2025-09-07T07:55:55.9227519Z * [new branch] gh/fffrog/131/head -> origin/gh/fffrog/131/head 2025-09-07T07:55:55.9229101Z * [new branch] gh/fffrog/131/orig -> origin/gh/fffrog/131/orig 2025-09-07T07:55:55.9231347Z * [new branch] gh/fffrog/132/base -> origin/gh/fffrog/132/base 2025-09-07T07:55:55.9232865Z * [new branch] gh/fffrog/132/head -> origin/gh/fffrog/132/head 2025-09-07T07:55:55.9234743Z * [new branch] gh/fffrog/132/orig -> origin/gh/fffrog/132/orig 2025-09-07T07:55:55.9237215Z * [new branch] gh/fffrog/133/base -> origin/gh/fffrog/133/base 2025-09-07T07:55:55.9238656Z * [new branch] gh/fffrog/133/head -> origin/gh/fffrog/133/head 2025-09-07T07:55:55.9240267Z * [new branch] gh/fffrog/133/orig -> origin/gh/fffrog/133/orig 2025-09-07T07:55:55.9242575Z * [new branch] gh/fffrog/134/base -> origin/gh/fffrog/134/base 2025-09-07T07:55:55.9244486Z * [new branch] gh/fffrog/134/head -> origin/gh/fffrog/134/head 2025-09-07T07:55:55.9245997Z * [new branch] gh/fffrog/134/orig -> origin/gh/fffrog/134/orig 2025-09-07T07:55:55.9248440Z * [new branch] gh/fffrog/135/base -> origin/gh/fffrog/135/base 2025-09-07T07:55:55.9249737Z * [new branch] gh/fffrog/135/head -> origin/gh/fffrog/135/head 2025-09-07T07:55:55.9251254Z * [new branch] gh/fffrog/135/orig -> origin/gh/fffrog/135/orig 2025-09-07T07:55:55.9253428Z * [new branch] gh/fffrog/136/base -> origin/gh/fffrog/136/base 2025-09-07T07:55:55.9257583Z * [new 
branch] gh/fffrog/136/head -> origin/gh/fffrog/136/head 2025-09-07T07:55:55.9258426Z * [new branch] gh/fffrog/136/orig -> origin/gh/fffrog/136/orig 2025-09-07T07:55:55.9259049Z * [new branch] gh/fffrog/137/base -> origin/gh/fffrog/137/base 2025-09-07T07:55:55.9260636Z * [new branch] gh/fffrog/137/head -> origin/gh/fffrog/137/head 2025-09-07T07:55:55.9262294Z * [new branch] gh/fffrog/137/orig -> origin/gh/fffrog/137/orig 2025-09-07T07:55:55.9264998Z * [new branch] gh/fffrog/138/base -> origin/gh/fffrog/138/base 2025-09-07T07:55:55.9266410Z * [new branch] gh/fffrog/138/head -> origin/gh/fffrog/138/head 2025-09-07T07:55:55.9267989Z * [new branch] gh/fffrog/138/orig -> origin/gh/fffrog/138/orig 2025-09-07T07:55:55.9270204Z * [new branch] gh/fffrog/139/base -> origin/gh/fffrog/139/base 2025-09-07T07:55:55.9271809Z * [new branch] gh/fffrog/139/head -> origin/gh/fffrog/139/head 2025-09-07T07:55:55.9273385Z * [new branch] gh/fffrog/139/orig -> origin/gh/fffrog/139/orig 2025-09-07T07:55:55.9276161Z * [new branch] gh/fffrog/140/base -> origin/gh/fffrog/140/base 2025-09-07T07:55:55.9277770Z * [new branch] gh/fffrog/140/head -> origin/gh/fffrog/140/head 2025-09-07T07:55:55.9279245Z * [new branch] gh/fffrog/140/orig -> origin/gh/fffrog/140/orig 2025-09-07T07:55:55.9281600Z * [new branch] gh/fffrog/141/base -> origin/gh/fffrog/141/base 2025-09-07T07:55:55.9283098Z * [new branch] gh/fffrog/141/head -> origin/gh/fffrog/141/head 2025-09-07T07:55:55.9284990Z * [new branch] gh/fffrog/141/orig -> origin/gh/fffrog/141/orig 2025-09-07T07:55:55.9287182Z * [new branch] gh/fffrog/142/base -> origin/gh/fffrog/142/base 2025-09-07T07:55:55.9288684Z * [new branch] gh/fffrog/142/head -> origin/gh/fffrog/142/head 2025-09-07T07:55:55.9290737Z * [new branch] gh/fffrog/142/orig -> origin/gh/fffrog/142/orig 2025-09-07T07:55:55.9292443Z * [new branch] gh/fffrog/143/base -> origin/gh/fffrog/143/base 2025-09-07T07:55:55.9294174Z * [new branch] gh/fffrog/143/head -> origin/gh/fffrog/143/head 
2025-09-07T07:55:55.9295874Z * [new branch] gh/fffrog/143/orig -> origin/gh/fffrog/143/orig 2025-09-07T07:55:55.9298107Z * [new branch] gh/fffrog/144/base -> origin/gh/fffrog/144/base 2025-09-07T07:55:55.9299697Z * [new branch] gh/fffrog/144/head -> origin/gh/fffrog/144/head 2025-09-07T07:55:55.9301185Z * [new branch] gh/fffrog/144/orig -> origin/gh/fffrog/144/orig 2025-09-07T07:55:55.9303610Z * [new branch] gh/fffrog/145/base -> origin/gh/fffrog/145/base 2025-09-07T07:55:55.9305431Z * [new branch] gh/fffrog/145/head -> origin/gh/fffrog/145/head 2025-09-07T07:55:55.9306918Z * [new branch] gh/fffrog/145/orig -> origin/gh/fffrog/145/orig 2025-09-07T07:55:55.9309052Z * [new branch] gh/fffrog/146/base -> origin/gh/fffrog/146/base 2025-09-07T07:55:55.9310700Z * [new branch] gh/fffrog/146/head -> origin/gh/fffrog/146/head 2025-09-07T07:55:55.9312182Z * [new branch] gh/fffrog/146/orig -> origin/gh/fffrog/146/orig 2025-09-07T07:55:55.9314843Z * [new branch] gh/fffrog/147/base -> origin/gh/fffrog/147/base 2025-09-07T07:55:55.9316317Z * [new branch] gh/fffrog/147/head -> origin/gh/fffrog/147/head 2025-09-07T07:55:55.9318110Z * [new branch] gh/fffrog/147/orig -> origin/gh/fffrog/147/orig 2025-09-07T07:55:55.9320396Z * [new branch] gh/fffrog/148/base -> origin/gh/fffrog/148/base 2025-09-07T07:55:55.9321996Z * [new branch] gh/fffrog/148/head -> origin/gh/fffrog/148/head 2025-09-07T07:55:55.9323544Z * [new branch] gh/fffrog/148/orig -> origin/gh/fffrog/148/orig 2025-09-07T07:55:55.9326176Z * [new branch] gh/fffrog/149/base -> origin/gh/fffrog/149/base 2025-09-07T07:55:55.9327654Z * [new branch] gh/fffrog/149/head -> origin/gh/fffrog/149/head 2025-09-07T07:55:55.9329252Z * [new branch] gh/fffrog/149/orig -> origin/gh/fffrog/149/orig 2025-09-07T07:55:55.9331460Z * [new branch] gh/fffrog/150/base -> origin/gh/fffrog/150/base 2025-09-07T07:55:55.9332918Z * [new branch] gh/fffrog/150/head -> origin/gh/fffrog/150/head 2025-09-07T07:55:55.9334950Z * [new branch] gh/fffrog/150/orig -> 
origin/gh/fffrog/150/orig 2025-09-07T07:55:55.9337150Z * [new branch] gh/fffrog/151/base -> origin/gh/fffrog/151/base 2025-09-07T07:55:55.9338785Z * [new branch] gh/fffrog/151/head -> origin/gh/fffrog/151/head 2025-09-07T07:55:55.9340324Z * [new branch] gh/fffrog/151/orig -> origin/gh/fffrog/151/orig 2025-09-07T07:55:55.9342581Z * [new branch] gh/fffrog/152/base -> origin/gh/fffrog/152/base 2025-09-07T07:55:55.9344234Z * [new branch] gh/fffrog/152/head -> origin/gh/fffrog/152/head 2025-09-07T07:55:55.9346624Z * [new branch] gh/fffrog/153/base -> origin/gh/fffrog/153/base 2025-09-07T07:55:55.9348121Z * [new branch] gh/fffrog/153/head -> origin/gh/fffrog/153/head 2025-09-07T07:55:55.9349661Z * [new branch] gh/fffrog/153/orig -> origin/gh/fffrog/153/orig 2025-09-07T07:55:55.9352512Z * [new branch] gh/gmagogsfm/1/base -> origin/gh/gmagogsfm/1/base 2025-09-07T07:55:55.9354205Z * [new branch] gh/gmagogsfm/1/head -> origin/gh/gmagogsfm/1/head 2025-09-07T07:55:55.9356029Z * [new branch] gh/gmagogsfm/1/orig -> origin/gh/gmagogsfm/1/orig 2025-09-07T07:55:55.9358237Z * [new branch] gh/gmagogsfm/2/base -> origin/gh/gmagogsfm/2/base 2025-09-07T07:55:55.9359870Z * [new branch] gh/gmagogsfm/2/head -> origin/gh/gmagogsfm/2/head 2025-09-07T07:55:55.9361401Z * [new branch] gh/gmagogsfm/2/orig -> origin/gh/gmagogsfm/2/orig 2025-09-07T07:55:55.9363505Z * [new branch] gh/gmagogsfm/3/base -> origin/gh/gmagogsfm/3/base 2025-09-07T07:55:55.9365583Z * [new branch] gh/gmagogsfm/3/head -> origin/gh/gmagogsfm/3/head 2025-09-07T07:55:55.9367104Z * [new branch] gh/gmagogsfm/3/orig -> origin/gh/gmagogsfm/3/orig 2025-09-07T07:55:55.9369989Z * [new branch] gh/guangyey/134/base -> origin/gh/guangyey/134/base 2025-09-07T07:55:55.9371384Z * [new branch] gh/guangyey/134/head -> origin/gh/guangyey/134/head 2025-09-07T07:55:55.9372988Z * [new branch] gh/guangyey/134/orig -> origin/gh/guangyey/134/orig 2025-09-07T07:55:55.9375482Z * [new branch] gh/guangyey/135/base -> origin/gh/guangyey/135/base 
2025-09-07T07:55:55.9376991Z * [new branch] gh/guangyey/135/head -> origin/gh/guangyey/135/head 2025-09-07T07:55:55.9378563Z * [new branch] gh/guangyey/135/orig -> origin/gh/guangyey/135/orig 2025-09-07T07:55:55.9380785Z * [new branch] gh/guangyey/139/base -> origin/gh/guangyey/139/base 2025-09-07T07:55:55.9382356Z * [new branch] gh/guangyey/139/head -> origin/gh/guangyey/139/head 2025-09-07T07:55:55.9383999Z * [new branch] gh/guangyey/139/orig -> origin/gh/guangyey/139/orig 2025-09-07T07:55:55.9386413Z * [new branch] gh/guangyey/140/base -> origin/gh/guangyey/140/base 2025-09-07T07:55:55.9387957Z * [new branch] gh/guangyey/140/head -> origin/gh/guangyey/140/head 2025-09-07T07:55:55.9389472Z * [new branch] gh/guangyey/140/orig -> origin/gh/guangyey/140/orig 2025-09-07T07:55:55.9391745Z * [new branch] gh/guangyey/142/base -> origin/gh/guangyey/142/base 2025-09-07T07:55:55.9393346Z * [new branch] gh/guangyey/142/head -> origin/gh/guangyey/142/head 2025-09-07T07:55:55.9395234Z * [new branch] gh/guangyey/142/orig -> origin/gh/guangyey/142/orig 2025-09-07T07:55:55.9397440Z * [new branch] gh/guangyey/145/base -> origin/gh/guangyey/145/base 2025-09-07T07:55:55.9398968Z * [new branch] gh/guangyey/145/head -> origin/gh/guangyey/145/head 2025-09-07T07:55:55.9400502Z * [new branch] gh/guangyey/145/orig -> origin/gh/guangyey/145/orig 2025-09-07T07:55:55.9402829Z * [new branch] gh/guangyey/153/base -> origin/gh/guangyey/153/base 2025-09-07T07:55:55.9404721Z * [new branch] gh/guangyey/153/head -> origin/gh/guangyey/153/head 2025-09-07T07:55:55.9406257Z * [new branch] gh/guangyey/153/orig -> origin/gh/guangyey/153/orig 2025-09-07T07:55:55.9408395Z * [new branch] gh/guangyey/159/base -> origin/gh/guangyey/159/base 2025-09-07T07:55:55.9410024Z * [new branch] gh/guangyey/159/head -> origin/gh/guangyey/159/head 2025-09-07T07:55:55.9411552Z * [new branch] gh/guangyey/159/orig -> origin/gh/guangyey/159/orig 2025-09-07T07:55:55.9413818Z * [new branch] gh/guangyey/163/base -> 
origin/gh/guangyey/163/base 2025-09-07T07:55:55.9415719Z * [new branch] gh/guangyey/163/head -> origin/gh/guangyey/163/head 2025-09-07T07:55:55.9417110Z * [new branch] gh/guangyey/163/orig -> origin/gh/guangyey/163/orig 2025-09-07T07:55:55.9419301Z * [new branch] gh/guangyey/168/base -> origin/gh/guangyey/168/base 2025-09-07T07:55:55.9420831Z * [new branch] gh/guangyey/168/head -> origin/gh/guangyey/168/head 2025-09-07T07:55:55.9422420Z * [new branch] gh/guangyey/168/orig -> origin/gh/guangyey/168/orig 2025-09-07T07:55:55.9425044Z * [new branch] gh/guangyey/169/base -> origin/gh/guangyey/169/base 2025-09-07T07:55:55.9426499Z * [new branch] gh/guangyey/169/head -> origin/gh/guangyey/169/head 2025-09-07T07:55:55.9428076Z * [new branch] gh/guangyey/169/orig -> origin/gh/guangyey/169/orig 2025-09-07T07:55:55.9430369Z * [new branch] gh/guangyey/170/base -> origin/gh/guangyey/170/base 2025-09-07T07:55:55.9431899Z * [new branch] gh/guangyey/170/head -> origin/gh/guangyey/170/head 2025-09-07T07:55:55.9433634Z * [new branch] gh/guangyey/170/orig -> origin/gh/guangyey/170/orig 2025-09-07T07:55:55.9436147Z * [new branch] gh/guangyey/171/base -> origin/gh/guangyey/171/base 2025-09-07T07:55:55.9437719Z * [new branch] gh/guangyey/171/head -> origin/gh/guangyey/171/head 2025-09-07T07:55:55.9439209Z * [new branch] gh/guangyey/171/orig -> origin/gh/guangyey/171/orig 2025-09-07T07:55:55.9441419Z * [new branch] gh/guangyey/174/base -> origin/gh/guangyey/174/base 2025-09-07T07:55:55.9443053Z * [new branch] gh/guangyey/174/head -> origin/gh/guangyey/174/head 2025-09-07T07:55:55.9445001Z * [new branch] gh/guangyey/174/orig -> origin/gh/guangyey/174/orig 2025-09-07T07:55:55.9447217Z * [new branch] gh/guangyey/176/base -> origin/gh/guangyey/176/base 2025-09-07T07:55:55.9448980Z * [new branch] gh/guangyey/176/head -> origin/gh/guangyey/176/head 2025-09-07T07:55:55.9450570Z * [new branch] gh/guangyey/176/orig -> origin/gh/guangyey/176/orig 2025-09-07T07:55:55.9452786Z * [new branch] 
gh/guangyey/178/base -> origin/gh/guangyey/178/base 2025-09-07T07:55:55.9454589Z * [new branch] gh/guangyey/178/head -> origin/gh/guangyey/178/head 2025-09-07T07:55:55.9456216Z * [new branch] gh/guangyey/178/orig -> origin/gh/guangyey/178/orig 2025-09-07T07:55:55.9458489Z * [new branch] gh/guangyey/181/base -> origin/gh/guangyey/181/base 2025-09-07T07:55:55.9460005Z * [new branch] gh/guangyey/181/head -> origin/gh/guangyey/181/head 2025-09-07T07:55:55.9461461Z * [new branch] gh/guangyey/181/orig -> origin/gh/guangyey/181/orig 2025-09-07T07:55:55.9463660Z * [new branch] gh/guangyey/182/base -> origin/gh/guangyey/182/base 2025-09-07T07:55:55.9465700Z * [new branch] gh/guangyey/182/head -> origin/gh/guangyey/182/head 2025-09-07T07:55:55.9467260Z * [new branch] gh/guangyey/182/orig -> origin/gh/guangyey/182/orig 2025-09-07T07:55:55.9469438Z * [new branch] gh/guangyey/183/base -> origin/gh/guangyey/183/base 2025-09-07T07:55:55.9471017Z * [new branch] gh/guangyey/183/head -> origin/gh/guangyey/183/head 2025-09-07T07:55:55.9472567Z * [new branch] gh/guangyey/183/orig -> origin/gh/guangyey/183/orig 2025-09-07T07:55:55.9475173Z * [new branch] gh/guangyey/184/base -> origin/gh/guangyey/184/base 2025-09-07T07:55:55.9476646Z * [new branch] gh/guangyey/184/head -> origin/gh/guangyey/184/head 2025-09-07T07:55:55.9478324Z * [new branch] gh/guangyey/184/orig -> origin/gh/guangyey/184/orig 2025-09-07T07:55:55.9480573Z * [new branch] gh/guangyey/185/base -> origin/gh/guangyey/185/base 2025-09-07T07:55:55.9482106Z * [new branch] gh/guangyey/185/head -> origin/gh/guangyey/185/head 2025-09-07T07:55:55.9483642Z * [new branch] gh/guangyey/185/orig -> origin/gh/guangyey/185/orig 2025-09-07T07:55:55.9486337Z * [new branch] gh/guangyey/186/base -> origin/gh/guangyey/186/base 2025-09-07T07:55:55.9487781Z * [new branch] gh/guangyey/186/head -> origin/gh/guangyey/186/head 2025-09-07T07:55:55.9489242Z * [new branch] gh/guangyey/186/orig -> origin/gh/guangyey/186/orig 
2025-09-07T07:55:55.9491526Z * [new branch] gh/guangyey/187/base -> origin/gh/guangyey/187/base 2025-09-07T07:55:55.9493025Z * [new branch] gh/guangyey/187/head -> origin/gh/guangyey/187/head 2025-09-07T07:55:55.9494930Z * [new branch] gh/guangyey/187/orig -> origin/gh/guangyey/187/orig 2025-09-07T07:55:55.9497159Z * [new branch] gh/guangyey/188/base -> origin/gh/guangyey/188/base 2025-09-07T07:55:55.9498946Z * [new branch] gh/guangyey/188/head -> origin/gh/guangyey/188/head 2025-09-07T07:55:55.9500269Z * [new branch] gh/guangyey/188/orig -> origin/gh/guangyey/188/orig 2025-09-07T07:55:55.9502622Z * [new branch] gh/guangyey/189/base -> origin/gh/guangyey/189/base 2025-09-07T07:55:55.9504617Z * [new branch] gh/guangyey/189/head -> origin/gh/guangyey/189/head 2025-09-07T07:55:55.9506179Z * [new branch] gh/guangyey/189/orig -> origin/gh/guangyey/189/orig 2025-09-07T07:55:55.9508409Z * [new branch] gh/guangyey/190/base -> origin/gh/guangyey/190/base 2025-09-07T07:55:55.9509944Z * [new branch] gh/guangyey/190/head -> origin/gh/guangyey/190/head 2025-09-07T07:55:55.9511447Z * [new branch] gh/guangyey/190/orig -> origin/gh/guangyey/190/orig 2025-09-07T07:55:55.9513860Z * [new branch] gh/guangyey/191/base -> origin/gh/guangyey/191/base 2025-09-07T07:55:55.9515614Z * [new branch] gh/guangyey/191/head -> origin/gh/guangyey/191/head 2025-09-07T07:55:55.9517188Z * [new branch] gh/guangyey/191/orig -> origin/gh/guangyey/191/orig 2025-09-07T07:55:55.9519504Z * [new branch] gh/guangyey/192/base -> origin/gh/guangyey/192/base 2025-09-07T07:55:55.9520994Z * [new branch] gh/guangyey/192/head -> origin/gh/guangyey/192/head 2025-09-07T07:55:55.9522628Z * [new branch] gh/guangyey/192/orig -> origin/gh/guangyey/192/orig 2025-09-07T07:55:55.9525291Z * [new branch] gh/guangyey/193/base -> origin/gh/guangyey/193/base 2025-09-07T07:55:55.9526912Z * [new branch] gh/guangyey/193/head -> origin/gh/guangyey/193/head 2025-09-07T07:55:55.9528341Z * [new branch] gh/guangyey/193/orig -> 
origin/gh/guangyey/193/orig 2025-09-07T07:55:55.9530584Z * [new branch] gh/guangyey/194/base -> origin/gh/guangyey/194/base 2025-09-07T07:55:55.9532081Z * [new branch] gh/guangyey/194/head -> origin/gh/guangyey/194/head 2025-09-07T07:55:55.9533668Z * [new branch] gh/guangyey/194/orig -> origin/gh/guangyey/194/orig 2025-09-07T07:55:55.9536361Z * [new branch] gh/guangyey/195/base -> origin/gh/guangyey/195/base 2025-09-07T07:55:55.9537937Z * [new branch] gh/guangyey/195/head -> origin/gh/guangyey/195/head 2025-09-07T07:55:55.9539448Z * [new branch] gh/guangyey/195/orig -> origin/gh/guangyey/195/orig 2025-09-07T07:55:55.9541966Z * [new branch] gh/guangyey/196/base -> origin/gh/guangyey/196/base 2025-09-07T07:55:55.9543478Z * [new branch] gh/guangyey/196/head -> origin/gh/guangyey/196/head 2025-09-07T07:55:55.9545575Z * [new branch] gh/guangyey/196/orig -> origin/gh/guangyey/196/orig 2025-09-07T07:55:55.9547688Z * [new branch] gh/guangyey/197/base -> origin/gh/guangyey/197/base 2025-09-07T07:55:55.9549215Z * [new branch] gh/guangyey/197/head -> origin/gh/guangyey/197/head 2025-09-07T07:55:55.9550728Z * [new branch] gh/guangyey/197/orig -> origin/gh/guangyey/197/orig 2025-09-07T07:55:55.9552998Z * [new branch] gh/guangyey/198/base -> origin/gh/guangyey/198/base 2025-09-07T07:55:55.9554903Z * [new branch] gh/guangyey/198/head -> origin/gh/guangyey/198/head 2025-09-07T07:55:55.9556621Z * [new branch] gh/guangyey/198/orig -> origin/gh/guangyey/198/orig 2025-09-07T07:55:55.9558833Z * [new branch] gh/guangyey/199/base -> origin/gh/guangyey/199/base 2025-09-07T07:55:55.9560407Z * [new branch] gh/guangyey/199/head -> origin/gh/guangyey/199/head 2025-09-07T07:55:55.9561946Z * [new branch] gh/guangyey/199/orig -> origin/gh/guangyey/199/orig 2025-09-07T07:55:55.9564658Z * [new branch] gh/guangyey/200/base -> origin/gh/guangyey/200/base 2025-09-07T07:55:55.9566143Z * [new branch] gh/guangyey/200/head -> origin/gh/guangyey/200/head 2025-09-07T07:55:55.9567638Z * [new branch] 
gh/guangyey/200/orig -> origin/gh/guangyey/200/orig 2025-09-07T07:55:55.9569886Z * [new branch] gh/guangyey/201/base -> origin/gh/guangyey/201/base 2025-09-07T07:55:55.9571560Z * [new branch] gh/guangyey/201/head -> origin/gh/guangyey/201/head 2025-09-07T07:55:55.9573139Z * [new branch] gh/guangyey/201/orig -> origin/gh/guangyey/201/orig 2025-09-07T07:55:55.9575770Z * [new branch] gh/guangyey/202/base -> origin/gh/guangyey/202/base 2025-09-07T07:55:55.9577249Z * [new branch] gh/guangyey/202/head -> origin/gh/guangyey/202/head 2025-09-07T07:55:55.9578847Z * [new branch] gh/guangyey/202/orig -> origin/gh/guangyey/202/orig 2025-09-07T07:55:55.9581142Z * [new branch] gh/guangyey/203/base -> origin/gh/guangyey/203/base 2025-09-07T07:55:55.9582691Z * [new branch] gh/guangyey/203/head -> origin/gh/guangyey/203/head 2025-09-07T07:55:55.9584564Z * [new branch] gh/guangyey/203/orig -> origin/gh/guangyey/203/orig 2025-09-07T07:55:55.9586833Z * [new branch] gh/guangyey/204/base -> origin/gh/guangyey/204/base 2025-09-07T07:55:55.9588363Z * [new branch] gh/guangyey/204/head -> origin/gh/guangyey/204/head 2025-09-07T07:55:55.9589916Z * [new branch] gh/guangyey/204/orig -> origin/gh/guangyey/204/orig 2025-09-07T07:55:55.9592125Z * [new branch] gh/guangyey/205/base -> origin/gh/guangyey/205/base 2025-09-07T07:55:55.9593667Z * [new branch] gh/guangyey/205/head -> origin/gh/guangyey/205/head 2025-09-07T07:55:55.9595539Z * [new branch] gh/guangyey/205/orig -> origin/gh/guangyey/205/orig 2025-09-07T07:55:55.9597915Z * [new branch] gh/guangyey/206/base -> origin/gh/guangyey/206/base 2025-09-07T07:55:55.9599480Z * [new branch] gh/guangyey/206/head -> origin/gh/guangyey/206/head 2025-09-07T07:55:55.9601020Z * [new branch] gh/guangyey/206/orig -> origin/gh/guangyey/206/orig 2025-09-07T07:55:55.9603308Z * [new branch] gh/guangyey/207/base -> origin/gh/guangyey/207/base 2025-09-07T07:55:55.9605201Z * [new branch] gh/guangyey/207/head -> origin/gh/guangyey/207/head 
2025-09-07T07:55:55.9606636Z * [new branch] gh/guangyey/207/orig -> origin/gh/guangyey/207/orig 2025-09-07T07:55:55.9608901Z * [new branch] gh/guangyey/79/base -> origin/gh/guangyey/79/base 2025-09-07T07:55:55.9610443Z * [new branch] gh/guangyey/79/head -> origin/gh/guangyey/79/head 2025-09-07T07:55:55.9612042Z * [new branch] gh/guangyey/79/orig -> origin/gh/guangyey/79/orig 2025-09-07T07:55:55.9614560Z * [new branch] gh/guangyey/89/base -> origin/gh/guangyey/89/base 2025-09-07T07:55:55.9616338Z * [new branch] gh/guangyey/89/head -> origin/gh/guangyey/89/head 2025-09-07T07:55:55.9617709Z * [new branch] gh/guangyey/89/orig -> origin/gh/guangyey/89/orig 2025-09-07T07:55:55.9620522Z * [new branch] gh/guilhermeleobas/107/base -> origin/gh/guilhermeleobas/107/base 2025-09-07T07:55:55.9622010Z * [new branch] gh/guilhermeleobas/107/head -> origin/gh/guilhermeleobas/107/head 2025-09-07T07:55:55.9623550Z * [new branch] gh/guilhermeleobas/107/orig -> origin/gh/guilhermeleobas/107/orig 2025-09-07T07:55:55.9626071Z * [new branch] gh/guilhermeleobas/108/base -> origin/gh/guilhermeleobas/108/base 2025-09-07T07:55:55.9627635Z * [new branch] gh/guilhermeleobas/108/head -> origin/gh/guilhermeleobas/108/head 2025-09-07T07:55:55.9629344Z * [new branch] gh/guilhermeleobas/108/orig -> origin/gh/guilhermeleobas/108/orig 2025-09-07T07:55:55.9631437Z * [new branch] gh/guilhermeleobas/124/base -> origin/gh/guilhermeleobas/124/base 2025-09-07T07:55:55.9632952Z * [new branch] gh/guilhermeleobas/124/head -> origin/gh/guilhermeleobas/124/head 2025-09-07T07:55:55.9635302Z * [new branch] gh/guilhermeleobas/124/orig -> origin/gh/guilhermeleobas/124/orig 2025-09-07T07:55:55.9637787Z * [new branch] gh/guilhermeleobas/147/base -> origin/gh/guilhermeleobas/147/base 2025-09-07T07:55:55.9639206Z * [new branch] gh/guilhermeleobas/147/head -> origin/gh/guilhermeleobas/147/head 2025-09-07T07:55:55.9640847Z * [new branch] gh/guilhermeleobas/147/orig -> origin/gh/guilhermeleobas/147/orig 
2025-09-07T07:55:55.9643042Z * [new branch] gh/guilhermeleobas/150/base -> origin/gh/guilhermeleobas/150/base 2025-09-07T07:55:55.9645029Z * [new branch] gh/guilhermeleobas/150/head -> origin/gh/guilhermeleobas/150/head 2025-09-07T07:55:55.9646555Z * [new branch] gh/guilhermeleobas/150/orig -> origin/gh/guilhermeleobas/150/orig 2025-09-07T07:55:55.9648710Z * [new branch] gh/guilhermeleobas/163/base -> origin/gh/guilhermeleobas/163/base 2025-09-07T07:55:55.9650210Z * [new branch] gh/guilhermeleobas/163/head -> origin/gh/guilhermeleobas/163/head 2025-09-07T07:55:55.9651821Z * [new branch] gh/guilhermeleobas/163/orig -> origin/gh/guilhermeleobas/163/orig 2025-09-07T07:55:55.9654208Z * [new branch] gh/guilhermeleobas/164/base -> origin/gh/guilhermeleobas/164/base 2025-09-07T07:55:55.9655967Z * [new branch] gh/guilhermeleobas/164/head -> origin/gh/guilhermeleobas/164/head 2025-09-07T07:55:55.9657400Z * [new branch] gh/guilhermeleobas/164/orig -> origin/gh/guilhermeleobas/164/orig 2025-09-07T07:55:55.9659664Z * [new branch] gh/guilhermeleobas/165/base -> origin/gh/guilhermeleobas/165/base 2025-09-07T07:55:55.9661232Z * [new branch] gh/guilhermeleobas/165/head -> origin/gh/guilhermeleobas/165/head 2025-09-07T07:55:55.9662728Z * [new branch] gh/guilhermeleobas/165/orig -> origin/gh/guilhermeleobas/165/orig 2025-09-07T07:55:55.9665380Z * [new branch] gh/guilhermeleobas/166/base -> origin/gh/guilhermeleobas/166/base 2025-09-07T07:55:55.9666873Z * [new branch] gh/guilhermeleobas/166/head -> origin/gh/guilhermeleobas/166/head 2025-09-07T07:55:55.9668423Z * [new branch] gh/guilhermeleobas/166/orig -> origin/gh/guilhermeleobas/166/orig 2025-09-07T07:55:55.9670649Z * [new branch] gh/guilhermeleobas/167/base -> origin/gh/guilhermeleobas/167/base 2025-09-07T07:55:55.9672231Z * [new branch] gh/guilhermeleobas/167/head -> origin/gh/guilhermeleobas/167/head 2025-09-07T07:55:55.9673902Z * [new branch] gh/guilhermeleobas/167/orig -> origin/gh/guilhermeleobas/167/orig 
2025-09-07T07:55:55.9676708Z * [new branch] gh/guilhermeleobas/168/base -> origin/gh/guilhermeleobas/168/base 2025-09-07T07:55:55.9678272Z * [new branch] gh/guilhermeleobas/168/head -> origin/gh/guilhermeleobas/168/head 2025-09-07T07:55:55.9679819Z * [new branch] gh/guilhermeleobas/168/orig -> origin/gh/guilhermeleobas/168/orig 2025-09-07T07:55:55.9682030Z * [new branch] gh/guilhermeleobas/169/base -> origin/gh/guilhermeleobas/169/base 2025-09-07T07:55:55.9683578Z * [new branch] gh/guilhermeleobas/169/head -> origin/gh/guilhermeleobas/169/head 2025-09-07T07:55:55.9685503Z * [new branch] gh/guilhermeleobas/169/orig -> origin/gh/guilhermeleobas/169/orig 2025-09-07T07:55:55.9687690Z * [new branch] gh/guilhermeleobas/170/base -> origin/gh/guilhermeleobas/170/base 2025-09-07T07:55:55.9689228Z * [new branch] gh/guilhermeleobas/170/head -> origin/gh/guilhermeleobas/170/head 2025-09-07T07:55:55.9691016Z * [new branch] gh/guilhermeleobas/170/orig -> origin/gh/guilhermeleobas/170/orig 2025-09-07T07:55:55.9693130Z * [new branch] gh/guilhermeleobas/171/base -> origin/gh/guilhermeleobas/171/base 2025-09-07T07:55:55.9695032Z * [new branch] gh/guilhermeleobas/171/head -> origin/gh/guilhermeleobas/171/head 2025-09-07T07:55:55.9696564Z * [new branch] gh/guilhermeleobas/171/orig -> origin/gh/guilhermeleobas/171/orig 2025-09-07T07:55:55.9698834Z * [new branch] gh/guilhermeleobas/173/base -> origin/gh/guilhermeleobas/173/base 2025-09-07T07:55:55.9700343Z * [new branch] gh/guilhermeleobas/173/head -> origin/gh/guilhermeleobas/173/head 2025-09-07T07:55:55.9701844Z * [new branch] gh/guilhermeleobas/173/orig -> origin/gh/guilhermeleobas/173/orig 2025-09-07T07:55:55.9704267Z * [new branch] gh/guilhermeleobas/192/base -> origin/gh/guilhermeleobas/192/base 2025-09-07T07:55:55.9706016Z * [new branch] gh/guilhermeleobas/192/head -> origin/gh/guilhermeleobas/192/head 2025-09-07T07:55:55.9707586Z * [new branch] gh/guilhermeleobas/192/orig -> origin/gh/guilhermeleobas/192/orig 
2025-09-07T07:55:55.9709902Z * [new branch] gh/guilhermeleobas/193/base -> origin/gh/guilhermeleobas/193/base 2025-09-07T07:55:55.9711453Z * [new branch] gh/guilhermeleobas/193/head -> origin/gh/guilhermeleobas/193/head 2025-09-07T07:55:55.9713048Z * [new branch] gh/guilhermeleobas/193/orig -> origin/gh/guilhermeleobas/193/orig 2025-09-07T07:55:55.9715615Z * [new branch] gh/guilhermeleobas/194/base -> origin/gh/guilhermeleobas/194/base 2025-09-07T07:55:55.9717320Z * [new branch] gh/guilhermeleobas/194/head -> origin/gh/guilhermeleobas/194/head 2025-09-07T07:55:55.9718960Z * [new branch] gh/guilhermeleobas/194/orig -> origin/gh/guilhermeleobas/194/orig 2025-09-07T07:55:55.9721174Z * [new branch] gh/guilhermeleobas/203/base -> origin/gh/guilhermeleobas/203/base 2025-09-07T07:55:55.9722681Z * [new branch] gh/guilhermeleobas/203/head -> origin/gh/guilhermeleobas/203/head 2025-09-07T07:55:55.9724518Z * [new branch] gh/guilhermeleobas/203/orig -> origin/gh/guilhermeleobas/203/orig 2025-09-07T07:55:55.9726841Z * [new branch] gh/guilhermeleobas/204/base -> origin/gh/guilhermeleobas/204/base 2025-09-07T07:55:55.9728566Z * [new branch] gh/guilhermeleobas/204/head -> origin/gh/guilhermeleobas/204/head 2025-09-07T07:55:55.9730150Z * [new branch] gh/guilhermeleobas/204/orig -> origin/gh/guilhermeleobas/204/orig 2025-09-07T07:55:55.9732407Z * [new branch] gh/guilhermeleobas/205/base -> origin/gh/guilhermeleobas/205/base 2025-09-07T07:55:55.9734056Z * [new branch] gh/guilhermeleobas/205/head -> origin/gh/guilhermeleobas/205/head 2025-09-07T07:55:55.9735820Z * [new branch] gh/guilhermeleobas/205/orig -> origin/gh/guilhermeleobas/205/orig 2025-09-07T07:55:55.9738121Z * [new branch] gh/guilhermeleobas/209/base -> origin/gh/guilhermeleobas/209/base 2025-09-07T07:55:55.9739690Z * [new branch] gh/guilhermeleobas/209/head -> origin/gh/guilhermeleobas/209/head 2025-09-07T07:55:55.9741222Z * [new branch] gh/guilhermeleobas/209/orig -> origin/gh/guilhermeleobas/209/orig 
2025-09-07T07:55:55.9743451Z * [new branch] gh/guilhermeleobas/210/base -> origin/gh/guilhermeleobas/210/base 2025-09-07T07:55:55.9745424Z * [new branch] gh/guilhermeleobas/210/head -> origin/gh/guilhermeleobas/210/head 2025-09-07T07:55:55.9746866Z * [new branch] gh/guilhermeleobas/210/orig -> origin/gh/guilhermeleobas/210/orig 2025-09-07T07:55:55.9749187Z * [new branch] gh/guilhermeleobas/211/base -> origin/gh/guilhermeleobas/211/base 2025-09-07T07:55:55.9750786Z * [new branch] gh/guilhermeleobas/211/head -> origin/gh/guilhermeleobas/211/head 2025-09-07T07:55:55.9752488Z * [new branch] gh/guilhermeleobas/211/orig -> origin/gh/guilhermeleobas/211/orig 2025-09-07T07:55:55.9755126Z * [new branch] gh/guilhermeleobas/214/base -> origin/gh/guilhermeleobas/214/base 2025-09-07T07:55:55.9756628Z * [new branch] gh/guilhermeleobas/214/head -> origin/gh/guilhermeleobas/214/head 2025-09-07T07:55:55.9758290Z * [new branch] gh/guilhermeleobas/214/orig -> origin/gh/guilhermeleobas/214/orig 2025-09-07T07:55:55.9760588Z * [new branch] gh/guilhermeleobas/215/base -> origin/gh/guilhermeleobas/215/base 2025-09-07T07:55:55.9762127Z * [new branch] gh/guilhermeleobas/215/head -> origin/gh/guilhermeleobas/215/head 2025-09-07T07:55:55.9763681Z * [new branch] gh/guilhermeleobas/215/orig -> origin/gh/guilhermeleobas/215/orig 2025-09-07T07:55:55.9766342Z * [new branch] gh/guilhermeleobas/216/base -> origin/gh/guilhermeleobas/216/base 2025-09-07T07:55:55.9767801Z * [new branch] gh/guilhermeleobas/216/head -> origin/gh/guilhermeleobas/216/head 2025-09-07T07:55:55.9769443Z * [new branch] gh/guilhermeleobas/216/orig -> origin/gh/guilhermeleobas/216/orig 2025-09-07T07:55:55.9771789Z * [new branch] gh/guilhermeleobas/217/base -> origin/gh/guilhermeleobas/217/base 2025-09-07T07:55:55.9773332Z * [new branch] gh/guilhermeleobas/217/head -> origin/gh/guilhermeleobas/217/head 2025-09-07T07:55:55.9775244Z * [new branch] gh/guilhermeleobas/217/orig -> origin/gh/guilhermeleobas/217/orig 
2025-09-07T07:55:55.9777446Z * [new branch] gh/guilhermeleobas/219/base -> origin/gh/guilhermeleobas/219/base 2025-09-07T07:55:55.9778991Z * [new branch] gh/guilhermeleobas/219/head -> origin/gh/guilhermeleobas/219/head 2025-09-07T07:55:55.9780529Z * [new branch] gh/guilhermeleobas/219/orig -> origin/gh/guilhermeleobas/219/orig 2025-09-07T07:55:55.9782829Z * [new branch] gh/guilhermeleobas/220/base -> origin/gh/guilhermeleobas/220/base 2025-09-07T07:55:55.9784743Z * [new branch] gh/guilhermeleobas/220/head -> origin/gh/guilhermeleobas/220/head 2025-09-07T07:55:55.9786342Z * [new branch] gh/guilhermeleobas/220/orig -> origin/gh/guilhermeleobas/220/orig 2025-09-07T07:55:55.9788593Z * [new branch] gh/guilhermeleobas/221/base -> origin/gh/guilhermeleobas/221/base 2025-09-07T07:55:55.9790138Z * [new branch] gh/guilhermeleobas/221/head -> origin/gh/guilhermeleobas/221/head 2025-09-07T07:55:55.9791720Z * [new branch] gh/guilhermeleobas/221/orig -> origin/gh/guilhermeleobas/221/orig 2025-09-07T07:55:55.9794173Z * [new branch] gh/guilhermeleobas/222/base -> origin/gh/guilhermeleobas/222/base 2025-09-07T07:55:55.9795841Z * [new branch] gh/guilhermeleobas/222/head -> origin/gh/guilhermeleobas/222/head 2025-09-07T07:55:55.9797423Z * [new branch] gh/guilhermeleobas/222/orig -> origin/gh/guilhermeleobas/222/orig 2025-09-07T07:55:55.9799794Z * [new branch] gh/guilhermeleobas/223/base -> origin/gh/guilhermeleobas/223/base 2025-09-07T07:55:55.9801360Z * [new branch] gh/guilhermeleobas/223/head -> origin/gh/guilhermeleobas/223/head 2025-09-07T07:55:55.9803001Z * [new branch] gh/guilhermeleobas/223/orig -> origin/gh/guilhermeleobas/223/orig 2025-09-07T07:55:55.9805696Z * [new branch] gh/guilhermeleobas/224/base -> origin/gh/guilhermeleobas/224/base 2025-09-07T07:55:55.9807177Z * [new branch] gh/guilhermeleobas/224/head -> origin/gh/guilhermeleobas/224/head 2025-09-07T07:55:55.9808715Z * [new branch] gh/guilhermeleobas/224/orig -> origin/gh/guilhermeleobas/224/orig 
2025-09-07T07:55:55.9810983Z * [new branch] gh/guilhermeleobas/225/base -> origin/gh/guilhermeleobas/225/base 2025-09-07T07:55:55.9812495Z * [new branch] gh/guilhermeleobas/225/head -> origin/gh/guilhermeleobas/225/head 2025-09-07T07:55:55.9814364Z * [new branch] gh/guilhermeleobas/225/orig -> origin/gh/guilhermeleobas/225/orig 2025-09-07T07:55:55.9816615Z * [new branch] gh/guilhermeleobas/226/base -> origin/gh/guilhermeleobas/226/base 2025-09-07T07:55:55.9818109Z * [new branch] gh/guilhermeleobas/226/head -> origin/gh/guilhermeleobas/226/head 2025-09-07T07:55:55.9819684Z * [new branch] gh/guilhermeleobas/226/orig -> origin/gh/guilhermeleobas/226/orig 2025-09-07T07:55:55.9822293Z * [new branch] gh/guilhermeleobas/227/base -> origin/gh/guilhermeleobas/227/base 2025-09-07T07:55:55.9824055Z * [new branch] gh/guilhermeleobas/227/head -> origin/gh/guilhermeleobas/227/head 2025-09-07T07:55:55.9825878Z * [new branch] gh/guilhermeleobas/227/orig -> origin/gh/guilhermeleobas/227/orig 2025-09-07T07:55:55.9828099Z * [new branch] gh/guilhermeleobas/228/base -> origin/gh/guilhermeleobas/228/base 2025-09-07T07:55:55.9829607Z * [new branch] gh/guilhermeleobas/228/head -> origin/gh/guilhermeleobas/228/head 2025-09-07T07:55:55.9831059Z * [new branch] gh/guilhermeleobas/228/orig -> origin/gh/guilhermeleobas/228/orig 2025-09-07T07:55:55.9833347Z * [new branch] gh/guilhermeleobas/229/base -> origin/gh/guilhermeleobas/229/base 2025-09-07T07:55:55.9835289Z * [new branch] gh/guilhermeleobas/229/head -> origin/gh/guilhermeleobas/229/head 2025-09-07T07:55:55.9836875Z * [new branch] gh/guilhermeleobas/229/orig -> origin/gh/guilhermeleobas/229/orig 2025-09-07T07:55:55.9839348Z * [new branch] gh/guilhermeleobas/230/base -> origin/gh/guilhermeleobas/230/base 2025-09-07T07:55:55.9840873Z * [new branch] gh/guilhermeleobas/230/head -> origin/gh/guilhermeleobas/230/head 2025-09-07T07:55:55.9842479Z * [new branch] gh/guilhermeleobas/230/orig -> origin/gh/guilhermeleobas/230/orig 
2025-09-07T07:55:55.9845119Z * [new branch] gh/guilhermeleobas/231/base -> origin/gh/guilhermeleobas/231/base 2025-09-07T07:55:55.9846675Z * [new branch] gh/guilhermeleobas/231/head -> origin/gh/guilhermeleobas/231/head 2025-09-07T07:55:55.9848255Z * [new branch] gh/guilhermeleobas/231/orig -> origin/gh/guilhermeleobas/231/orig 2025-09-07T07:55:55.9850539Z * [new branch] gh/guilhermeleobas/232/base -> origin/gh/guilhermeleobas/232/base 2025-09-07T07:55:55.9852108Z * [new branch] gh/guilhermeleobas/232/head -> origin/gh/guilhermeleobas/232/head 2025-09-07T07:55:55.9853624Z * [new branch] gh/guilhermeleobas/232/orig -> origin/gh/guilhermeleobas/232/orig 2025-09-07T07:55:55.9856351Z * [new branch] gh/guilhermeleobas/233/base -> origin/gh/guilhermeleobas/233/base 2025-09-07T07:55:55.9857686Z * [new branch] gh/guilhermeleobas/233/head -> origin/gh/guilhermeleobas/233/head 2025-09-07T07:55:55.9859259Z * [new branch] gh/guilhermeleobas/233/orig -> origin/gh/guilhermeleobas/233/orig 2025-09-07T07:55:55.9861668Z * [new branch] gh/guilhermeleobas/234/base -> origin/gh/guilhermeleobas/234/base 2025-09-07T07:55:55.9863203Z * [new branch] gh/guilhermeleobas/234/head -> origin/gh/guilhermeleobas/234/head 2025-09-07T07:55:55.9865208Z * [new branch] gh/guilhermeleobas/234/orig -> origin/gh/guilhermeleobas/234/orig 2025-09-07T07:55:55.9867412Z * [new branch] gh/guilhermeleobas/235/base -> origin/gh/guilhermeleobas/235/base 2025-09-07T07:55:55.9868977Z * [new branch] gh/guilhermeleobas/235/head -> origin/gh/guilhermeleobas/235/head 2025-09-07T07:55:55.9870530Z * [new branch] gh/guilhermeleobas/235/orig -> origin/gh/guilhermeleobas/235/orig 2025-09-07T07:55:55.9872855Z * [new branch] gh/guilhermeleobas/236/base -> origin/gh/guilhermeleobas/236/base 2025-09-07T07:55:55.9874745Z * [new branch] gh/guilhermeleobas/236/head -> origin/gh/guilhermeleobas/236/head 2025-09-07T07:55:55.9876418Z * [new branch] gh/guilhermeleobas/236/orig -> origin/gh/guilhermeleobas/236/orig 
2025-09-07T07:55:55.9878635Z * [new branch] gh/guilhermeleobas/237/base -> origin/gh/guilhermeleobas/237/base 2025-09-07T07:55:55.9880270Z * [new branch] gh/guilhermeleobas/237/head -> origin/gh/guilhermeleobas/237/head 2025-09-07T07:55:55.9881771Z * [new branch] gh/guilhermeleobas/237/orig -> origin/gh/guilhermeleobas/237/orig 2025-09-07T07:55:55.9884187Z * [new branch] gh/guilhermeleobas/238/base -> origin/gh/guilhermeleobas/238/base 2025-09-07T07:55:55.9885975Z * [new branch] gh/guilhermeleobas/238/head -> origin/gh/guilhermeleobas/238/head 2025-09-07T07:55:55.9887540Z * [new branch] gh/guilhermeleobas/238/orig -> origin/gh/guilhermeleobas/238/orig 2025-09-07T07:55:55.9889817Z * [new branch] gh/guilhermeleobas/239/base -> origin/gh/guilhermeleobas/239/base 2025-09-07T07:55:55.9891366Z * [new branch] gh/guilhermeleobas/239/head -> origin/gh/guilhermeleobas/239/head 2025-09-07T07:55:55.9892933Z * [new branch] gh/guilhermeleobas/239/orig -> origin/gh/guilhermeleobas/239/orig 2025-09-07T07:55:55.9895737Z * [new branch] gh/guilhermeleobas/240/base -> origin/gh/guilhermeleobas/240/base 2025-09-07T07:55:55.9897199Z * [new branch] gh/guilhermeleobas/240/head -> origin/gh/guilhermeleobas/240/head 2025-09-07T07:55:55.9898736Z * [new branch] gh/guilhermeleobas/240/orig -> origin/gh/guilhermeleobas/240/orig 2025-09-07T07:55:55.9901114Z * [new branch] gh/guilhermeleobas/241/base -> origin/gh/guilhermeleobas/241/base 2025-09-07T07:55:55.9902727Z * [new branch] gh/guilhermeleobas/241/head -> origin/gh/guilhermeleobas/241/head 2025-09-07T07:55:55.9904576Z * [new branch] gh/guilhermeleobas/241/orig -> origin/gh/guilhermeleobas/241/orig 2025-09-07T07:55:55.9906940Z * [new branch] gh/guilhermeleobas/242/base -> origin/gh/guilhermeleobas/242/base 2025-09-07T07:55:55.9908529Z * [new branch] gh/guilhermeleobas/242/head -> origin/gh/guilhermeleobas/242/head 2025-09-07T07:55:55.9910020Z * [new branch] gh/guilhermeleobas/242/orig -> origin/gh/guilhermeleobas/242/orig 
2025-09-07T07:55:55.9912396Z * [new branch] gh/guilhermeleobas/243/base -> origin/gh/guilhermeleobas/243/base 2025-09-07T07:55:55.9914183Z * [new branch] gh/guilhermeleobas/243/head -> origin/gh/guilhermeleobas/243/head 2025-09-07T07:55:55.9916176Z * [new branch] gh/guilhermeleobas/243/orig -> origin/gh/guilhermeleobas/243/orig 2025-09-07T07:55:55.9918640Z * [new branch] gh/guilhermeleobas/244/base -> origin/gh/guilhermeleobas/244/base 2025-09-07T07:55:55.9920203Z * [new branch] gh/guilhermeleobas/244/head -> origin/gh/guilhermeleobas/244/head 2025-09-07T07:55:55.9921687Z * [new branch] gh/guilhermeleobas/244/orig -> origin/gh/guilhermeleobas/244/orig 2025-09-07T07:55:55.9924135Z * [new branch] gh/guilhermeleobas/245/base -> origin/gh/guilhermeleobas/245/base 2025-09-07T07:55:55.9925906Z * [new branch] gh/guilhermeleobas/245/head -> origin/gh/guilhermeleobas/245/head 2025-09-07T07:55:55.9927405Z * [new branch] gh/guilhermeleobas/245/orig -> origin/gh/guilhermeleobas/245/orig 2025-09-07T07:55:55.9929823Z * [new branch] gh/guilhermeleobas/73/base -> origin/gh/guilhermeleobas/73/base 2025-09-07T07:55:55.9931328Z * [new branch] gh/guilhermeleobas/73/head -> origin/gh/guilhermeleobas/73/head 2025-09-07T07:55:55.9932913Z * [new branch] gh/guilhermeleobas/73/orig -> origin/gh/guilhermeleobas/73/orig 2025-09-07T07:55:55.9936092Z * [new branch] gh/henrylhtsang/140/base -> origin/gh/henrylhtsang/140/base 2025-09-07T07:55:55.9937662Z * [new branch] gh/henrylhtsang/140/head -> origin/gh/henrylhtsang/140/head 2025-09-07T07:55:55.9939419Z * [new branch] gh/henrylhtsang/140/orig -> origin/gh/henrylhtsang/140/orig 2025-09-07T07:55:55.9941516Z * [new branch] gh/henrylhtsang/141/base -> origin/gh/henrylhtsang/141/base 2025-09-07T07:55:55.9943060Z * [new branch] gh/henrylhtsang/141/head -> origin/gh/henrylhtsang/141/head 2025-09-07T07:55:55.9944950Z * [new branch] gh/henrylhtsang/141/orig -> origin/gh/henrylhtsang/141/orig 2025-09-07T07:55:55.9947415Z * [new branch] 
gh/henrylhtsang/142/base -> origin/gh/henrylhtsang/142/base 2025-09-07T07:55:55.9949053Z * [new branch] gh/henrylhtsang/142/head -> origin/gh/henrylhtsang/142/head 2025-09-07T07:55:55.9950665Z * [new branch] gh/henrylhtsang/142/orig -> origin/gh/henrylhtsang/142/orig 2025-09-07T07:55:55.9952961Z * [new branch] gh/henrylhtsang/143/base -> origin/gh/henrylhtsang/143/base 2025-09-07T07:55:55.9954901Z * [new branch] gh/henrylhtsang/143/head -> origin/gh/henrylhtsang/143/head 2025-09-07T07:55:55.9956431Z * [new branch] gh/henrylhtsang/143/orig -> origin/gh/henrylhtsang/143/orig 2025-09-07T07:55:55.9958841Z * [new branch] gh/henrylhtsang/144/base -> origin/gh/henrylhtsang/144/base 2025-09-07T07:55:55.9960376Z * [new branch] gh/henrylhtsang/144/head -> origin/gh/henrylhtsang/144/head 2025-09-07T07:55:55.9962023Z * [new branch] gh/henrylhtsang/144/orig -> origin/gh/henrylhtsang/144/orig 2025-09-07T07:55:55.9964539Z * [new branch] gh/henrylhtsang/145/base -> origin/gh/henrylhtsang/145/base 2025-09-07T07:55:55.9966107Z * [new branch] gh/henrylhtsang/145/head -> origin/gh/henrylhtsang/145/head 2025-09-07T07:55:55.9967627Z * [new branch] gh/henrylhtsang/145/orig -> origin/gh/henrylhtsang/145/orig 2025-09-07T07:55:55.9969943Z * [new branch] gh/henrylhtsang/146/base -> origin/gh/henrylhtsang/146/base 2025-09-07T07:55:55.9971602Z * [new branch] gh/henrylhtsang/146/head -> origin/gh/henrylhtsang/146/head 2025-09-07T07:55:55.9973129Z * [new branch] gh/henrylhtsang/146/orig -> origin/gh/henrylhtsang/146/orig 2025-09-07T07:55:55.9975795Z * [new branch] gh/henrylhtsang/147/base -> origin/gh/henrylhtsang/147/base 2025-09-07T07:55:55.9977292Z * [new branch] gh/henrylhtsang/147/head -> origin/gh/henrylhtsang/147/head 2025-09-07T07:55:55.9978795Z * [new branch] gh/henrylhtsang/147/orig -> origin/gh/henrylhtsang/147/orig 2025-09-07T07:55:55.9981268Z * [new branch] gh/henrylhtsang/148/base -> origin/gh/henrylhtsang/148/base 2025-09-07T07:55:55.9982927Z * [new branch] 
gh/henrylhtsang/148/head -> origin/gh/henrylhtsang/148/head 2025-09-07T07:55:55.9984990Z * [new branch] gh/henrylhtsang/148/orig -> origin/gh/henrylhtsang/148/orig 2025-09-07T07:55:55.9987187Z * [new branch] gh/henrylhtsang/149/base -> origin/gh/henrylhtsang/149/base 2025-09-07T07:55:55.9988777Z * [new branch] gh/henrylhtsang/149/head -> origin/gh/henrylhtsang/149/head 2025-09-07T07:55:55.9990290Z * [new branch] gh/henrylhtsang/149/orig -> origin/gh/henrylhtsang/149/orig 2025-09-07T07:55:55.9993137Z * [new branch] gh/huydhn/1/next -> origin/gh/huydhn/1/next 2025-09-07T07:55:55.9995600Z * [new branch] gh/huydhn/2/next -> origin/gh/huydhn/2/next 2025-09-07T07:55:55.9997814Z * [new branch] gh/huydhn/3/next -> origin/gh/huydhn/3/next 2025-09-07T07:55:56.0000086Z * [new branch] gh/huydhn/4/next -> origin/gh/huydhn/4/next 2025-09-07T07:55:56.0002483Z * [new branch] gh/huydhn/5/next -> origin/gh/huydhn/5/next 2025-09-07T07:55:56.0004867Z * [new branch] gh/huydhn/6/next -> origin/gh/huydhn/6/next 2025-09-07T07:55:56.0007846Z * [new branch] gh/int3/97/base -> origin/gh/int3/97/base 2025-09-07T07:55:56.0009289Z * [new branch] gh/int3/97/head -> origin/gh/int3/97/head 2025-09-07T07:55:56.0012132Z * [new branch] gh/isuruf/101/base -> origin/gh/isuruf/101/base 2025-09-07T07:55:56.0013613Z * [new branch] gh/isuruf/101/head -> origin/gh/isuruf/101/head 2025-09-07T07:55:56.0016400Z * [new branch] gh/isuruf/141/base -> origin/gh/isuruf/141/base 2025-09-07T07:55:56.0017914Z * [new branch] gh/isuruf/141/head -> origin/gh/isuruf/141/head 2025-09-07T07:55:56.0019451Z * [new branch] gh/isuruf/141/orig -> origin/gh/isuruf/141/orig 2025-09-07T07:55:56.0021680Z * [new branch] gh/isuruf/142/base -> origin/gh/isuruf/142/base 2025-09-07T07:55:56.0023273Z * [new branch] gh/isuruf/142/head -> origin/gh/isuruf/142/head 2025-09-07T07:55:56.0025164Z * [new branch] gh/isuruf/142/orig -> origin/gh/isuruf/142/orig 2025-09-07T07:55:56.0027353Z * [new branch] gh/isuruf/143/base -> 
origin/gh/isuruf/143/base 2025-09-07T07:55:56.0028886Z * [new branch] gh/isuruf/143/head -> origin/gh/isuruf/143/head 2025-09-07T07:55:56.0030458Z * [new branch] gh/isuruf/143/orig -> origin/gh/isuruf/143/orig 2025-09-07T07:55:56.0032676Z * [new branch] gh/isuruf/144/base -> origin/gh/isuruf/144/base 2025-09-07T07:55:56.0034502Z * [new branch] gh/isuruf/144/head -> origin/gh/isuruf/144/head 2025-09-07T07:55:56.0035979Z * [new branch] gh/isuruf/144/orig -> origin/gh/isuruf/144/orig 2025-09-07T07:55:56.0038328Z * [new branch] gh/isuruf/145/base -> origin/gh/isuruf/145/base 2025-09-07T07:55:56.0039815Z * [new branch] gh/isuruf/145/head -> origin/gh/isuruf/145/head 2025-09-07T07:55:56.0041359Z * [new branch] gh/isuruf/145/orig -> origin/gh/isuruf/145/orig 2025-09-07T07:55:56.0043625Z * [new branch] gh/isuruf/146/base -> origin/gh/isuruf/146/base 2025-09-07T07:55:56.0045709Z * [new branch] gh/isuruf/146/head -> origin/gh/isuruf/146/head 2025-09-07T07:55:56.0047190Z * [new branch] gh/isuruf/146/orig -> origin/gh/isuruf/146/orig 2025-09-07T07:55:56.0049383Z * [new branch] gh/isuruf/81/base -> origin/gh/isuruf/81/base 2025-09-07T07:55:56.0050906Z * [new branch] gh/isuruf/81/head -> origin/gh/isuruf/81/head 2025-09-07T07:55:56.0052417Z * [new branch] gh/isuruf/81/orig -> origin/gh/isuruf/81/orig 2025-09-07T07:55:56.0055652Z * [new branch] gh/jamesjwu/150/base -> origin/gh/jamesjwu/150/base 2025-09-07T07:55:56.0057036Z * [new branch] gh/jamesjwu/150/head -> origin/gh/jamesjwu/150/head 2025-09-07T07:55:56.0058622Z * [new branch] gh/jamesjwu/150/orig -> origin/gh/jamesjwu/150/orig 2025-09-07T07:55:56.0060931Z * [new branch] gh/jamesjwu/154/base -> origin/gh/jamesjwu/154/base 2025-09-07T07:55:56.0062598Z * [new branch] gh/jamesjwu/154/head -> origin/gh/jamesjwu/154/head 2025-09-07T07:55:56.0064270Z * [new branch] gh/jamesjwu/154/orig -> origin/gh/jamesjwu/154/orig 2025-09-07T07:55:56.0066621Z * [new branch] gh/jamesjwu/155/base -> origin/gh/jamesjwu/155/base 
2025-09-07T07:55:56.0068179Z * [new branch] gh/jamesjwu/155/head -> origin/gh/jamesjwu/155/head 2025-09-07T07:55:56.0069687Z * [new branch] gh/jamesjwu/155/orig -> origin/gh/jamesjwu/155/orig 2025-09-07T07:55:56.0071925Z * [new branch] gh/jamesjwu/159/base -> origin/gh/jamesjwu/159/base 2025-09-07T07:55:56.0073486Z * [new branch] gh/jamesjwu/159/head -> origin/gh/jamesjwu/159/head 2025-09-07T07:55:56.0075588Z * [new branch] gh/jamesjwu/159/orig -> origin/gh/jamesjwu/159/orig 2025-09-07T07:55:56.0078067Z * [new branch] gh/jamesjwu/163/base -> origin/gh/jamesjwu/163/base 2025-09-07T07:55:56.0079526Z * [new branch] gh/jamesjwu/163/head -> origin/gh/jamesjwu/163/head 2025-09-07T07:55:56.0081108Z * [new branch] gh/jamesjwu/163/orig -> origin/gh/jamesjwu/163/orig 2025-09-07T07:55:56.0083367Z * [new branch] gh/jamesjwu/171/base -> origin/gh/jamesjwu/171/base 2025-09-07T07:55:56.0085658Z * [new branch] gh/jamesjwu/171/head -> origin/gh/jamesjwu/171/head 2025-09-07T07:55:56.0086723Z * [new branch] gh/jamesjwu/171/orig -> origin/gh/jamesjwu/171/orig 2025-09-07T07:55:56.0089244Z * [new branch] gh/jamesjwu/176/base -> origin/gh/jamesjwu/176/base 2025-09-07T07:55:56.0090457Z * [new branch] gh/jamesjwu/176/head -> origin/gh/jamesjwu/176/head 2025-09-07T07:55:56.0091952Z * [new branch] gh/jamesjwu/176/orig -> origin/gh/jamesjwu/176/orig 2025-09-07T07:55:56.0094518Z * [new branch] gh/jamesjwu/181/base -> origin/gh/jamesjwu/181/base 2025-09-07T07:55:56.0096094Z * [new branch] gh/jamesjwu/181/head -> origin/gh/jamesjwu/181/head 2025-09-07T07:55:56.0097700Z * [new branch] gh/jamesjwu/181/orig -> origin/gh/jamesjwu/181/orig 2025-09-07T07:55:56.0099995Z * [new branch] gh/jamesjwu/182/base -> origin/gh/jamesjwu/182/base 2025-09-07T07:55:56.0101532Z * [new branch] gh/jamesjwu/182/head -> origin/gh/jamesjwu/182/head 2025-09-07T07:55:56.0103079Z * [new branch] gh/jamesjwu/182/orig -> origin/gh/jamesjwu/182/orig 2025-09-07T07:55:56.0105792Z * [new branch] gh/jamesjwu/183/base -> 
origin/gh/jamesjwu/183/base 2025-09-07T07:55:56.0107373Z * [new branch] gh/jamesjwu/183/head -> origin/gh/jamesjwu/183/head 2025-09-07T07:55:56.0109082Z * [new branch] gh/jamesjwu/183/orig -> origin/gh/jamesjwu/183/orig 2025-09-07T07:55:56.0111385Z * [new branch] gh/jamesjwu/184/base -> origin/gh/jamesjwu/184/base 2025-09-07T07:55:56.0112912Z * [new branch] gh/jamesjwu/184/head -> origin/gh/jamesjwu/184/head 2025-09-07T07:55:56.0114878Z * [new branch] gh/jamesjwu/184/orig -> origin/gh/jamesjwu/184/orig 2025-09-07T07:55:56.0117145Z * [new branch] gh/jamesjwu/185/base -> origin/gh/jamesjwu/185/base 2025-09-07T07:55:56.0118721Z * [new branch] gh/jamesjwu/185/head -> origin/gh/jamesjwu/185/head 2025-09-07T07:55:56.0120317Z * [new branch] gh/jamesjwu/185/orig -> origin/gh/jamesjwu/185/orig 2025-09-07T07:55:56.0122547Z * [new branch] gh/jamesjwu/186/base -> origin/gh/jamesjwu/186/base 2025-09-07T07:55:56.0124261Z * [new branch] gh/jamesjwu/186/head -> origin/gh/jamesjwu/186/head 2025-09-07T07:55:56.0125933Z * [new branch] gh/jamesjwu/186/orig -> origin/gh/jamesjwu/186/orig 2025-09-07T07:55:56.0128211Z * [new branch] gh/jamesjwu/187/base -> origin/gh/jamesjwu/187/base 2025-09-07T07:55:56.0129665Z * [new branch] gh/jamesjwu/187/head -> origin/gh/jamesjwu/187/head 2025-09-07T07:55:56.0131198Z * [new branch] gh/jamesjwu/187/orig -> origin/gh/jamesjwu/187/orig 2025-09-07T07:55:56.0133450Z * [new branch] gh/jamesjwu/188/base -> origin/gh/jamesjwu/188/base 2025-09-07T07:55:56.0135480Z * [new branch] gh/jamesjwu/188/head -> origin/gh/jamesjwu/188/head 2025-09-07T07:55:56.0136977Z * [new branch] gh/jamesjwu/188/orig -> origin/gh/jamesjwu/188/orig 2025-09-07T07:55:56.0139222Z * [new branch] gh/jamesjwu/189/base -> origin/gh/jamesjwu/189/base 2025-09-07T07:55:56.0140961Z * [new branch] gh/jamesjwu/189/head -> origin/gh/jamesjwu/189/head 2025-09-07T07:55:56.0142352Z * [new branch] gh/jamesjwu/189/orig -> origin/gh/jamesjwu/189/orig 2025-09-07T07:55:56.0145060Z * [new branch] 
gh/jamesjwu/190/base -> origin/gh/jamesjwu/190/base 2025-09-07T07:55:56.0146579Z * [new branch] gh/jamesjwu/190/head -> origin/gh/jamesjwu/190/head 2025-09-07T07:55:56.0148119Z * [new branch] gh/jamesjwu/190/orig -> origin/gh/jamesjwu/190/orig 2025-09-07T07:55:56.0150476Z * [new branch] gh/jamesjwu/52/base -> origin/gh/jamesjwu/52/base 2025-09-07T07:55:56.0151996Z * [new branch] gh/jamesjwu/52/head -> origin/gh/jamesjwu/52/head 2025-09-07T07:55:56.0154308Z * [new branch] gh/jamesjwu/53/base -> origin/gh/jamesjwu/53/base 2025-09-07T07:55:56.0156047Z * [new branch] gh/jamesjwu/53/head -> origin/gh/jamesjwu/53/head 2025-09-07T07:55:56.0158277Z * [new branch] gh/jamesjwu/54/base -> origin/gh/jamesjwu/54/base 2025-09-07T07:55:56.0159777Z * [new branch] gh/jamesjwu/54/head -> origin/gh/jamesjwu/54/head 2025-09-07T07:55:56.0161912Z * [new branch] gh/jamesjwu/55/base -> origin/gh/jamesjwu/55/base 2025-09-07T07:55:56.0163405Z * [new branch] gh/jamesjwu/55/head -> origin/gh/jamesjwu/55/head 2025-09-07T07:55:56.0166077Z * [new branch] gh/jamesjwu/56/base -> origin/gh/jamesjwu/56/base 2025-09-07T07:55:56.0167470Z * [new branch] gh/jamesjwu/56/head -> origin/gh/jamesjwu/56/head 2025-09-07T07:55:56.0169638Z * [new branch] gh/jamesjwu/57/base -> origin/gh/jamesjwu/57/base 2025-09-07T07:55:56.0171159Z * [new branch] gh/jamesjwu/57/head -> origin/gh/jamesjwu/57/head 2025-09-07T07:55:56.0173372Z * [new branch] gh/jamesjwu/58/base -> origin/gh/jamesjwu/58/base 2025-09-07T07:55:56.0175402Z * [new branch] gh/jamesjwu/58/head -> origin/gh/jamesjwu/58/head 2025-09-07T07:55:56.0177419Z * [new branch] gh/jamesjwu/59/base -> origin/gh/jamesjwu/59/base 2025-09-07T07:55:56.0178902Z * [new branch] gh/jamesjwu/59/head -> origin/gh/jamesjwu/59/head 2025-09-07T07:55:56.0181070Z * [new branch] gh/jamesjwu/60/base -> origin/gh/jamesjwu/60/base 2025-09-07T07:55:56.0182546Z * [new branch] gh/jamesjwu/60/head -> origin/gh/jamesjwu/60/head 2025-09-07T07:55:56.0185089Z * [new branch] gh/jamesjwu/61/base 
-> origin/gh/jamesjwu/61/base 2025-09-07T07:55:56.0186677Z * [new branch] gh/jamesjwu/61/head -> origin/gh/jamesjwu/61/head 2025-09-07T07:55:56.0188825Z * [new branch] gh/jamesjwu/62/base -> origin/gh/jamesjwu/62/base 2025-09-07T07:55:56.0190304Z * [new branch] gh/jamesjwu/62/head -> origin/gh/jamesjwu/62/head 2025-09-07T07:55:56.0192485Z * [new branch] gh/jamesjwu/63/base -> origin/gh/jamesjwu/63/base 2025-09-07T07:55:56.0194220Z * [new branch] gh/jamesjwu/63/head -> origin/gh/jamesjwu/63/head 2025-09-07T07:55:56.0196654Z * [new branch] gh/jamesjwu/64/base -> origin/gh/jamesjwu/64/base 2025-09-07T07:55:56.0198291Z * [new branch] gh/jamesjwu/64/head -> origin/gh/jamesjwu/64/head 2025-09-07T07:55:56.0200441Z * [new branch] gh/jamesjwu/65/base -> origin/gh/jamesjwu/65/base 2025-09-07T07:55:56.0201949Z * [new branch] gh/jamesjwu/65/head -> origin/gh/jamesjwu/65/head 2025-09-07T07:55:56.0205434Z * [new branch] gh/janeyx99/165/base -> origin/gh/janeyx99/165/base 2025-09-07T07:55:56.0207005Z * [new branch] gh/janeyx99/165/head -> origin/gh/janeyx99/165/head 2025-09-07T07:55:56.0208768Z * [new branch] gh/janeyx99/165/orig -> origin/gh/janeyx99/165/orig 2025-09-07T07:55:56.0210846Z * [new branch] gh/janeyx99/201/base -> origin/gh/janeyx99/201/base 2025-09-07T07:55:56.0212379Z * [new branch] gh/janeyx99/201/head -> origin/gh/janeyx99/201/head 2025-09-07T07:55:56.0214036Z * [new branch] gh/janeyx99/201/orig -> origin/gh/janeyx99/201/orig 2025-09-07T07:55:56.0216770Z * [new branch] gh/janeyx99/225/base -> origin/gh/janeyx99/225/base 2025-09-07T07:55:56.0218299Z * [new branch] gh/janeyx99/225/head -> origin/gh/janeyx99/225/head 2025-09-07T07:55:56.0219801Z * [new branch] gh/janeyx99/225/orig -> origin/gh/janeyx99/225/orig 2025-09-07T07:55:56.0222050Z * [new branch] gh/janeyx99/296/base -> origin/gh/janeyx99/296/base 2025-09-07T07:55:56.0223606Z * [new branch] gh/janeyx99/296/head -> origin/gh/janeyx99/296/head 2025-09-07T07:55:56.0225661Z * [new branch] gh/janeyx99/296/orig -> 
origin/gh/janeyx99/296/orig 2025-09-07T07:55:56.0227846Z * [new branch] gh/janeyx99/297/base -> origin/gh/janeyx99/297/base 2025-09-07T07:55:56.0229435Z * [new branch] gh/janeyx99/297/head -> origin/gh/janeyx99/297/head 2025-09-07T07:55:56.0230985Z * [new branch] gh/janeyx99/297/orig -> origin/gh/janeyx99/297/orig 2025-09-07T07:55:56.0233300Z * [new branch] gh/janeyx99/298/base -> origin/gh/janeyx99/298/base 2025-09-07T07:55:56.0235123Z * [new branch] gh/janeyx99/298/head -> origin/gh/janeyx99/298/head 2025-09-07T07:55:56.0236582Z * [new branch] gh/janeyx99/298/orig -> origin/gh/janeyx99/298/orig 2025-09-07T07:55:56.0238970Z * [new branch] gh/janeyx99/299/base -> origin/gh/janeyx99/299/base 2025-09-07T07:55:56.0240546Z * [new branch] gh/janeyx99/299/head -> origin/gh/janeyx99/299/head 2025-09-07T07:55:56.0242103Z * [new branch] gh/janeyx99/299/orig -> origin/gh/janeyx99/299/orig 2025-09-07T07:55:56.0244900Z * [new branch] gh/janeyx99/300/base -> origin/gh/janeyx99/300/base 2025-09-07T07:55:56.0246590Z * [new branch] gh/janeyx99/300/head -> origin/gh/janeyx99/300/head 2025-09-07T07:55:56.0248138Z * [new branch] gh/janeyx99/300/orig -> origin/gh/janeyx99/300/orig 2025-09-07T07:55:56.0250361Z * [new branch] gh/janeyx99/301/base -> origin/gh/janeyx99/301/base 2025-09-07T07:55:56.0251958Z * [new branch] gh/janeyx99/301/head -> origin/gh/janeyx99/301/head 2025-09-07T07:55:56.0253488Z * [new branch] gh/janeyx99/301/orig -> origin/gh/janeyx99/301/orig 2025-09-07T07:55:56.0256023Z * [new branch] gh/janeyx99/302/base -> origin/gh/janeyx99/302/base 2025-09-07T07:55:56.0257608Z * [new branch] gh/janeyx99/302/head -> origin/gh/janeyx99/302/head 2025-09-07T07:55:56.0259717Z * [new branch] gh/janeyx99/303/base -> origin/gh/janeyx99/303/base 2025-09-07T07:55:56.0261221Z * [new branch] gh/janeyx99/303/head -> origin/gh/janeyx99/303/head 2025-09-07T07:55:56.0263635Z * [new branch] gh/janeyx99/88/base -> origin/gh/janeyx99/88/base 2025-09-07T07:55:56.0265664Z * [new branch] 
gh/janeyx99/88/head -> origin/gh/janeyx99/88/head 2025-09-07T07:55:56.0267156Z * [new branch] gh/janeyx99/88/orig -> origin/gh/janeyx99/88/orig 2025-09-07T07:55:56.0270011Z * [new branch] gh/jansel/360/base -> origin/gh/jansel/360/base 2025-09-07T07:55:56.0271552Z * [new branch] gh/jansel/360/head -> origin/gh/jansel/360/head 2025-09-07T07:55:56.0273861Z * [new branch] gh/jansel/451/base -> origin/gh/jansel/451/base 2025-09-07T07:55:56.0275788Z * [new branch] gh/jansel/451/head -> origin/gh/jansel/451/head 2025-09-07T07:55:56.0277187Z * [new branch] gh/jansel/451/orig -> origin/gh/jansel/451/orig 2025-09-07T07:55:56.0279519Z * [new branch] gh/jansel/462/base -> origin/gh/jansel/462/base 2025-09-07T07:55:56.0281017Z * [new branch] gh/jansel/462/head -> origin/gh/jansel/462/head 2025-09-07T07:55:56.0282643Z * [new branch] gh/jansel/462/orig -> origin/gh/jansel/462/orig 2025-09-07T07:55:56.0285336Z * [new branch] gh/jansel/531/base -> origin/gh/jansel/531/base 2025-09-07T07:55:56.0286770Z * [new branch] gh/jansel/531/head -> origin/gh/jansel/531/head 2025-09-07T07:55:56.0288285Z * [new branch] gh/jansel/531/orig -> origin/gh/jansel/531/orig 2025-09-07T07:55:56.0291133Z * [new branch] gh/jbschlosser/208/head -> origin/gh/jbschlosser/208/head 2025-09-07T07:55:56.0293418Z * [new branch] gh/jbschlosser/247/base -> origin/gh/jbschlosser/247/base 2025-09-07T07:55:56.0295478Z * [new branch] gh/jbschlosser/247/head -> origin/gh/jbschlosser/247/head 2025-09-07T07:55:56.0296913Z * [new branch] gh/jbschlosser/247/orig -> origin/gh/jbschlosser/247/orig 2025-09-07T07:55:56.0299159Z * [new branch] gh/jbschlosser/248/base -> origin/gh/jbschlosser/248/base 2025-09-07T07:55:56.0301040Z * [new branch] gh/jbschlosser/248/head -> origin/gh/jbschlosser/248/head 2025-09-07T07:55:56.0302559Z * [new branch] gh/jbschlosser/248/orig -> origin/gh/jbschlosser/248/orig 2025-09-07T07:55:56.0305275Z * [new branch] gh/jbschlosser/250/base -> origin/gh/jbschlosser/250/base 
2025-09-07T07:55:56.0306817Z * [new branch] gh/jbschlosser/250/head -> origin/gh/jbschlosser/250/head 2025-09-07T07:55:56.0308401Z * [new branch] gh/jbschlosser/250/orig -> origin/gh/jbschlosser/250/orig 2025-09-07T07:55:56.0311168Z * [new branch] gh/jiayisunx/59/base -> origin/gh/jiayisunx/59/base 2025-09-07T07:55:56.0312708Z * [new branch] gh/jiayisunx/59/head -> origin/gh/jiayisunx/59/head 2025-09-07T07:55:56.0314597Z * [new branch] gh/jiayisunx/59/orig -> origin/gh/jiayisunx/59/orig 2025-09-07T07:55:56.0316788Z * [new branch] gh/jiayisunx/61/base -> origin/gh/jiayisunx/61/base 2025-09-07T07:55:56.0318448Z * [new branch] gh/jiayisunx/61/head -> origin/gh/jiayisunx/61/head 2025-09-07T07:55:56.0319981Z * [new branch] gh/jiayisunx/61/orig -> origin/gh/jiayisunx/61/orig 2025-09-07T07:55:56.0322237Z * [new branch] gh/jiayisunx/64/base -> origin/gh/jiayisunx/64/base 2025-09-07T07:55:56.0324019Z * [new branch] gh/jiayisunx/64/head -> origin/gh/jiayisunx/64/head 2025-09-07T07:55:56.0325718Z * [new branch] gh/jiayisunx/64/orig -> origin/gh/jiayisunx/64/orig 2025-09-07T07:55:56.0328005Z * [new branch] gh/jiayisunx/65/base -> origin/gh/jiayisunx/65/base 2025-09-07T07:55:56.0329573Z * [new branch] gh/jiayisunx/65/head -> origin/gh/jiayisunx/65/head 2025-09-07T07:55:56.0331117Z * [new branch] gh/jiayisunx/65/orig -> origin/gh/jiayisunx/65/orig 2025-09-07T07:55:56.0333337Z * [new branch] gh/jiayisunx/66/base -> origin/gh/jiayisunx/66/base 2025-09-07T07:55:56.0335309Z * [new branch] gh/jiayisunx/66/head -> origin/gh/jiayisunx/66/head 2025-09-07T07:55:56.0336794Z * [new branch] gh/jiayisunx/66/orig -> origin/gh/jiayisunx/66/orig 2025-09-07T07:55:56.0339141Z * [new branch] gh/jiayisunx/67/base -> origin/gh/jiayisunx/67/base 2025-09-07T07:55:56.0340624Z * [new branch] gh/jiayisunx/67/head -> origin/gh/jiayisunx/67/head 2025-09-07T07:55:56.0342392Z * [new branch] gh/jiayisunx/67/orig -> origin/gh/jiayisunx/67/orig 2025-09-07T07:55:56.0344881Z * [new branch] gh/jiayisunx/68/base -> 
origin/gh/jiayisunx/68/base 2025-09-07T07:55:56.0346336Z * [new branch] gh/jiayisunx/68/head -> origin/gh/jiayisunx/68/head 2025-09-07T07:55:56.0347855Z * [new branch] gh/jiayisunx/68/orig -> origin/gh/jiayisunx/68/orig 2025-09-07T07:55:56.0350116Z * [new branch] gh/jiayisunx/69/base -> origin/gh/jiayisunx/69/base 2025-09-07T07:55:56.0351694Z * [new branch] gh/jiayisunx/69/head -> origin/gh/jiayisunx/69/head 2025-09-07T07:55:56.0353207Z * [new branch] gh/jiayisunx/69/orig -> origin/gh/jiayisunx/69/orig 2025-09-07T07:55:56.0355898Z * [new branch] gh/jiayisunx/70/base -> origin/gh/jiayisunx/70/base 2025-09-07T07:55:56.0357429Z * [new branch] gh/jiayisunx/70/head -> origin/gh/jiayisunx/70/head 2025-09-07T07:55:56.0358962Z * [new branch] gh/jiayisunx/70/orig -> origin/gh/jiayisunx/70/orig 2025-09-07T07:55:56.0361226Z * [new branch] gh/jiayisunx/71/base -> origin/gh/jiayisunx/71/base 2025-09-07T07:55:56.0362770Z * [new branch] gh/jiayisunx/71/head -> origin/gh/jiayisunx/71/head 2025-09-07T07:55:56.0364631Z * [new branch] gh/jiayisunx/71/orig -> origin/gh/jiayisunx/71/orig 2025-09-07T07:55:56.0367058Z * [new branch] gh/jiayisunx/72/base -> origin/gh/jiayisunx/72/base 2025-09-07T07:55:56.0368512Z * [new branch] gh/jiayisunx/72/head -> origin/gh/jiayisunx/72/head 2025-09-07T07:55:56.0370016Z * [new branch] gh/jiayisunx/72/orig -> origin/gh/jiayisunx/72/orig 2025-09-07T07:55:56.0372312Z * [new branch] gh/jiayisunx/73/base -> origin/gh/jiayisunx/73/base 2025-09-07T07:55:56.0374047Z * [new branch] gh/jiayisunx/73/head -> origin/gh/jiayisunx/73/head 2025-09-07T07:55:56.0376065Z * [new branch] gh/jiayisunx/73/orig -> origin/gh/jiayisunx/73/orig 2025-09-07T07:55:56.0378070Z * [new branch] gh/jiayisunx/74/base -> origin/gh/jiayisunx/74/base 2025-09-07T07:55:56.0379592Z * [new branch] gh/jiayisunx/74/head -> origin/gh/jiayisunx/74/head 2025-09-07T07:55:56.0381156Z * [new branch] gh/jiayisunx/74/orig -> origin/gh/jiayisunx/74/orig 2025-09-07T07:55:56.0386382Z * [new branch] 
gh/jiayisunx/75/base -> origin/gh/jiayisunx/75/base 2025-09-07T07:55:56.0387861Z * [new branch] gh/jiayisunx/75/head -> origin/gh/jiayisunx/75/head 2025-09-07T07:55:56.0389337Z * [new branch] gh/jiayisunx/75/orig -> origin/gh/jiayisunx/75/orig 2025-09-07T07:55:56.0391570Z * [new branch] gh/jiayisunx/76/base -> origin/gh/jiayisunx/76/base 2025-09-07T07:55:56.0393017Z * [new branch] gh/jiayisunx/76/head -> origin/gh/jiayisunx/76/head 2025-09-07T07:55:56.0395029Z * [new branch] gh/jiayisunx/76/orig -> origin/gh/jiayisunx/76/orig 2025-09-07T07:55:56.0398059Z * [new branch] gh/jjwu@meta.com/1/base -> origin/gh/jjwu@meta.com/1/base 2025-09-07T07:55:56.0399556Z * [new branch] gh/jjwu@meta.com/1/head -> origin/gh/jjwu@meta.com/1/head 2025-09-07T07:55:56.0402368Z * [new branch] gh/justinchuby/111/base -> origin/gh/justinchuby/111/base 2025-09-07T07:55:56.0404410Z * [new branch] gh/justinchuby/111/head -> origin/gh/justinchuby/111/head 2025-09-07T07:55:56.0406119Z * [new branch] gh/justinchuby/111/orig -> origin/gh/justinchuby/111/orig 2025-09-07T07:55:56.0408305Z * [new branch] gh/justinchuby/112/base -> origin/gh/justinchuby/112/base 2025-09-07T07:55:56.0409881Z * [new branch] gh/justinchuby/112/head -> origin/gh/justinchuby/112/head 2025-09-07T07:55:56.0411684Z * [new branch] gh/justinchuby/112/orig -> origin/gh/justinchuby/112/orig 2025-09-07T07:55:56.0413903Z * [new branch] gh/justinchuby/113/base -> origin/gh/justinchuby/113/base 2025-09-07T07:55:56.0415697Z * [new branch] gh/justinchuby/113/head -> origin/gh/justinchuby/113/head 2025-09-07T07:55:56.0417232Z * [new branch] gh/justinchuby/113/orig -> origin/gh/justinchuby/113/orig 2025-09-07T07:55:56.0419387Z * [new branch] gh/justinchuby/114/base -> origin/gh/justinchuby/114/base 2025-09-07T07:55:56.0420910Z * [new branch] gh/justinchuby/114/head -> origin/gh/justinchuby/114/head 2025-09-07T07:55:56.0422457Z * [new branch] gh/justinchuby/114/orig -> origin/gh/justinchuby/114/orig 2025-09-07T07:55:56.0425108Z * [new 
branch] gh/justinchuby/115/base -> origin/gh/justinchuby/115/base 2025-09-07T07:55:56.0426633Z * [new branch] gh/justinchuby/115/head -> origin/gh/justinchuby/115/head 2025-09-07T07:55:56.0428095Z * [new branch] gh/justinchuby/115/orig -> origin/gh/justinchuby/115/orig 2025-09-07T07:55:56.0430988Z * [new branch] gh/karthickai/1/base -> origin/gh/karthickai/1/base 2025-09-07T07:55:56.0432575Z * [new branch] gh/karthickai/1/head -> origin/gh/karthickai/1/head 2025-09-07T07:55:56.0434346Z * [new branch] gh/karthickai/1/orig -> origin/gh/karthickai/1/orig 2025-09-07T07:55:56.0436620Z * [new branch] gh/karthickai/2/base -> origin/gh/karthickai/2/base 2025-09-07T07:55:56.0438401Z * [new branch] gh/karthickai/2/head -> origin/gh/karthickai/2/head 2025-09-07T07:55:56.0439909Z * [new branch] gh/karthickai/2/orig -> origin/gh/karthickai/2/orig 2025-09-07T07:55:56.0442768Z * [new branch] gh/kurtamohler/32/base -> origin/gh/kurtamohler/32/base 2025-09-07T07:55:56.0444621Z * [new branch] gh/kurtamohler/32/head -> origin/gh/kurtamohler/32/head 2025-09-07T07:55:56.0446251Z * [new branch] gh/kurtamohler/32/orig -> origin/gh/kurtamohler/32/orig 2025-09-07T07:55:56.0448392Z * [new branch] gh/kurtamohler/33/base -> origin/gh/kurtamohler/33/base 2025-09-07T07:55:56.0450016Z * [new branch] gh/kurtamohler/33/head -> origin/gh/kurtamohler/33/head 2025-09-07T07:55:56.0451549Z * [new branch] gh/kurtamohler/33/orig -> origin/gh/kurtamohler/33/orig 2025-09-07T07:55:56.0453881Z * [new branch] gh/kurtamohler/34/base -> origin/gh/kurtamohler/34/base 2025-09-07T07:55:56.0455749Z * [new branch] gh/kurtamohler/34/head -> origin/gh/kurtamohler/34/head 2025-09-07T07:55:56.0457204Z * [new branch] gh/kurtamohler/34/orig -> origin/gh/kurtamohler/34/orig 2025-09-07T07:55:56.0459464Z * [new branch] gh/kurtamohler/41/base -> origin/gh/kurtamohler/41/base 2025-09-07T07:55:56.0461064Z * [new branch] gh/kurtamohler/41/head -> origin/gh/kurtamohler/41/head 2025-09-07T07:55:56.0462553Z * [new branch] 
gh/kurtamohler/41/orig -> origin/gh/kurtamohler/41/orig 2025-09-07T07:55:56.0465193Z * [new branch] gh/kurtamohler/46/base -> origin/gh/kurtamohler/46/base 2025-09-07T07:55:56.0466652Z * [new branch] gh/kurtamohler/46/head -> origin/gh/kurtamohler/46/head 2025-09-07T07:55:56.0468281Z * [new branch] gh/kurtamohler/46/orig -> origin/gh/kurtamohler/46/orig 2025-09-07T07:55:56.0470580Z * [new branch] gh/kurtamohler/47/base -> origin/gh/kurtamohler/47/base 2025-09-07T07:55:56.0472211Z * [new branch] gh/kurtamohler/47/head -> origin/gh/kurtamohler/47/head 2025-09-07T07:55:56.0473857Z * [new branch] gh/kurtamohler/47/orig -> origin/gh/kurtamohler/47/orig 2025-09-07T07:55:56.0476470Z * [new branch] gh/kurtamohler/48/base -> origin/gh/kurtamohler/48/base 2025-09-07T07:55:56.0477913Z * [new branch] gh/kurtamohler/48/head -> origin/gh/kurtamohler/48/head 2025-09-07T07:55:56.0479395Z * [new branch] gh/kurtamohler/48/orig -> origin/gh/kurtamohler/48/orig 2025-09-07T07:55:56.0481696Z * [new branch] gh/kurtamohler/49/base -> origin/gh/kurtamohler/49/base 2025-09-07T07:55:56.0483231Z * [new branch] gh/kurtamohler/49/head -> origin/gh/kurtamohler/49/head 2025-09-07T07:55:56.0485119Z * [new branch] gh/kurtamohler/49/orig -> origin/gh/kurtamohler/49/orig 2025-09-07T07:55:56.0487399Z * [new branch] gh/kurtamohler/50/base -> origin/gh/kurtamohler/50/base 2025-09-07T07:55:56.0488901Z * [new branch] gh/kurtamohler/50/head -> origin/gh/kurtamohler/50/head 2025-09-07T07:55:56.0490442Z * [new branch] gh/kurtamohler/50/orig -> origin/gh/kurtamohler/50/orig 2025-09-07T07:55:56.0493560Z * [new branch] gh/kwen2501/130/base -> origin/gh/kwen2501/130/base 2025-09-07T07:55:56.0495583Z * [new branch] gh/kwen2501/130/head -> origin/gh/kwen2501/130/head 2025-09-07T07:55:56.0497183Z * [new branch] gh/kwen2501/130/orig -> origin/gh/kwen2501/130/orig 2025-09-07T07:55:56.0499421Z * [new branch] gh/kwen2501/15/base -> origin/gh/kwen2501/15/base 2025-09-07T07:55:56.0501016Z * [new branch] 
gh/kwen2501/15/head -> origin/gh/kwen2501/15/head 2025-09-07T07:55:56.0503203Z * [new branch] gh/kwen2501/156/base -> origin/gh/kwen2501/156/base 2025-09-07T07:55:56.0505281Z * [new branch] gh/kwen2501/156/head -> origin/gh/kwen2501/156/head 2025-09-07T07:55:56.0506713Z * [new branch] gh/kwen2501/156/orig -> origin/gh/kwen2501/156/orig 2025-09-07T07:55:56.0508901Z * [new branch] gh/kwen2501/170/base -> origin/gh/kwen2501/170/base 2025-09-07T07:55:56.0510429Z * [new branch] gh/kwen2501/170/head -> origin/gh/kwen2501/170/head 2025-09-07T07:55:56.0512809Z * [new branch] gh/kwen2501/186/base -> origin/gh/kwen2501/186/base 2025-09-07T07:55:56.0514761Z * [new branch] gh/kwen2501/186/head -> origin/gh/kwen2501/186/head 2025-09-07T07:55:56.0516248Z * [new branch] gh/kwen2501/186/orig -> origin/gh/kwen2501/186/orig 2025-09-07T07:55:56.0518473Z * [new branch] gh/kwen2501/187/base -> origin/gh/kwen2501/187/base 2025-09-07T07:55:56.0520117Z * [new branch] gh/kwen2501/187/head -> origin/gh/kwen2501/187/head 2025-09-07T07:55:56.0521663Z * [new branch] gh/kwen2501/187/orig -> origin/gh/kwen2501/187/orig 2025-09-07T07:55:56.0524229Z * [new branch] gh/kwen2501/188/base -> origin/gh/kwen2501/188/base 2025-09-07T07:55:56.0525868Z * [new branch] gh/kwen2501/188/head -> origin/gh/kwen2501/188/head 2025-09-07T07:55:56.0527409Z * [new branch] gh/kwen2501/188/orig -> origin/gh/kwen2501/188/orig 2025-09-07T07:55:56.0529625Z * [new branch] gh/kwen2501/194/base -> origin/gh/kwen2501/194/base 2025-09-07T07:55:56.0531181Z * [new branch] gh/kwen2501/194/head -> origin/gh/kwen2501/194/head 2025-09-07T07:55:56.0532735Z * [new branch] gh/kwen2501/194/orig -> origin/gh/kwen2501/194/orig 2025-09-07T07:55:56.0535345Z * [new branch] gh/kwen2501/199/base -> origin/gh/kwen2501/199/base 2025-09-07T07:55:56.0536844Z * [new branch] gh/kwen2501/199/head -> origin/gh/kwen2501/199/head 2025-09-07T07:55:56.0538363Z * [new branch] gh/kwen2501/199/orig -> origin/gh/kwen2501/199/orig 2025-09-07T07:55:56.0540516Z 
* [new branch] gh/kwen2501/200/base -> origin/gh/kwen2501/200/base 2025-09-07T07:55:56.0542444Z * [new branch] gh/kwen2501/200/head -> origin/gh/kwen2501/200/head 2025-09-07T07:55:56.0543895Z * [new branch] gh/kwen2501/200/orig -> origin/gh/kwen2501/200/orig 2025-09-07T07:55:56.0546472Z * [new branch] gh/kwen2501/201/base -> origin/gh/kwen2501/201/base 2025-09-07T07:55:56.0547965Z * [new branch] gh/kwen2501/201/head -> origin/gh/kwen2501/201/head 2025-09-07T07:55:56.0549503Z * [new branch] gh/kwen2501/201/orig -> origin/gh/kwen2501/201/orig 2025-09-07T07:55:56.0551692Z * [new branch] gh/kwen2501/203/base -> origin/gh/kwen2501/203/base 2025-09-07T07:55:56.0553276Z * [new branch] gh/kwen2501/203/head -> origin/gh/kwen2501/203/head 2025-09-07T07:55:56.0555167Z * [new branch] gh/kwen2501/203/orig -> origin/gh/kwen2501/203/orig 2025-09-07T07:55:56.0557490Z * [new branch] gh/kwen2501/204/base -> origin/gh/kwen2501/204/base 2025-09-07T07:55:56.0559016Z * [new branch] gh/kwen2501/204/head -> origin/gh/kwen2501/204/head 2025-09-07T07:55:56.0560568Z * [new branch] gh/kwen2501/204/orig -> origin/gh/kwen2501/204/orig 2025-09-07T07:55:56.0562891Z * [new branch] gh/kwen2501/205/base -> origin/gh/kwen2501/205/base 2025-09-07T07:55:56.0564822Z * [new branch] gh/kwen2501/205/head -> origin/gh/kwen2501/205/head 2025-09-07T07:55:56.0566374Z * [new branch] gh/kwen2501/205/orig -> origin/gh/kwen2501/205/orig 2025-09-07T07:55:56.0568612Z * [new branch] gh/kwen2501/206/base -> origin/gh/kwen2501/206/base 2025-09-07T07:55:56.0570119Z * [new branch] gh/kwen2501/206/head -> origin/gh/kwen2501/206/head 2025-09-07T07:55:56.0571696Z * [new branch] gh/kwen2501/206/orig -> origin/gh/kwen2501/206/orig 2025-09-07T07:55:56.0574030Z * [new branch] gh/kwen2501/207/base -> origin/gh/kwen2501/207/base 2025-09-07T07:55:56.0575724Z * [new branch] gh/kwen2501/207/head -> origin/gh/kwen2501/207/head 2025-09-07T07:55:56.0577262Z * [new branch] gh/kwen2501/207/orig -> origin/gh/kwen2501/207/orig 
2025-09-07T07:55:56.0579551Z * [new branch] gh/kwen2501/208/base -> origin/gh/kwen2501/208/base 2025-09-07T07:55:56.0581070Z * [new branch] gh/kwen2501/208/head -> origin/gh/kwen2501/208/head 2025-09-07T07:55:56.0582593Z * [new branch] gh/kwen2501/208/orig -> origin/gh/kwen2501/208/orig 2025-09-07T07:55:56.0585303Z * [new branch] gh/kwen2501/209/base -> origin/gh/kwen2501/209/base 2025-09-07T07:55:56.0586971Z * [new branch] gh/kwen2501/209/head -> origin/gh/kwen2501/209/head 2025-09-07T07:55:56.0588486Z * [new branch] gh/kwen2501/209/orig -> origin/gh/kwen2501/209/orig 2025-09-07T07:55:56.0590857Z * [new branch] gh/kwen2501/210/base -> origin/gh/kwen2501/210/base 2025-09-07T07:55:56.0592427Z * [new branch] gh/kwen2501/210/head -> origin/gh/kwen2501/210/head 2025-09-07T07:55:56.0594052Z * [new branch] gh/kwen2501/210/orig -> origin/gh/kwen2501/210/orig 2025-09-07T07:55:56.0596466Z * [new branch] gh/kwen2501/211/base -> origin/gh/kwen2501/211/base 2025-09-07T07:55:56.0598133Z * [new branch] gh/kwen2501/211/head -> origin/gh/kwen2501/211/head 2025-09-07T07:55:56.0600449Z * [new branch] gh/kwen2501/212/base -> origin/gh/kwen2501/212/base 2025-09-07T07:55:56.0602006Z * [new branch] gh/kwen2501/212/head -> origin/gh/kwen2501/212/head 2025-09-07T07:55:56.0603540Z * [new branch] gh/kwen2501/212/orig -> origin/gh/kwen2501/212/orig 2025-09-07T07:55:56.0606151Z * [new branch] gh/kwen2501/213/base -> origin/gh/kwen2501/213/base 2025-09-07T07:55:56.0607827Z * [new branch] gh/kwen2501/213/head -> origin/gh/kwen2501/213/head 2025-09-07T07:55:56.0609231Z * [new branch] gh/kwen2501/213/orig -> origin/gh/kwen2501/213/orig 2025-09-07T07:55:56.0611552Z * [new branch] gh/kwen2501/214/base -> origin/gh/kwen2501/214/base 2025-09-07T07:55:56.0613095Z * [new branch] gh/kwen2501/214/head -> origin/gh/kwen2501/214/head 2025-09-07T07:55:56.0615003Z * [new branch] gh/kwen2501/214/orig -> origin/gh/kwen2501/214/orig 2025-09-07T07:55:56.0617285Z * [new branch] gh/kwen2501/215/base -> 
origin/gh/kwen2501/215/base 2025-09-07T07:55:56.0618794Z * [new branch] gh/kwen2501/215/head -> origin/gh/kwen2501/215/head 2025-09-07T07:55:56.0620334Z * [new branch] gh/kwen2501/215/orig -> origin/gh/kwen2501/215/orig 2025-09-07T07:55:56.0622571Z * [new branch] gh/kwen2501/216/base -> origin/gh/kwen2501/216/base 2025-09-07T07:55:56.0624354Z * [new branch] gh/kwen2501/216/head -> origin/gh/kwen2501/216/head 2025-09-07T07:55:56.0626011Z * [new branch] gh/kwen2501/216/orig -> origin/gh/kwen2501/216/orig 2025-09-07T07:55:56.0628197Z * [new branch] gh/kwen2501/217/base -> origin/gh/kwen2501/217/base 2025-09-07T07:55:56.0629746Z * [new branch] gh/kwen2501/217/head -> origin/gh/kwen2501/217/head 2025-09-07T07:55:56.0631287Z * [new branch] gh/kwen2501/217/orig -> origin/gh/kwen2501/217/orig 2025-09-07T07:55:56.0633507Z * [new branch] gh/kwen2501/218/base -> origin/gh/kwen2501/218/base 2025-09-07T07:55:56.0635600Z * [new branch] gh/kwen2501/218/head -> origin/gh/kwen2501/218/head 2025-09-07T07:55:56.0637187Z * [new branch] gh/kwen2501/218/orig -> origin/gh/kwen2501/218/orig 2025-09-07T07:55:56.0639521Z * [new branch] gh/kwen2501/219/base -> origin/gh/kwen2501/219/base 2025-09-07T07:55:56.0641021Z * [new branch] gh/kwen2501/219/head -> origin/gh/kwen2501/219/head 2025-09-07T07:55:56.0642572Z * [new branch] gh/kwen2501/219/orig -> origin/gh/kwen2501/219/orig 2025-09-07T07:55:56.0645224Z * [new branch] gh/kwen2501/220/base -> origin/gh/kwen2501/220/base 2025-09-07T07:55:56.0646658Z * [new branch] gh/kwen2501/220/head -> origin/gh/kwen2501/220/head 2025-09-07T07:55:56.0648197Z * [new branch] gh/kwen2501/220/orig -> origin/gh/kwen2501/220/orig 2025-09-07T07:55:56.0650493Z * [new branch] gh/kwen2501/221/base -> origin/gh/kwen2501/221/base 2025-09-07T07:55:56.0652038Z * [new branch] gh/kwen2501/221/head -> origin/gh/kwen2501/221/head 2025-09-07T07:55:56.0653576Z * [new branch] gh/kwen2501/221/orig -> origin/gh/kwen2501/221/orig 2025-09-07T07:55:56.0656346Z * [new branch] 
gh/kwen2501/222/base -> origin/gh/kwen2501/222/base 2025-09-07T07:55:56.0657815Z * [new branch] gh/kwen2501/222/head -> origin/gh/kwen2501/222/head 2025-09-07T07:55:56.0659335Z * [new branch] gh/kwen2501/222/orig -> origin/gh/kwen2501/222/orig 2025-09-07T07:55:56.0661548Z * [new branch] gh/kwen2501/223/base -> origin/gh/kwen2501/223/base 2025-09-07T07:55:56.0663104Z * [new branch] gh/kwen2501/223/head -> origin/gh/kwen2501/223/head 2025-09-07T07:55:56.0665066Z * [new branch] gh/kwen2501/223/orig -> origin/gh/kwen2501/223/orig 2025-09-07T07:55:56.0667222Z * [new branch] gh/kwen2501/224/base -> origin/gh/kwen2501/224/base 2025-09-07T07:55:56.0668798Z * [new branch] gh/kwen2501/224/head -> origin/gh/kwen2501/224/head 2025-09-07T07:55:56.0670320Z * [new branch] gh/kwen2501/224/orig -> origin/gh/kwen2501/224/orig 2025-09-07T07:55:56.0672756Z * [new branch] gh/kwen2501/225/base -> origin/gh/kwen2501/225/base 2025-09-07T07:55:56.0674519Z * [new branch] gh/kwen2501/225/head -> origin/gh/kwen2501/225/head 2025-09-07T07:55:56.0676077Z * [new branch] gh/kwen2501/225/orig -> origin/gh/kwen2501/225/orig 2025-09-07T07:55:56.0678436Z * [new branch] gh/kwen2501/226/base -> origin/gh/kwen2501/226/base 2025-09-07T07:55:56.0680158Z * [new branch] gh/kwen2501/226/head -> origin/gh/kwen2501/226/head 2025-09-07T07:55:56.0681856Z * [new branch] gh/kwen2501/226/orig -> origin/gh/kwen2501/226/orig 2025-09-07T07:55:56.0684319Z * [new branch] gh/kwen2501/227/base -> origin/gh/kwen2501/227/base 2025-09-07T07:55:56.0685978Z * [new branch] gh/kwen2501/227/head -> origin/gh/kwen2501/227/head 2025-09-07T07:55:56.0687443Z * [new branch] gh/kwen2501/227/orig -> origin/gh/kwen2501/227/orig 2025-09-07T07:55:56.0689724Z * [new branch] gh/kwen2501/228/base -> origin/gh/kwen2501/228/base 2025-09-07T07:55:56.0691233Z * [new branch] gh/kwen2501/228/head -> origin/gh/kwen2501/228/head 2025-09-07T07:55:56.0692797Z * [new branch] gh/kwen2501/228/orig -> origin/gh/kwen2501/228/orig 
2025-09-07T07:55:56.0695579Z * [new branch] gh/kwen2501/229/base -> origin/gh/kwen2501/229/base 2025-09-07T07:55:56.0697076Z * [new branch] gh/kwen2501/229/head -> origin/gh/kwen2501/229/head 2025-09-07T07:55:56.0698557Z * [new branch] gh/kwen2501/229/orig -> origin/gh/kwen2501/229/orig 2025-09-07T07:55:56.0700869Z * [new branch] gh/kwen2501/230/base -> origin/gh/kwen2501/230/base 2025-09-07T07:55:56.0702455Z * [new branch] gh/kwen2501/230/head -> origin/gh/kwen2501/230/head 2025-09-07T07:55:56.0704056Z * [new branch] gh/kwen2501/230/orig -> origin/gh/kwen2501/230/orig 2025-09-07T07:55:56.0706677Z * [new branch] gh/kwen2501/231/base -> origin/gh/kwen2501/231/base 2025-09-07T07:55:56.0708112Z * [new branch] gh/kwen2501/231/head -> origin/gh/kwen2501/231/head 2025-09-07T07:55:56.0709696Z * [new branch] gh/kwen2501/231/orig -> origin/gh/kwen2501/231/orig 2025-09-07T07:55:56.0711970Z * [new branch] gh/kwen2501/232/base -> origin/gh/kwen2501/232/base 2025-09-07T07:55:56.0713555Z * [new branch] gh/kwen2501/232/head -> origin/gh/kwen2501/232/head 2025-09-07T07:55:56.0715502Z * [new branch] gh/kwen2501/232/orig -> origin/gh/kwen2501/232/orig 2025-09-07T07:55:56.0718466Z * [new branch] gh/laithsakka/156/base -> origin/gh/laithsakka/156/base 2025-09-07T07:55:56.0720039Z * [new branch] gh/laithsakka/156/head -> origin/gh/laithsakka/156/head 2025-09-07T07:55:56.0721578Z * [new branch] gh/laithsakka/156/orig -> origin/gh/laithsakka/156/orig 2025-09-07T07:55:56.0723976Z * [new branch] gh/laithsakka/160/base -> origin/gh/laithsakka/160/base 2025-09-07T07:55:56.0725819Z * [new branch] gh/laithsakka/160/head -> origin/gh/laithsakka/160/head 2025-09-07T07:55:56.0727268Z * [new branch] gh/laithsakka/160/orig -> origin/gh/laithsakka/160/orig 2025-09-07T07:55:56.0729550Z * [new branch] gh/laithsakka/178/base -> origin/gh/laithsakka/178/base 2025-09-07T07:55:56.0731167Z * [new branch] gh/laithsakka/178/head -> origin/gh/laithsakka/178/head 2025-09-07T07:55:56.0732709Z * [new branch] 
gh/laithsakka/178/orig -> origin/gh/laithsakka/178/orig 2025-09-07T07:55:56.0735416Z * [new branch] gh/laithsakka/191/base -> origin/gh/laithsakka/191/base 2025-09-07T07:55:56.0736895Z * [new branch] gh/laithsakka/191/head -> origin/gh/laithsakka/191/head 2025-09-07T07:55:56.0738615Z * [new branch] gh/laithsakka/191/orig -> origin/gh/laithsakka/191/orig 2025-09-07T07:55:56.0740698Z * [new branch] gh/laithsakka/237/base -> origin/gh/laithsakka/237/base 2025-09-07T07:55:56.0742256Z * [new branch] gh/laithsakka/237/head -> origin/gh/laithsakka/237/head 2025-09-07T07:55:56.0743850Z * [new branch] gh/laithsakka/237/orig -> origin/gh/laithsakka/237/orig 2025-09-07T07:55:56.0746387Z * [new branch] gh/laithsakka/249/base -> origin/gh/laithsakka/249/base 2025-09-07T07:55:56.0747850Z * [new branch] gh/laithsakka/249/head -> origin/gh/laithsakka/249/head 2025-09-07T07:55:56.0749421Z * [new branch] gh/laithsakka/249/orig -> origin/gh/laithsakka/249/orig 2025-09-07T07:55:56.0751693Z * [new branch] gh/laithsakka/251/base -> origin/gh/laithsakka/251/base 2025-09-07T07:55:56.0753241Z * [new branch] gh/laithsakka/251/head -> origin/gh/laithsakka/251/head 2025-09-07T07:55:56.0755148Z * [new branch] gh/laithsakka/251/orig -> origin/gh/laithsakka/251/orig 2025-09-07T07:55:56.0757522Z * [new branch] gh/laithsakka/254/base -> origin/gh/laithsakka/254/base 2025-09-07T07:55:56.0759038Z * [new branch] gh/laithsakka/254/head -> origin/gh/laithsakka/254/head 2025-09-07T07:55:56.0760698Z * [new branch] gh/laithsakka/254/orig -> origin/gh/laithsakka/254/orig 2025-09-07T07:55:56.0763003Z * [new branch] gh/laithsakka/255/base -> origin/gh/laithsakka/255/base 2025-09-07T07:55:56.0764839Z * [new branch] gh/laithsakka/255/head -> origin/gh/laithsakka/255/head 2025-09-07T07:55:56.0766279Z * [new branch] gh/laithsakka/255/orig -> origin/gh/laithsakka/255/orig 2025-09-07T07:55:56.0768541Z * [new branch] gh/laithsakka/256/base -> origin/gh/laithsakka/256/base 2025-09-07T07:55:56.0770150Z * [new branch] 
gh/laithsakka/256/head -> origin/gh/laithsakka/256/head 2025-09-07T07:55:56.0771583Z * [new branch] gh/laithsakka/256/orig -> origin/gh/laithsakka/256/orig 2025-09-07T07:55:56.0774106Z * [new branch] gh/laithsakka/257/base -> origin/gh/laithsakka/257/base 2025-09-07T07:55:56.0775817Z * [new branch] gh/laithsakka/257/head -> origin/gh/laithsakka/257/head 2025-09-07T07:55:56.0777334Z * [new branch] gh/laithsakka/257/orig -> origin/gh/laithsakka/257/orig 2025-09-07T07:55:56.0779668Z * [new branch] gh/laithsakka/258/base -> origin/gh/laithsakka/258/base 2025-09-07T07:55:56.0781189Z * [new branch] gh/laithsakka/258/head -> origin/gh/laithsakka/258/head 2025-09-07T07:55:56.0782737Z * [new branch] gh/laithsakka/258/orig -> origin/gh/laithsakka/258/orig 2025-09-07T07:55:56.0785383Z * [new branch] gh/laithsakka/259/base -> origin/gh/laithsakka/259/base 2025-09-07T07:55:56.0786955Z * [new branch] gh/laithsakka/259/head -> origin/gh/laithsakka/259/head 2025-09-07T07:55:56.0788474Z * [new branch] gh/laithsakka/259/orig -> origin/gh/laithsakka/259/orig 2025-09-07T07:55:56.0790685Z * [new branch] gh/laithsakka/260/base -> origin/gh/laithsakka/260/base 2025-09-07T07:55:56.0792225Z * [new branch] gh/laithsakka/260/head -> origin/gh/laithsakka/260/head 2025-09-07T07:55:56.0793929Z * [new branch] gh/laithsakka/260/orig -> origin/gh/laithsakka/260/orig 2025-09-07T07:55:56.0796396Z * [new branch] gh/laithsakka/261/base -> origin/gh/laithsakka/261/base 2025-09-07T07:55:56.0797997Z * [new branch] gh/laithsakka/261/head -> origin/gh/laithsakka/261/head 2025-09-07T07:55:56.0799495Z * [new branch] gh/laithsakka/261/orig -> origin/gh/laithsakka/261/orig 2025-09-07T07:55:56.0802051Z * [new branch] gh/laithsakka/262/base -> origin/gh/laithsakka/262/base 2025-09-07T07:55:56.0804303Z * [new branch] gh/laithsakka/262/head -> origin/gh/laithsakka/262/head 2025-09-07T07:55:56.0806213Z * [new branch] gh/laithsakka/262/orig -> origin/gh/laithsakka/262/orig 2025-09-07T07:55:56.0808317Z * [new branch] 
gh/laithsakka/263/base -> origin/gh/laithsakka/263/base 2025-09-07T07:55:56.0809826Z * [new branch] gh/laithsakka/263/head -> origin/gh/laithsakka/263/head 2025-09-07T07:55:56.0811369Z * [new branch] gh/laithsakka/263/orig -> origin/gh/laithsakka/263/orig 2025-09-07T07:55:56.0813535Z * [new branch] gh/laithsakka/264/base -> origin/gh/laithsakka/264/base 2025-09-07T07:55:56.0815438Z * [new branch] gh/laithsakka/264/head -> origin/gh/laithsakka/264/head 2025-09-07T07:55:56.0816922Z * [new branch] gh/laithsakka/264/orig -> origin/gh/laithsakka/264/orig 2025-09-07T07:55:56.0819438Z * [new branch] gh/laithsakka/265/base -> origin/gh/laithsakka/265/base 2025-09-07T07:55:56.0820949Z * [new branch] gh/laithsakka/265/head -> origin/gh/laithsakka/265/head 2025-09-07T07:55:56.0822447Z * [new branch] gh/laithsakka/265/orig -> origin/gh/laithsakka/265/orig 2025-09-07T07:55:56.0825187Z * [new branch] gh/laithsakka/266/base -> origin/gh/laithsakka/266/base 2025-09-07T07:55:56.0826768Z * [new branch] gh/laithsakka/266/head -> origin/gh/laithsakka/266/head 2025-09-07T07:55:56.5399152Z * [new branch] gh/laithsakka/266/orig -> origin/gh/laithsakka/266/orig 2025-09-07T07:55:56.5402957Z * [new branch] gh/laithsakka/267/base -> origin/gh/laithsakka/267/base 2025-09-07T07:55:56.5404907Z * [new branch] gh/laithsakka/267/head -> origin/gh/laithsakka/267/head 2025-09-07T07:55:56.5406570Z * [new branch] gh/laithsakka/267/orig -> origin/gh/laithsakka/267/orig 2025-09-07T07:55:56.5409027Z * [new branch] gh/laithsakka/268/base -> origin/gh/laithsakka/268/base 2025-09-07T07:55:56.5410545Z * [new branch] gh/laithsakka/268/head -> origin/gh/laithsakka/268/head 2025-09-07T07:55:56.5412287Z * [new branch] gh/laithsakka/268/orig -> origin/gh/laithsakka/268/orig 2025-09-07T07:55:56.5415078Z * [new branch] gh/laithsakka/28/base -> origin/gh/laithsakka/28/base 2025-09-07T07:55:56.5417364Z * [new branch] gh/laithsakka/29/base -> origin/gh/laithsakka/29/base 2025-09-07T07:55:56.5419535Z * [new branch] 
gh/laithsakka/30/base -> origin/gh/laithsakka/30/base 2025-09-07T07:55:56.5421100Z * [new branch] gh/laithsakka/30/head -> origin/gh/laithsakka/30/head 2025-09-07T07:55:56.5423291Z * [new branch] gh/laithsakka/31/base -> origin/gh/laithsakka/31/base 2025-09-07T07:55:56.5425173Z * [new branch] gh/laithsakka/31/head -> origin/gh/laithsakka/31/head 2025-09-07T07:55:56.5427265Z * [new branch] gh/laithsakka/32/base -> origin/gh/laithsakka/32/base 2025-09-07T07:55:56.5428724Z * [new branch] gh/laithsakka/32/head -> origin/gh/laithsakka/32/head 2025-09-07T07:55:56.5433071Z * [new branch] gh/lucaskabela/1/base -> origin/gh/lucaskabela/1/base 2025-09-07T07:55:56.5434995Z * [new branch] gh/lucaskabela/1/head -> origin/gh/lucaskabela/1/head 2025-09-07T07:55:56.5437401Z * [new branch] gh/lucaskabela/10/base -> origin/gh/lucaskabela/10/base 2025-09-07T07:55:56.5438959Z * [new branch] gh/lucaskabela/10/head -> origin/gh/lucaskabela/10/head 2025-09-07T07:55:56.5440533Z * [new branch] gh/lucaskabela/10/orig -> origin/gh/lucaskabela/10/orig 2025-09-07T07:55:56.5442637Z * [new branch] gh/lucaskabela/11/base -> origin/gh/lucaskabela/11/base 2025-09-07T07:55:56.5444958Z * [new branch] gh/lucaskabela/11/head -> origin/gh/lucaskabela/11/head 2025-09-07T07:55:56.5446203Z * [new branch] gh/lucaskabela/11/orig -> origin/gh/lucaskabela/11/orig 2025-09-07T07:55:56.5448456Z * [new branch] gh/lucaskabela/12/base -> origin/gh/lucaskabela/12/base 2025-09-07T07:55:56.5450015Z * [new branch] gh/lucaskabela/12/head -> origin/gh/lucaskabela/12/head 2025-09-07T07:55:56.5451606Z * [new branch] gh/lucaskabela/12/orig -> origin/gh/lucaskabela/12/orig 2025-09-07T07:55:56.5453905Z * [new branch] gh/lucaskabela/13/base -> origin/gh/lucaskabela/13/base 2025-09-07T07:55:56.5455758Z * [new branch] gh/lucaskabela/13/head -> origin/gh/lucaskabela/13/head 2025-09-07T07:55:56.5457216Z * [new branch] gh/lucaskabela/13/orig -> origin/gh/lucaskabela/13/orig 2025-09-07T07:55:56.5459345Z * [new branch] 
gh/lucaskabela/14/base -> origin/gh/lucaskabela/14/base 2025-09-07T07:55:56.5460922Z * [new branch] gh/lucaskabela/14/head -> origin/gh/lucaskabela/14/head 2025-09-07T07:55:56.5462533Z * [new branch] gh/lucaskabela/14/orig -> origin/gh/lucaskabela/14/orig 2025-09-07T07:55:56.5465045Z * [new branch] gh/lucaskabela/15/base -> origin/gh/lucaskabela/15/base 2025-09-07T07:55:56.5466611Z * [new branch] gh/lucaskabela/15/head -> origin/gh/lucaskabela/15/head 2025-09-07T07:55:56.5468097Z * [new branch] gh/lucaskabela/15/orig -> origin/gh/lucaskabela/15/orig 2025-09-07T07:55:56.5470193Z * [new branch] gh/lucaskabela/16/base -> origin/gh/lucaskabela/16/base 2025-09-07T07:55:56.5471831Z * [new branch] gh/lucaskabela/16/head -> origin/gh/lucaskabela/16/head 2025-09-07T07:55:56.5473375Z * [new branch] gh/lucaskabela/16/orig -> origin/gh/lucaskabela/16/orig 2025-09-07T07:55:56.5475997Z * [new branch] gh/lucaskabela/17/base -> origin/gh/lucaskabela/17/base 2025-09-07T07:55:56.5477491Z * [new branch] gh/lucaskabela/17/head -> origin/gh/lucaskabela/17/head 2025-09-07T07:55:56.5479069Z * [new branch] gh/lucaskabela/17/orig -> origin/gh/lucaskabela/17/orig 2025-09-07T07:55:56.5481389Z * [new branch] gh/lucaskabela/2/base -> origin/gh/lucaskabela/2/base 2025-09-07T07:55:56.5482929Z * [new branch] gh/lucaskabela/2/head -> origin/gh/lucaskabela/2/head 2025-09-07T07:55:56.5484826Z * [new branch] gh/lucaskabela/2/orig -> origin/gh/lucaskabela/2/orig 2025-09-07T07:55:56.5487134Z * [new branch] gh/lucaskabela/3/base -> origin/gh/lucaskabela/3/base 2025-09-07T07:55:56.5488640Z * [new branch] gh/lucaskabela/3/head -> origin/gh/lucaskabela/3/head 2025-09-07T07:55:56.5490226Z * [new branch] gh/lucaskabela/3/orig -> origin/gh/lucaskabela/3/orig 2025-09-07T07:55:56.5492439Z * [new branch] gh/lucaskabela/4/base -> origin/gh/lucaskabela/4/base 2025-09-07T07:55:56.5494203Z * [new branch] gh/lucaskabela/4/head -> origin/gh/lucaskabela/4/head 2025-09-07T07:55:56.5495819Z * [new branch] 
gh/lucaskabela/4/orig -> origin/gh/lucaskabela/4/orig 2025-09-07T07:55:56.5498166Z * [new branch] gh/lucaskabela/5/base -> origin/gh/lucaskabela/5/base 2025-09-07T07:55:56.5499644Z * [new branch] gh/lucaskabela/5/head -> origin/gh/lucaskabela/5/head 2025-09-07T07:55:56.5501194Z * [new branch] gh/lucaskabela/5/orig -> origin/gh/lucaskabela/5/orig 2025-09-07T07:55:56.5503379Z * [new branch] gh/lucaskabela/6/base -> origin/gh/lucaskabela/6/base 2025-09-07T07:55:56.5505304Z * [new branch] gh/lucaskabela/6/head -> origin/gh/lucaskabela/6/head 2025-09-07T07:55:56.5506779Z * [new branch] gh/lucaskabela/6/orig -> origin/gh/lucaskabela/6/orig 2025-09-07T07:55:56.5509276Z * [new branch] gh/lucaskabela/7/base -> origin/gh/lucaskabela/7/base 2025-09-07T07:55:56.5510647Z * [new branch] gh/lucaskabela/7/head -> origin/gh/lucaskabela/7/head 2025-09-07T07:55:56.5512378Z * [new branch] gh/lucaskabela/7/orig -> origin/gh/lucaskabela/7/orig 2025-09-07T07:55:56.5514763Z * [new branch] gh/lucaskabela/8/base -> origin/gh/lucaskabela/8/base 2025-09-07T07:55:56.5516228Z * [new branch] gh/lucaskabela/8/head -> origin/gh/lucaskabela/8/head 2025-09-07T07:55:56.5518158Z * [new branch] gh/lucaskabela/8/orig -> origin/gh/lucaskabela/8/orig 2025-09-07T07:55:56.5520246Z * [new branch] gh/lucaskabela/9/base -> origin/gh/lucaskabela/9/base 2025-09-07T07:55:56.5521756Z * [new branch] gh/lucaskabela/9/head -> origin/gh/lucaskabela/9/head 2025-09-07T07:55:56.5523260Z * [new branch] gh/lucaskabela/9/orig -> origin/gh/lucaskabela/9/orig 2025-09-07T07:55:56.5526326Z * [new branch] gh/lw/3/base -> origin/gh/lw/3/base 2025-09-07T07:55:56.5527937Z * [new branch] gh/lw/3/head -> origin/gh/lw/3/head 2025-09-07T07:55:56.5529387Z * [new branch] gh/lw/3/orig -> origin/gh/lw/3/orig 2025-09-07T07:55:56.5532154Z * [new branch] gh/malfet/14/base -> origin/gh/malfet/14/base 2025-09-07T07:55:56.5534787Z * [new branch] gh/malfet/330/base -> origin/gh/malfet/330/base 2025-09-07T07:55:56.5536326Z * [new branch] 
gh/malfet/330/head -> origin/gh/malfet/330/head 2025-09-07T07:55:56.5538120Z * [new branch] gh/malfet/330/orig -> origin/gh/malfet/330/orig 2025-09-07T07:55:56.5545880Z * [new branch] gh/malfet/396/base -> origin/gh/malfet/396/base 2025-09-07T07:55:56.5546413Z * [new branch] gh/malfet/396/head -> origin/gh/malfet/396/head 2025-09-07T07:55:56.5546834Z * [new branch] gh/malfet/396/orig -> origin/gh/malfet/396/orig 2025-09-07T07:55:56.5547222Z * [new branch] gh/malfet/397/base -> origin/gh/malfet/397/base 2025-09-07T07:55:56.5547604Z * [new branch] gh/malfet/397/head -> origin/gh/malfet/397/head 2025-09-07T07:55:56.5549239Z * [new branch] gh/malfet/397/orig -> origin/gh/malfet/397/orig 2025-09-07T07:55:56.5551399Z * [new branch] gh/malfet/398/base -> origin/gh/malfet/398/base 2025-09-07T07:55:56.5552878Z * [new branch] gh/malfet/398/head -> origin/gh/malfet/398/head 2025-09-07T07:55:56.5554735Z * [new branch] gh/malfet/398/orig -> origin/gh/malfet/398/orig 2025-09-07T07:55:56.5556906Z * [new branch] gh/malfet/399/base -> origin/gh/malfet/399/base 2025-09-07T07:55:56.5558608Z * [new branch] gh/malfet/399/head -> origin/gh/malfet/399/head 2025-09-07T07:55:56.5560106Z * [new branch] gh/malfet/399/orig -> origin/gh/malfet/399/orig 2025-09-07T07:55:56.5562399Z * [new branch] gh/malfet/414/base -> origin/gh/malfet/414/base 2025-09-07T07:55:56.5564062Z * [new branch] gh/malfet/414/head -> origin/gh/malfet/414/head 2025-09-07T07:55:56.5565884Z * [new branch] gh/malfet/414/orig -> origin/gh/malfet/414/orig 2025-09-07T07:55:56.5568051Z * [new branch] gh/malfet/417/base -> origin/gh/malfet/417/base 2025-09-07T07:55:56.5569585Z * [new branch] gh/malfet/417/head -> origin/gh/malfet/417/head 2025-09-07T07:55:56.5571127Z * [new branch] gh/malfet/417/orig -> origin/gh/malfet/417/orig 2025-09-07T07:55:56.5573310Z * [new branch] gh/malfet/418/base -> origin/gh/malfet/418/base 2025-09-07T07:55:56.5575490Z * [new branch] gh/malfet/418/head -> origin/gh/malfet/418/head 
2025-09-07T07:55:56.5576827Z * [new branch] gh/malfet/418/orig -> origin/gh/malfet/418/orig 2025-09-07T07:55:56.5579019Z * [new branch] gh/malfet/475/base -> origin/gh/malfet/475/base 2025-09-07T07:55:56.5580737Z * [new branch] gh/malfet/475/head -> origin/gh/malfet/475/head 2025-09-07T07:55:56.5582322Z * [new branch] gh/malfet/475/orig -> origin/gh/malfet/475/orig 2025-09-07T07:55:56.5584884Z * [new branch] gh/malfet/476/base -> origin/gh/malfet/476/base 2025-09-07T07:55:56.5586419Z * [new branch] gh/malfet/476/head -> origin/gh/malfet/476/head 2025-09-07T07:55:56.5588012Z * [new branch] gh/malfet/476/orig -> origin/gh/malfet/476/orig 2025-09-07T07:55:56.5590094Z * [new branch] gh/malfet/477/base -> origin/gh/malfet/477/base 2025-09-07T07:55:56.5591711Z * [new branch] gh/malfet/477/head -> origin/gh/malfet/477/head 2025-09-07T07:55:56.5593242Z * [new branch] gh/malfet/477/orig -> origin/gh/malfet/477/orig 2025-09-07T07:55:56.5595796Z * [new branch] gh/malfet/478/base -> origin/gh/malfet/478/base 2025-09-07T07:55:56.5597338Z * [new branch] gh/malfet/478/head -> origin/gh/malfet/478/head 2025-09-07T07:55:56.5598947Z * [new branch] gh/malfet/478/orig -> origin/gh/malfet/478/orig 2025-09-07T07:55:56.5601100Z * [new branch] gh/malfet/479/base -> origin/gh/malfet/479/base 2025-09-07T07:55:56.5602670Z * [new branch] gh/malfet/479/head -> origin/gh/malfet/479/head 2025-09-07T07:55:56.5604526Z * [new branch] gh/malfet/479/orig -> origin/gh/malfet/479/orig 2025-09-07T07:55:56.5606822Z * [new branch] gh/malfet/480/base -> origin/gh/malfet/480/base 2025-09-07T07:55:56.5608443Z * [new branch] gh/malfet/480/head -> origin/gh/malfet/480/head 2025-09-07T07:55:56.5611274Z * [new branch] gh/malfet/480/orig -> origin/gh/malfet/480/orig 2025-09-07T07:55:56.5612995Z * [new branch] gh/malfet/481/base -> origin/gh/malfet/481/base 2025-09-07T07:55:56.5613613Z * [new branch] gh/malfet/481/head -> origin/gh/malfet/481/head 2025-09-07T07:55:56.5615592Z * [new branch] gh/malfet/481/orig -> 
origin/gh/malfet/481/orig 2025-09-07T07:55:56.5617749Z * [new branch] gh/malfet/482/base -> origin/gh/malfet/482/base 2025-09-07T07:55:56.5619348Z * [new branch] gh/malfet/482/head -> origin/gh/malfet/482/head 2025-09-07T07:55:56.5620958Z * [new branch] gh/malfet/482/orig -> origin/gh/malfet/482/orig 2025-09-07T07:55:56.5623215Z * [new branch] gh/malfet/483/base -> origin/gh/malfet/483/base 2025-09-07T07:55:56.5625073Z * [new branch] gh/malfet/483/head -> origin/gh/malfet/483/head 2025-09-07T07:55:56.5626576Z * [new branch] gh/malfet/483/orig -> origin/gh/malfet/483/orig 2025-09-07T07:55:56.5628844Z * [new branch] gh/malfet/484/base -> origin/gh/malfet/484/base 2025-09-07T07:55:56.5630598Z * [new branch] gh/malfet/484/head -> origin/gh/malfet/484/head 2025-09-07T07:55:56.5632182Z * [new branch] gh/malfet/484/orig -> origin/gh/malfet/484/orig 2025-09-07T07:55:56.5634718Z * [new branch] gh/malfet/485/base -> origin/gh/malfet/485/base 2025-09-07T07:55:56.5636256Z * [new branch] gh/malfet/485/head -> origin/gh/malfet/485/head 2025-09-07T07:55:56.5637990Z * [new branch] gh/malfet/485/orig -> origin/gh/malfet/485/orig 2025-09-07T07:55:56.5640273Z * [new branch] gh/malfet/486/base -> origin/gh/malfet/486/base 2025-09-07T07:55:56.5642078Z * [new branch] gh/malfet/486/head -> origin/gh/malfet/486/head 2025-09-07T07:55:56.5643421Z * [new branch] gh/malfet/486/orig -> origin/gh/malfet/486/orig 2025-09-07T07:55:56.5645982Z * [new branch] gh/malfet/487/base -> origin/gh/malfet/487/base 2025-09-07T07:55:56.5647455Z * [new branch] gh/malfet/487/head -> origin/gh/malfet/487/head 2025-09-07T07:55:56.5648942Z * [new branch] gh/malfet/487/orig -> origin/gh/malfet/487/orig 2025-09-07T07:55:56.5651209Z * [new branch] gh/malfet/488/base -> origin/gh/malfet/488/base 2025-09-07T07:55:56.5652727Z * [new branch] gh/malfet/488/head -> origin/gh/malfet/488/head 2025-09-07T07:55:56.5654657Z * [new branch] gh/malfet/488/orig -> origin/gh/malfet/488/orig 2025-09-07T07:55:56.5657031Z * [new 
branch] gh/malfet/489/base -> origin/gh/malfet/489/base 2025-09-07T07:55:56.5658549Z * [new branch] gh/malfet/489/head -> origin/gh/malfet/489/head 2025-09-07T07:55:56.5660190Z * [new branch] gh/malfet/489/orig -> origin/gh/malfet/489/orig 2025-09-07T07:55:56.5662492Z * [new branch] gh/malfet/490/base -> origin/gh/malfet/490/base 2025-09-07T07:55:56.5664363Z * [new branch] gh/malfet/490/head -> origin/gh/malfet/490/head 2025-09-07T07:55:56.5666076Z * [new branch] gh/malfet/490/orig -> origin/gh/malfet/490/orig 2025-09-07T07:55:56.5668262Z * [new branch] gh/malfet/491/base -> origin/gh/malfet/491/base 2025-09-07T07:55:56.5669858Z * [new branch] gh/malfet/491/head -> origin/gh/malfet/491/head 2025-09-07T07:55:56.5671496Z * [new branch] gh/malfet/491/orig -> origin/gh/malfet/491/orig 2025-09-07T07:55:56.5673850Z * [new branch] gh/malfet/492/base -> origin/gh/malfet/492/base 2025-09-07T07:55:56.5675814Z * [new branch] gh/malfet/492/head -> origin/gh/malfet/492/head 2025-09-07T07:55:56.5677285Z * [new branch] gh/malfet/492/orig -> origin/gh/malfet/492/orig 2025-09-07T07:55:56.5679581Z * [new branch] gh/malfet/493/base -> origin/gh/malfet/493/base 2025-09-07T07:55:56.5681117Z * [new branch] gh/malfet/493/head -> origin/gh/malfet/493/head 2025-09-07T07:55:56.5682644Z * [new branch] gh/malfet/493/orig -> origin/gh/malfet/493/orig 2025-09-07T07:55:56.5685229Z * [new branch] gh/malfet/494/base -> origin/gh/malfet/494/base 2025-09-07T07:55:56.5686727Z * [new branch] gh/malfet/494/head -> origin/gh/malfet/494/head 2025-09-07T07:55:56.5688391Z * [new branch] gh/malfet/494/orig -> origin/gh/malfet/494/orig 2025-09-07T07:55:56.5690472Z * [new branch] gh/malfet/495/base -> origin/gh/malfet/495/base 2025-09-07T07:55:56.5692064Z * [new branch] gh/malfet/495/head -> origin/gh/malfet/495/head 2025-09-07T07:55:56.5693576Z * [new branch] gh/malfet/495/orig -> origin/gh/malfet/495/orig 2025-09-07T07:55:56.5696334Z * [new branch] gh/malfet/496/base -> origin/gh/malfet/496/base 
2025-09-07T07:55:56.5697761Z * [new branch] gh/malfet/496/head -> origin/gh/malfet/496/head 2025-09-07T07:55:56.5699251Z * [new branch] gh/malfet/496/orig -> origin/gh/malfet/496/orig 2025-09-07T07:55:56.5701483Z * [new branch] gh/malfet/497/base -> origin/gh/malfet/497/base 2025-09-07T07:55:56.5703092Z * [new branch] gh/malfet/497/head -> origin/gh/malfet/497/head 2025-09-07T07:55:56.5705079Z * [new branch] gh/malfet/497/orig -> origin/gh/malfet/497/orig 2025-09-07T07:55:56.5707468Z * [new branch] gh/malfet/498/base -> origin/gh/malfet/498/base 2025-09-07T07:55:56.5708843Z * [new branch] gh/malfet/498/head -> origin/gh/malfet/498/head 2025-09-07T07:55:56.5710352Z * [new branch] gh/malfet/498/orig -> origin/gh/malfet/498/orig 2025-09-07T07:55:56.5712554Z * [new branch] gh/malfet/499/base -> origin/gh/malfet/499/base 2025-09-07T07:55:56.5714332Z * [new branch] gh/malfet/499/head -> origin/gh/malfet/499/head 2025-09-07T07:55:56.5715931Z * [new branch] gh/malfet/499/orig -> origin/gh/malfet/499/orig 2025-09-07T07:55:56.5718347Z * [new branch] gh/malfet/500/base -> origin/gh/malfet/500/base 2025-09-07T07:55:56.5719821Z * [new branch] gh/malfet/500/head -> origin/gh/malfet/500/head 2025-09-07T07:55:56.5721363Z * [new branch] gh/malfet/500/orig -> origin/gh/malfet/500/orig 2025-09-07T07:55:56.5724047Z * [new branch] gh/malfet/501/base -> origin/gh/malfet/501/base 2025-09-07T07:55:56.5725787Z * [new branch] gh/malfet/501/head -> origin/gh/malfet/501/head 2025-09-07T07:55:56.5727261Z * [new branch] gh/malfet/501/orig -> origin/gh/malfet/501/orig 2025-09-07T07:55:56.5729528Z * [new branch] gh/malfet/502/base -> origin/gh/malfet/502/base 2025-09-07T07:55:56.5731129Z * [new branch] gh/malfet/502/head -> origin/gh/malfet/502/head 2025-09-07T07:55:56.5732776Z * [new branch] gh/malfet/502/orig -> origin/gh/malfet/502/orig 2025-09-07T07:55:56.5735565Z * [new branch] gh/malfet/503/base -> origin/gh/malfet/503/base 2025-09-07T07:55:56.5737043Z * [new branch] gh/malfet/503/head -> 
origin/gh/malfet/503/head 2025-09-07T07:55:56.5738612Z * [new branch] gh/malfet/503/orig -> origin/gh/malfet/503/orig 2025-09-07T07:55:56.5740853Z * [new branch] gh/malfet/504/base -> origin/gh/malfet/504/base 2025-09-07T07:55:56.5742354Z * [new branch] gh/malfet/504/head -> origin/gh/malfet/504/head 2025-09-07T07:55:56.5744018Z * [new branch] gh/malfet/504/orig -> origin/gh/malfet/504/orig 2025-09-07T07:55:56.5746607Z * [new branch] gh/malfet/505/base -> origin/gh/malfet/505/base 2025-09-07T07:55:56.5748093Z * [new branch] gh/malfet/505/head -> origin/gh/malfet/505/head 2025-09-07T07:55:56.5749588Z * [new branch] gh/malfet/505/orig -> origin/gh/malfet/505/orig 2025-09-07T07:55:56.5751975Z * [new branch] gh/malfet/506/base -> origin/gh/malfet/506/base 2025-09-07T07:55:56.5753514Z * [new branch] gh/malfet/506/head -> origin/gh/malfet/506/head 2025-09-07T07:55:56.5755374Z * [new branch] gh/malfet/506/orig -> origin/gh/malfet/506/orig 2025-09-07T07:55:56.5757707Z * [new branch] gh/malfet/507/base -> origin/gh/malfet/507/base 2025-09-07T07:55:56.5759271Z * [new branch] gh/malfet/507/head -> origin/gh/malfet/507/head 2025-09-07T07:55:56.5760864Z * [new branch] gh/malfet/507/orig -> origin/gh/malfet/507/orig 2025-09-07T07:55:56.5763283Z * [new branch] gh/malfet/508/base -> origin/gh/malfet/508/base 2025-09-07T07:55:56.5765167Z * [new branch] gh/malfet/508/head -> origin/gh/malfet/508/head 2025-09-07T07:55:56.5766688Z * [new branch] gh/malfet/508/orig -> origin/gh/malfet/508/orig 2025-09-07T07:55:56.5768899Z * [new branch] gh/malfet/509/base -> origin/gh/malfet/509/base 2025-09-07T07:55:56.5770468Z * [new branch] gh/malfet/509/head -> origin/gh/malfet/509/head 2025-09-07T07:55:56.5772106Z * [new branch] gh/malfet/509/orig -> origin/gh/malfet/509/orig 2025-09-07T07:55:56.5774919Z * [new branch] gh/malfet/510/base -> origin/gh/malfet/510/base 2025-09-07T07:55:56.5776285Z * [new branch] gh/malfet/510/head -> origin/gh/malfet/510/head 2025-09-07T07:55:56.5777784Z * [new 
branch] gh/malfet/510/orig -> origin/gh/malfet/510/orig 2025-09-07T07:55:56.5780058Z * [new branch] gh/malfet/511/base -> origin/gh/malfet/511/base 2025-09-07T07:55:56.5781565Z * [new branch] gh/malfet/511/head -> origin/gh/malfet/511/head 2025-09-07T07:55:56.5783115Z * [new branch] gh/malfet/511/orig -> origin/gh/malfet/511/orig 2025-09-07T07:55:56.5785709Z * [new branch] gh/malfet/512/base -> origin/gh/malfet/512/base 2025-09-07T07:55:56.5787252Z * [new branch] gh/malfet/512/head -> origin/gh/malfet/512/head 2025-09-07T07:55:56.5788787Z * [new branch] gh/malfet/512/orig -> origin/gh/malfet/512/orig 2025-09-07T07:55:56.5791102Z * [new branch] gh/malfet/513/base -> origin/gh/malfet/513/base 2025-09-07T07:55:56.5792628Z * [new branch] gh/malfet/513/head -> origin/gh/malfet/513/head 2025-09-07T07:55:56.5794443Z * [new branch] gh/malfet/513/orig -> origin/gh/malfet/513/orig 2025-09-07T07:55:56.5796726Z * [new branch] gh/malfet/64/base -> origin/gh/malfet/64/base 2025-09-07T07:55:56.5798496Z * [new branch] gh/malfet/64/head -> origin/gh/malfet/64/head 2025-09-07T07:55:56.5801274Z * [new branch] gh/manuelcandales/10/base -> origin/gh/manuelcandales/10/base 2025-09-07T07:55:56.5802769Z * [new branch] gh/manuelcandales/10/head -> origin/gh/manuelcandales/10/head 2025-09-07T07:55:56.5804546Z * [new branch] gh/manuelcandales/10/orig -> origin/gh/manuelcandales/10/orig 2025-09-07T07:55:56.5806906Z * [new branch] gh/manuelcandales/11/base -> origin/gh/manuelcandales/11/base 2025-09-07T07:55:56.5808435Z * [new branch] gh/manuelcandales/11/head -> origin/gh/manuelcandales/11/head 2025-09-07T07:55:56.5810153Z * [new branch] gh/manuelcandales/11/orig -> origin/gh/manuelcandales/11/orig 2025-09-07T07:55:56.5812219Z * [new branch] gh/manuelcandales/9/base -> origin/gh/manuelcandales/9/base 2025-09-07T07:55:56.5813911Z * [new branch] gh/manuelcandales/9/head -> origin/gh/manuelcandales/9/head 2025-09-07T07:55:56.5815640Z * [new branch] gh/manuelcandales/9/orig -> 
origin/gh/manuelcandales/9/orig 2025-09-07T07:55:56.5818649Z * [new branch] gh/markkm/1/base -> origin/gh/markkm/1/base 2025-09-07T07:55:56.5821673Z * [new branch] gh/masnesral/204/base -> origin/gh/masnesral/204/base 2025-09-07T07:55:56.5823487Z * [new branch] gh/masnesral/204/head -> origin/gh/masnesral/204/head 2025-09-07T07:55:56.5825813Z * [new branch] gh/masnesral/204/orig -> origin/gh/masnesral/204/orig 2025-09-07T07:55:56.5827725Z * [new branch] gh/masnesral/235/base -> origin/gh/masnesral/235/base 2025-09-07T07:55:56.5829311Z * [new branch] gh/masnesral/235/head -> origin/gh/masnesral/235/head 2025-09-07T07:55:56.5830940Z * [new branch] gh/masnesral/235/orig -> origin/gh/masnesral/235/orig 2025-09-07T07:55:56.5833214Z * [new branch] gh/masnesral/34/base -> origin/gh/masnesral/34/base 2025-09-07T07:55:56.5836399Z * [new branch] gh/mhorowitz/0/base -> origin/gh/mhorowitz/0/base 2025-09-07T07:55:56.5837982Z * [new branch] gh/mhorowitz/0/head -> origin/gh/mhorowitz/0/head 2025-09-07T07:55:56.5840078Z * [new branch] gh/mhorowitz/1/base -> origin/gh/mhorowitz/1/base 2025-09-07T07:55:56.5841704Z * [new branch] gh/mhorowitz/1/head -> origin/gh/mhorowitz/1/head 2025-09-07T07:55:56.5844086Z * [new branch] gh/mhorowitz/2/base -> origin/gh/mhorowitz/2/base 2025-09-07T07:55:56.5845690Z * [new branch] gh/mhorowitz/2/head -> origin/gh/mhorowitz/2/head 2025-09-07T07:55:56.5847832Z * [new branch] gh/mhorowitz/3/base -> origin/gh/mhorowitz/3/base 2025-09-07T07:55:56.5849365Z * [new branch] gh/mhorowitz/3/head -> origin/gh/mhorowitz/3/head 2025-09-07T07:55:56.5851553Z * [new branch] gh/mhorowitz/4/base -> origin/gh/mhorowitz/4/base 2025-09-07T07:55:56.5853085Z * [new branch] gh/mhorowitz/4/head -> origin/gh/mhorowitz/4/head 2025-09-07T07:55:56.5855535Z * [new branch] gh/mhorowitz/5/base -> origin/gh/mhorowitz/5/base 2025-09-07T07:55:56.5856948Z * [new branch] gh/mhorowitz/5/head -> origin/gh/mhorowitz/5/head 2025-09-07T07:55:56.5859167Z * [new branch] gh/mhorowitz/6/base -> 
origin/gh/mhorowitz/6/base 2025-09-07T07:55:56.5860622Z * [new branch] gh/mhorowitz/6/head -> origin/gh/mhorowitz/6/head 2025-09-07T07:55:56.5863614Z * [new branch] gh/mikaylagawarecki/234/base -> origin/gh/mikaylagawarecki/234/base 2025-09-07T07:55:56.5865567Z * [new branch] gh/mikaylagawarecki/234/head -> origin/gh/mikaylagawarecki/234/head 2025-09-07T07:55:56.5867894Z * [new branch] gh/mikaylagawarecki/235/base -> origin/gh/mikaylagawarecki/235/base 2025-09-07T07:55:56.5869351Z * [new branch] gh/mikaylagawarecki/235/head -> origin/gh/mikaylagawarecki/235/head 2025-09-07T07:55:56.5871536Z * [new branch] gh/mikaylagawarecki/236/base -> origin/gh/mikaylagawarecki/236/base 2025-09-07T07:55:56.5873026Z * [new branch] gh/mikaylagawarecki/236/head -> origin/gh/mikaylagawarecki/236/head 2025-09-07T07:55:56.5875701Z * [new branch] gh/mikaylagawarecki/237/base -> origin/gh/mikaylagawarecki/237/base 2025-09-07T07:55:56.5877224Z * [new branch] gh/mikaylagawarecki/237/head -> origin/gh/mikaylagawarecki/237/head 2025-09-07T07:55:56.5879589Z * [new branch] gh/mikaylagawarecki/238/base -> origin/gh/mikaylagawarecki/238/base 2025-09-07T07:55:56.5881193Z * [new branch] gh/mikaylagawarecki/238/head -> origin/gh/mikaylagawarecki/238/head 2025-09-07T07:55:56.5883455Z * [new branch] gh/mikaylagawarecki/317/base -> origin/gh/mikaylagawarecki/317/base 2025-09-07T07:55:56.5885392Z * [new branch] gh/mikaylagawarecki/317/head -> origin/gh/mikaylagawarecki/317/head 2025-09-07T07:55:56.5886883Z * [new branch] gh/mikaylagawarecki/317/orig -> origin/gh/mikaylagawarecki/317/orig 2025-09-07T07:55:56.5889249Z * [new branch] gh/mikaylagawarecki/320/base -> origin/gh/mikaylagawarecki/320/base 2025-09-07T07:55:56.5890879Z * [new branch] gh/mikaylagawarecki/320/head -> origin/gh/mikaylagawarecki/320/head 2025-09-07T07:55:56.5892441Z * [new branch] gh/mikaylagawarecki/320/orig -> origin/gh/mikaylagawarecki/320/orig 2025-09-07T07:55:56.5895117Z * [new branch] gh/mikaylagawarecki/329/base -> 
origin/gh/mikaylagawarecki/329/base 2025-09-07T07:55:56.5896635Z * [new branch] gh/mikaylagawarecki/329/head -> origin/gh/mikaylagawarecki/329/head 2025-09-07T07:55:56.5898143Z * [new branch] gh/mikaylagawarecki/329/orig -> origin/gh/mikaylagawarecki/329/orig 2025-09-07T07:55:56.5900373Z * [new branch] gh/mikaylagawarecki/330/base -> origin/gh/mikaylagawarecki/330/base 2025-09-07T07:55:56.5901933Z * [new branch] gh/mikaylagawarecki/330/head -> origin/gh/mikaylagawarecki/330/head 2025-09-07T07:55:56.5903460Z * [new branch] gh/mikaylagawarecki/330/orig -> origin/gh/mikaylagawarecki/330/orig 2025-09-07T07:55:56.5906153Z * [new branch] gh/mikaylagawarecki/331/base -> origin/gh/mikaylagawarecki/331/base 2025-09-07T07:55:56.5907928Z * [new branch] gh/mikaylagawarecki/331/head -> origin/gh/mikaylagawarecki/331/head 2025-09-07T07:55:56.5909115Z * [new branch] gh/mikaylagawarecki/331/orig -> origin/gh/mikaylagawarecki/331/orig 2025-09-07T07:55:56.5911702Z * [new branch] gh/mikaylagawarecki/332/base -> origin/gh/mikaylagawarecki/332/base 2025-09-07T07:55:56.5913371Z * [new branch] gh/mikaylagawarecki/332/head -> origin/gh/mikaylagawarecki/332/head 2025-09-07T07:55:56.5915231Z * [new branch] gh/mikaylagawarecki/332/orig -> origin/gh/mikaylagawarecki/332/orig 2025-09-07T07:55:56.5917733Z * [new branch] gh/mikaylagawarecki/334/base -> origin/gh/mikaylagawarecki/334/base 2025-09-07T07:55:56.5919270Z * [new branch] gh/mikaylagawarecki/334/head -> origin/gh/mikaylagawarecki/334/head 2025-09-07T07:55:56.5920811Z * [new branch] gh/mikaylagawarecki/334/orig -> origin/gh/mikaylagawarecki/334/orig 2025-09-07T07:55:56.5923083Z * [new branch] gh/mikaylagawarecki/335/base -> origin/gh/mikaylagawarecki/335/base 2025-09-07T07:55:56.5924980Z * [new branch] gh/mikaylagawarecki/335/head -> origin/gh/mikaylagawarecki/335/head 2025-09-07T07:55:56.5926577Z * [new branch] gh/mikaylagawarecki/335/orig -> origin/gh/mikaylagawarecki/335/orig 2025-09-07T07:55:56.5928836Z * [new branch] 
gh/mikaylagawarecki/336/base -> origin/gh/mikaylagawarecki/336/base 2025-09-07T07:55:56.5930419Z * [new branch] gh/mikaylagawarecki/336/head -> origin/gh/mikaylagawarecki/336/head 2025-09-07T07:55:56.5932052Z * [new branch] gh/mikaylagawarecki/336/orig -> origin/gh/mikaylagawarecki/336/orig 2025-09-07T07:55:56.5934453Z * [new branch] gh/mikaylagawarecki/337/base -> origin/gh/mikaylagawarecki/337/base 2025-09-07T07:55:56.5936054Z * [new branch] gh/mikaylagawarecki/337/head -> origin/gh/mikaylagawarecki/337/head 2025-09-07T07:55:56.5937559Z * [new branch] gh/mikaylagawarecki/337/orig -> origin/gh/mikaylagawarecki/337/orig 2025-09-07T07:55:56.5939812Z * [new branch] gh/mikaylagawarecki/338/base -> origin/gh/mikaylagawarecki/338/base 2025-09-07T07:55:56.5941476Z * [new branch] gh/mikaylagawarecki/338/head -> origin/gh/mikaylagawarecki/338/head 2025-09-07T07:55:56.5942996Z * [new branch] gh/mikaylagawarecki/338/orig -> origin/gh/mikaylagawarecki/338/orig 2025-09-07T07:55:56.5945505Z * [new branch] gh/mikaylagawarecki/339/base -> origin/gh/mikaylagawarecki/339/base 2025-09-07T07:55:56.5947047Z * [new branch] gh/mikaylagawarecki/339/head -> origin/gh/mikaylagawarecki/339/head 2025-09-07T07:55:56.5948546Z * [new branch] gh/mikaylagawarecki/339/orig -> origin/gh/mikaylagawarecki/339/orig 2025-09-07T07:55:56.5951483Z * [new branch] gh/mlazos/1/base -> origin/gh/mlazos/1/base 2025-09-07T07:55:56.5953124Z * [new branch] gh/mlazos/1/head -> origin/gh/mlazos/1/head 2025-09-07T07:55:56.5955375Z * [new branch] gh/mlazos/1/orig -> origin/gh/mlazos/1/orig 2025-09-07T07:55:56.5957735Z * [new branch] gh/mlazos/12/base -> origin/gh/mlazos/12/base 2025-09-07T07:55:56.5959232Z * [new branch] gh/mlazos/12/head -> origin/gh/mlazos/12/head 2025-09-07T07:55:56.5960813Z * [new branch] gh/mlazos/12/orig -> origin/gh/mlazos/12/orig 2025-09-07T07:55:56.5963068Z * [new branch] gh/mlazos/13/base -> origin/gh/mlazos/13/base 2025-09-07T07:55:56.5965042Z * [new branch] gh/mlazos/13/head -> 
origin/gh/mlazos/13/head 2025-09-07T07:55:56.5966519Z * [new branch] gh/mlazos/13/orig -> origin/gh/mlazos/13/orig 2025-09-07T07:55:56.5968852Z * [new branch] gh/mlazos/14/base -> origin/gh/mlazos/14/base 2025-09-07T07:55:56.5970385Z * [new branch] gh/mlazos/14/head -> origin/gh/mlazos/14/head 2025-09-07T07:55:56.5972208Z * [new branch] gh/mlazos/14/orig -> origin/gh/mlazos/14/orig 2025-09-07T07:55:56.5974762Z * [new branch] gh/mlazos/15/base -> origin/gh/mlazos/15/base 2025-09-07T07:55:56.5976285Z * [new branch] gh/mlazos/15/head -> origin/gh/mlazos/15/head 2025-09-07T07:55:56.5977830Z * [new branch] gh/mlazos/15/orig -> origin/gh/mlazos/15/orig 2025-09-07T07:55:56.5980102Z * [new branch] gh/mlazos/16/base -> origin/gh/mlazos/16/base 2025-09-07T07:55:56.5981774Z * [new branch] gh/mlazos/16/head -> origin/gh/mlazos/16/head 2025-09-07T07:55:56.5983336Z * [new branch] gh/mlazos/16/orig -> origin/gh/mlazos/16/orig 2025-09-07T07:55:56.5985918Z * [new branch] gh/mlazos/17/base -> origin/gh/mlazos/17/base 2025-09-07T07:55:56.5987288Z * [new branch] gh/mlazos/17/head -> origin/gh/mlazos/17/head 2025-09-07T07:55:56.5988865Z * [new branch] gh/mlazos/17/orig -> origin/gh/mlazos/17/orig 2025-09-07T07:55:56.5991172Z * [new branch] gh/mlazos/2/base -> origin/gh/mlazos/2/base 2025-09-07T07:55:56.5992694Z * [new branch] gh/mlazos/2/head -> origin/gh/mlazos/2/head 2025-09-07T07:55:56.5994501Z * [new branch] gh/mlazos/2/orig -> origin/gh/mlazos/2/orig 2025-09-07T07:55:56.5996825Z * [new branch] gh/mlazos/3/base -> origin/gh/mlazos/3/base 2025-09-07T07:55:56.5998460Z * [new branch] gh/mlazos/3/head -> origin/gh/mlazos/3/head 2025-09-07T07:55:56.6000029Z * [new branch] gh/mlazos/3/orig -> origin/gh/mlazos/3/orig 2025-09-07T07:55:56.6002904Z * [new branch] gh/mrmiywj/1/base -> origin/gh/mrmiywj/1/base 2025-09-07T07:55:56.6004797Z * [new branch] gh/mrmiywj/1/head -> origin/gh/mrmiywj/1/head 2025-09-07T07:55:56.6007751Z * [new branch] gh/muchulee8/62/base -> origin/gh/muchulee8/62/base 
2025-09-07T07:55:56.6009415Z * [new branch] gh/muchulee8/62/head -> origin/gh/muchulee8/62/head 2025-09-07T07:55:56.6011034Z * [new branch] gh/muchulee8/62/orig -> origin/gh/muchulee8/62/orig 2025-09-07T07:55:56.6013460Z * [new branch] gh/muchulee8/63/base -> origin/gh/muchulee8/63/base 2025-09-07T07:55:56.6015403Z * [new branch] gh/muchulee8/63/head -> origin/gh/muchulee8/63/head 2025-09-07T07:55:56.6016903Z * [new branch] gh/muchulee8/63/orig -> origin/gh/muchulee8/63/orig 2025-09-07T07:55:56.6019333Z * [new branch] gh/muchulee8/64/base -> origin/gh/muchulee8/64/base 2025-09-07T07:55:56.6020892Z * [new branch] gh/muchulee8/64/head -> origin/gh/muchulee8/64/head 2025-09-07T07:55:56.6022504Z * [new branch] gh/muchulee8/64/orig -> origin/gh/muchulee8/64/orig 2025-09-07T07:55:56.6025284Z * [new branch] gh/muchulee8/65/base -> origin/gh/muchulee8/65/base 2025-09-07T07:55:56.6026713Z * [new branch] gh/muchulee8/65/head -> origin/gh/muchulee8/65/head 2025-09-07T07:55:56.6028364Z * [new branch] gh/muchulee8/65/orig -> origin/gh/muchulee8/65/orig 2025-09-07T07:55:56.6031261Z * [new branch] gh/naveenthangudu/1/base -> origin/gh/naveenthangudu/1/base 2025-09-07T07:55:56.6032795Z * [new branch] gh/naveenthangudu/1/head -> origin/gh/naveenthangudu/1/head 2025-09-07T07:55:56.6034791Z * [new branch] gh/naveenthangudu/1/orig -> origin/gh/naveenthangudu/1/orig 2025-09-07T07:55:56.6036988Z * [new branch] gh/naveenthangudu/2/base -> origin/gh/naveenthangudu/2/base 2025-09-07T07:55:56.6038677Z * [new branch] gh/naveenthangudu/2/head -> origin/gh/naveenthangudu/2/head 2025-09-07T07:55:56.6040412Z * [new branch] gh/naveenthangudu/2/orig -> origin/gh/naveenthangudu/2/orig 2025-09-07T07:55:56.6042585Z * [new branch] gh/naveenthangudu/3/base -> origin/gh/naveenthangudu/3/base 2025-09-07T07:55:56.6044310Z * [new branch] gh/naveenthangudu/3/head -> origin/gh/naveenthangudu/3/head 2025-09-07T07:55:56.6046054Z * [new branch] gh/naveenthangudu/3/orig -> origin/gh/naveenthangudu/3/orig 
2025-09-07T07:55:56.6048189Z * [new branch] gh/naveenthangudu/4/base -> origin/gh/naveenthangudu/4/base 2025-09-07T07:55:56.6049685Z * [new branch] gh/naveenthangudu/4/head -> origin/gh/naveenthangudu/4/head 2025-09-07T07:55:56.6051342Z * [new branch] gh/naveenthangudu/4/orig -> origin/gh/naveenthangudu/4/orig 2025-09-07T07:55:56.6053627Z * [new branch] gh/naveenthangudu/5/base -> origin/gh/naveenthangudu/5/base 2025-09-07T07:55:56.6055562Z * [new branch] gh/naveenthangudu/5/head -> origin/gh/naveenthangudu/5/head 2025-09-07T07:55:56.6057097Z * [new branch] gh/naveenthangudu/5/orig -> origin/gh/naveenthangudu/5/orig 2025-09-07T07:55:56.6059323Z * [new branch] gh/naveenthangudu/6/base -> origin/gh/naveenthangudu/6/base 2025-09-07T07:55:56.6061040Z * [new branch] gh/naveenthangudu/6/head -> origin/gh/naveenthangudu/6/head 2025-09-07T07:55:56.6062391Z * [new branch] gh/naveenthangudu/6/orig -> origin/gh/naveenthangudu/6/orig 2025-09-07T07:55:56.6065654Z * [new branch] gh/oulgen/35/base -> origin/gh/oulgen/35/base 2025-09-07T07:55:56.6067148Z * [new branch] gh/oulgen/35/head -> origin/gh/oulgen/35/head 2025-09-07T07:55:56.6068690Z * [new branch] gh/oulgen/35/orig -> origin/gh/oulgen/35/orig 2025-09-07T07:55:56.6070929Z * [new branch] gh/oulgen/48/base -> origin/gh/oulgen/48/base 2025-09-07T07:55:56.6072441Z * [new branch] gh/oulgen/48/head -> origin/gh/oulgen/48/head 2025-09-07T07:55:56.6074273Z * [new branch] gh/oulgen/48/orig -> origin/gh/oulgen/48/orig 2025-09-07T07:55:56.6076693Z * [new branch] gh/oulgen/49/base -> origin/gh/oulgen/49/base 2025-09-07T07:55:56.6078294Z * [new branch] gh/oulgen/49/head -> origin/gh/oulgen/49/head 2025-09-07T07:55:56.6079962Z * [new branch] gh/oulgen/49/orig -> origin/gh/oulgen/49/orig 2025-09-07T07:55:56.6082959Z * [new branch] gh/pearu/108/base -> origin/gh/pearu/108/base 2025-09-07T07:55:56.6084959Z * [new branch] gh/pearu/108/head -> origin/gh/pearu/108/head 2025-09-07T07:55:56.6086583Z * [new branch] gh/pearu/108/orig -> 
origin/gh/pearu/108/orig 2025-09-07T07:55:56.6088802Z * [new branch] gh/pearu/109/base -> origin/gh/pearu/109/base 2025-09-07T07:55:56.6090387Z * [new branch] gh/pearu/109/head -> origin/gh/pearu/109/head 2025-09-07T07:55:56.6091923Z * [new branch] gh/pearu/109/orig -> origin/gh/pearu/109/orig 2025-09-07T07:55:56.6094266Z * [new branch] gh/pearu/110/base -> origin/gh/pearu/110/base 2025-09-07T07:55:56.6095993Z * [new branch] gh/pearu/110/head -> origin/gh/pearu/110/head 2025-09-07T07:55:56.6097502Z * [new branch] gh/pearu/110/orig -> origin/gh/pearu/110/orig 2025-09-07T07:55:56.6099964Z * [new branch] gh/pearu/111/base -> origin/gh/pearu/111/base 2025-09-07T07:55:56.6101470Z * [new branch] gh/pearu/111/head -> origin/gh/pearu/111/head 2025-09-07T07:55:56.6103004Z * [new branch] gh/pearu/111/orig -> origin/gh/pearu/111/orig 2025-09-07T07:55:56.6105490Z * [new branch] gh/pearu/112/base -> origin/gh/pearu/112/base 2025-09-07T07:55:56.6107195Z * [new branch] gh/pearu/112/head -> origin/gh/pearu/112/head 2025-09-07T07:55:56.6108741Z * [new branch] gh/pearu/112/orig -> origin/gh/pearu/112/orig 2025-09-07T07:55:56.6110954Z * [new branch] gh/pearu/113/base -> origin/gh/pearu/113/base 2025-09-07T07:55:56.6112628Z * [new branch] gh/pearu/113/head -> origin/gh/pearu/113/head 2025-09-07T07:55:56.6114340Z * [new branch] gh/pearu/113/orig -> origin/gh/pearu/113/orig 2025-09-07T07:55:56.6116826Z * [new branch] gh/pearu/114/base -> origin/gh/pearu/114/base 2025-09-07T07:55:56.6118509Z * [new branch] gh/pearu/114/head -> origin/gh/pearu/114/head 2025-09-07T07:55:56.6120148Z * [new branch] gh/pearu/114/orig -> origin/gh/pearu/114/orig 2025-09-07T07:55:56.6122474Z * [new branch] gh/pearu/115/base -> origin/gh/pearu/115/base 2025-09-07T07:55:56.6124159Z * [new branch] gh/pearu/115/head -> origin/gh/pearu/115/head 2025-09-07T07:55:56.6125941Z * [new branch] gh/pearu/115/orig -> origin/gh/pearu/115/orig 2025-09-07T07:55:56.6128086Z * [new branch] gh/pearu/116/base -> 
origin/gh/pearu/116/base 2025-09-07T07:55:56.6129569Z * [new branch] gh/pearu/116/head -> origin/gh/pearu/116/head 2025-09-07T07:55:56.6131278Z * [new branch] gh/pearu/116/orig -> origin/gh/pearu/116/orig 2025-09-07T07:55:56.6133640Z * [new branch] gh/pearu/117/base -> origin/gh/pearu/117/base 2025-09-07T07:55:56.6135492Z * [new branch] gh/pearu/117/head -> origin/gh/pearu/117/head 2025-09-07T07:55:56.6136999Z * [new branch] gh/pearu/117/orig -> origin/gh/pearu/117/orig 2025-09-07T07:55:56.6139621Z * [new branch] gh/pearu/56/base -> origin/gh/pearu/56/base 2025-09-07T07:55:56.6141328Z * [new branch] gh/pearu/56/head -> origin/gh/pearu/56/head 2025-09-07T07:55:56.6142964Z * [new branch] gh/pearu/56/orig -> origin/gh/pearu/56/orig 2025-09-07T07:55:56.6145887Z * [new branch] gh/pearu/97/base -> origin/gh/pearu/97/base 2025-09-07T07:55:56.6147407Z * [new branch] gh/pearu/97/head -> origin/gh/pearu/97/head 2025-09-07T07:55:56.6149071Z * [new branch] gh/pearu/97/orig -> origin/gh/pearu/97/orig 2025-09-07T07:55:56.6151864Z * [new branch] gh/qqaatw/29/base -> origin/gh/qqaatw/29/base 2025-09-07T07:55:56.6153457Z * [new branch] gh/qqaatw/29/head -> origin/gh/qqaatw/29/head 2025-09-07T07:55:56.6155427Z * [new branch] gh/qqaatw/29/orig -> origin/gh/qqaatw/29/orig 2025-09-07T07:55:56.6157806Z * [new branch] gh/raymo/refresh-script -> origin/gh/raymo/refresh-script 2025-09-07T07:55:56.6160541Z * [new branch] gh/rec/141/base -> origin/gh/rec/141/base 2025-09-07T07:55:56.6162082Z * [new branch] gh/rec/141/head -> origin/gh/rec/141/head 2025-09-07T07:55:56.6164566Z * [new branch] gh/rec/153/base -> origin/gh/rec/153/base 2025-09-07T07:55:56.6166074Z * [new branch] gh/rec/153/head -> origin/gh/rec/153/head 2025-09-07T07:55:56.6167612Z * [new branch] gh/rec/153/orig -> origin/gh/rec/153/orig 2025-09-07T07:55:56.6169836Z * [new branch] gh/rec/154/base -> origin/gh/rec/154/base 2025-09-07T07:55:56.6171404Z * [new branch] gh/rec/154/head -> origin/gh/rec/154/head 
2025-09-07T07:55:56.6172986Z * [new branch] gh/rec/154/orig -> origin/gh/rec/154/orig 2025-09-07T07:55:56.6175609Z * [new branch] gh/rec/156/base -> origin/gh/rec/156/base 2025-09-07T07:55:56.6177329Z * [new branch] gh/rec/156/head -> origin/gh/rec/156/head 2025-09-07T07:55:56.6178655Z * [new branch] gh/rec/156/orig -> origin/gh/rec/156/orig 2025-09-07T07:55:56.6180834Z * [new branch] gh/rec/160/base -> origin/gh/rec/160/base 2025-09-07T07:55:56.6182457Z * [new branch] gh/rec/160/head -> origin/gh/rec/160/head 2025-09-07T07:55:56.6184060Z * [new branch] gh/rec/160/orig -> origin/gh/rec/160/orig 2025-09-07T07:55:56.6186469Z * [new branch] gh/rec/162/base -> origin/gh/rec/162/base 2025-09-07T07:55:56.6188099Z * [new branch] gh/rec/162/head -> origin/gh/rec/162/head 2025-09-07T07:55:56.6189674Z * [new branch] gh/rec/162/orig -> origin/gh/rec/162/orig 2025-09-07T07:55:56.6191932Z * [new branch] gh/rec/163/base -> origin/gh/rec/163/base 2025-09-07T07:55:56.6193476Z * [new branch] gh/rec/163/head -> origin/gh/rec/163/head 2025-09-07T07:55:56.6195393Z * [new branch] gh/rec/163/orig -> origin/gh/rec/163/orig 2025-09-07T07:55:56.6197730Z * [new branch] gh/rec/164/base -> origin/gh/rec/164/base 2025-09-07T07:55:56.6199267Z * [new branch] gh/rec/164/head -> origin/gh/rec/164/head 2025-09-07T07:55:56.6200848Z * [new branch] gh/rec/164/orig -> origin/gh/rec/164/orig 2025-09-07T07:55:56.6203147Z * [new branch] gh/rec/165/base -> origin/gh/rec/165/base 2025-09-07T07:55:56.6205271Z * [new branch] gh/rec/165/head -> origin/gh/rec/165/head 2025-09-07T07:55:56.6206739Z * [new branch] gh/rec/165/orig -> origin/gh/rec/165/orig 2025-09-07T07:55:56.6208968Z * [new branch] gh/rec/166/base -> origin/gh/rec/166/base 2025-09-07T07:55:56.6210588Z * [new branch] gh/rec/166/head -> origin/gh/rec/166/head 2025-09-07T07:55:56.6212148Z * [new branch] gh/rec/166/orig -> origin/gh/rec/166/orig 2025-09-07T07:55:56.6215378Z * [new branch] gh/robert-hardwick/1/base -> origin/gh/robert-hardwick/1/base 
2025-09-07T07:55:56.6216895Z * [new branch] gh/robert-hardwick/1/head -> origin/gh/robert-hardwick/1/head 2025-09-07T07:55:56.6218515Z * [new branch] gh/robert-hardwick/1/orig -> origin/gh/robert-hardwick/1/orig 2025-09-07T07:55:56.6220752Z * [new branch] gh/robert-hardwick/2/base -> origin/gh/robert-hardwick/2/base 2025-09-07T07:55:56.6222335Z * [new branch] gh/robert-hardwick/2/head -> origin/gh/robert-hardwick/2/head 2025-09-07T07:55:56.6223949Z * [new branch] gh/robert-hardwick/2/orig -> origin/gh/robert-hardwick/2/orig 2025-09-07T07:55:56.6226453Z * [new branch] gh/robert-hardwick/3/base -> origin/gh/robert-hardwick/3/base 2025-09-07T07:55:56.6228020Z * [new branch] gh/robert-hardwick/3/head -> origin/gh/robert-hardwick/3/head 2025-09-07T07:55:56.6229636Z * [new branch] gh/robert-hardwick/3/orig -> origin/gh/robert-hardwick/3/orig 2025-09-07T07:55:56.6231904Z * [new branch] gh/robert-hardwick/4/base -> origin/gh/robert-hardwick/4/base 2025-09-07T07:55:56.6233435Z * [new branch] gh/robert-hardwick/4/head -> origin/gh/robert-hardwick/4/head 2025-09-07T07:55:56.6235308Z * [new branch] gh/robert-hardwick/4/orig -> origin/gh/robert-hardwick/4/orig 2025-09-07T07:55:56.6238132Z * [new branch] gh/rtimpe/1/base -> origin/gh/rtimpe/1/base 2025-09-07T07:55:56.6239697Z * [new branch] gh/rtimpe/1/head -> origin/gh/rtimpe/1/head 2025-09-07T07:55:56.6241944Z * [new branch] gh/rtimpe/10/base -> origin/gh/rtimpe/10/base 2025-09-07T07:55:56.6243487Z * [new branch] gh/rtimpe/10/head -> origin/gh/rtimpe/10/head 2025-09-07T07:55:56.6245481Z * [new branch] gh/rtimpe/10/orig -> origin/gh/rtimpe/10/orig 2025-09-07T07:55:56.6247729Z * [new branch] gh/rtimpe/11/base -> origin/gh/rtimpe/11/base 2025-09-07T07:55:56.6249251Z * [new branch] gh/rtimpe/11/head -> origin/gh/rtimpe/11/head 2025-09-07T07:55:56.6250829Z * [new branch] gh/rtimpe/11/orig -> origin/gh/rtimpe/11/orig 2025-09-07T07:55:56.6253010Z * [new branch] gh/rtimpe/12/base -> origin/gh/rtimpe/12/base 
2025-09-07T07:55:56.6254922Z * [new branch] gh/rtimpe/12/head -> origin/gh/rtimpe/12/head 2025-09-07T07:55:56.6256387Z * [new branch] gh/rtimpe/12/orig -> origin/gh/rtimpe/12/orig 2025-09-07T07:55:56.6258623Z * [new branch] gh/rtimpe/13/base -> origin/gh/rtimpe/13/base 2025-09-07T07:55:56.6260149Z * [new branch] gh/rtimpe/13/head -> origin/gh/rtimpe/13/head 2025-09-07T07:55:56.6261721Z * [new branch] gh/rtimpe/13/orig -> origin/gh/rtimpe/13/orig 2025-09-07T07:55:56.6264179Z * [new branch] gh/rtimpe/14/base -> origin/gh/rtimpe/14/base 2025-09-07T07:55:56.6265879Z * [new branch] gh/rtimpe/14/head -> origin/gh/rtimpe/14/head 2025-09-07T07:55:56.6267391Z * [new branch] gh/rtimpe/14/orig -> origin/gh/rtimpe/14/orig 2025-09-07T07:55:56.6269734Z * [new branch] gh/rtimpe/15/base -> origin/gh/rtimpe/15/base 2025-09-07T07:55:56.6271307Z * [new branch] gh/rtimpe/15/head -> origin/gh/rtimpe/15/head 2025-09-07T07:55:56.6272945Z * [new branch] gh/rtimpe/15/orig -> origin/gh/rtimpe/15/orig 2025-09-07T07:55:56.6275526Z * [new branch] gh/rtimpe/2/base -> origin/gh/rtimpe/2/base 2025-09-07T07:55:56.6276981Z * [new branch] gh/rtimpe/2/head -> origin/gh/rtimpe/2/head 2025-09-07T07:55:56.6279261Z * [new branch] gh/rtimpe/3/base -> origin/gh/rtimpe/3/base 2025-09-07T07:55:56.6280724Z * [new branch] gh/rtimpe/3/head -> origin/gh/rtimpe/3/head 2025-09-07T07:55:56.6283002Z * [new branch] gh/rtimpe/4/base -> origin/gh/rtimpe/4/base 2025-09-07T07:55:56.6284937Z * [new branch] gh/rtimpe/4/head -> origin/gh/rtimpe/4/head 2025-09-07T07:55:56.6287172Z * [new branch] gh/rtimpe/9/base -> origin/gh/rtimpe/9/base 2025-09-07T07:55:56.6288694Z * [new branch] gh/rtimpe/9/head -> origin/gh/rtimpe/9/head 2025-09-07T07:55:56.6290205Z * [new branch] gh/rtimpe/9/orig -> origin/gh/rtimpe/9/orig 2025-09-07T07:55:56.6293088Z * [new branch] gh/ruisizhang123/1/base -> origin/gh/ruisizhang123/1/base 2025-09-07T07:55:56.6295077Z * [new branch] gh/ruisizhang123/1/head -> origin/gh/ruisizhang123/1/head 
2025-09-07T07:55:56.6296579Z * [new branch] gh/ruisizhang123/1/orig -> origin/gh/ruisizhang123/1/orig 2025-09-07T07:55:56.6298813Z * [new branch] gh/ruisizhang123/4/base -> origin/gh/ruisizhang123/4/base 2025-09-07T07:55:56.6300624Z * [new branch] gh/ruisizhang123/4/head -> origin/gh/ruisizhang123/4/head 2025-09-07T07:55:56.6302218Z * [new branch] gh/ruisizhang123/4/orig -> origin/gh/ruisizhang123/4/orig 2025-09-07T07:55:56.6304759Z * [new branch] gh/ruisizhang123/5/base -> origin/gh/ruisizhang123/5/base 2025-09-07T07:55:56.6306303Z * [new branch] gh/ruisizhang123/5/head -> origin/gh/ruisizhang123/5/head 2025-09-07T07:55:56.6307830Z * [new branch] gh/ruisizhang123/5/orig -> origin/gh/ruisizhang123/5/orig 2025-09-07T07:55:56.6310048Z * [new branch] gh/ruisizhang123/6/base -> origin/gh/ruisizhang123/6/base 2025-09-07T07:55:56.6311812Z * [new branch] gh/ruisizhang123/6/head -> origin/gh/ruisizhang123/6/head 2025-09-07T07:55:56.6313184Z * [new branch] gh/ruisizhang123/6/orig -> origin/gh/ruisizhang123/6/orig 2025-09-07T07:55:56.6315825Z * [new branch] gh/ruisizhang123/7/base -> origin/gh/ruisizhang123/7/base 2025-09-07T07:55:56.6317503Z * [new branch] gh/ruisizhang123/7/head -> origin/gh/ruisizhang123/7/head 2025-09-07T07:55:56.6319132Z * [new branch] gh/ruisizhang123/7/orig -> origin/gh/ruisizhang123/7/orig 2025-09-07T07:55:56.6321350Z * [new branch] gh/ruisizhang123/8/base -> origin/gh/ruisizhang123/8/base 2025-09-07T07:55:56.6322908Z * [new branch] gh/ruisizhang123/8/head -> origin/gh/ruisizhang123/8/head 2025-09-07T07:55:56.6324722Z * [new branch] gh/ruisizhang123/8/orig -> origin/gh/ruisizhang123/8/orig 2025-09-07T07:55:56.6327035Z * [new branch] gh/ruisizhang123/9/base -> origin/gh/ruisizhang123/9/base 2025-09-07T07:55:56.6328555Z * [new branch] gh/ruisizhang123/9/head -> origin/gh/ruisizhang123/9/head 2025-09-07T07:55:56.6330206Z * [new branch] gh/ruisizhang123/9/orig -> origin/gh/ruisizhang123/9/orig 2025-09-07T07:55:56.6333037Z * [new branch] gh/sarckk/2/base 
-> origin/gh/sarckk/2/base 2025-09-07T07:55:56.6334820Z * [new branch] gh/sarckk/2/head -> origin/gh/sarckk/2/head 2025-09-07T07:55:56.6336411Z * [new branch] gh/sarckk/2/orig -> origin/gh/sarckk/2/orig 2025-09-07T07:55:56.6339266Z * [new branch] gh/seemethere/35/base -> origin/gh/seemethere/35/base 2025-09-07T07:55:56.6340795Z * [new branch] gh/seemethere/35/head -> origin/gh/seemethere/35/head 2025-09-07T07:55:56.6342375Z * [new branch] gh/seemethere/35/orig -> origin/gh/seemethere/35/orig 2025-09-07T07:55:56.6344953Z * [new branch] gh/seemethere/37/base -> origin/gh/seemethere/37/base 2025-09-07T07:55:56.6346491Z * [new branch] gh/seemethere/37/head -> origin/gh/seemethere/37/head 2025-09-07T07:55:56.6347975Z * [new branch] gh/seemethere/37/orig -> origin/gh/seemethere/37/orig 2025-09-07T07:55:56.6350172Z * [new branch] gh/seemethere/43/base -> origin/gh/seemethere/43/base 2025-09-07T07:55:56.6351762Z * [new branch] gh/seemethere/43/head -> origin/gh/seemethere/43/head 2025-09-07T07:55:56.6353286Z * [new branch] gh/seemethere/43/orig -> origin/gh/seemethere/43/orig 2025-09-07T07:55:56.6355949Z * [new branch] gh/seemethere/44/base -> origin/gh/seemethere/44/base 2025-09-07T07:55:56.6357463Z * [new branch] gh/seemethere/44/head -> origin/gh/seemethere/44/head 2025-09-07T07:55:56.6359063Z * [new branch] gh/seemethere/44/orig -> origin/gh/seemethere/44/orig 2025-09-07T07:55:56.6361278Z * [new branch] gh/seemethere/48/base -> origin/gh/seemethere/48/base 2025-09-07T07:55:56.6362865Z * [new branch] gh/seemethere/48/head -> origin/gh/seemethere/48/head 2025-09-07T07:55:56.6364712Z * [new branch] gh/seemethere/48/orig -> origin/gh/seemethere/48/orig 2025-09-07T07:55:56.6366942Z * [new branch] gh/seemethere/49/base -> origin/gh/seemethere/49/base 2025-09-07T07:55:56.6368452Z * [new branch] gh/seemethere/49/head -> origin/gh/seemethere/49/head 2025-09-07T07:55:56.6370009Z * [new branch] gh/seemethere/49/orig -> origin/gh/seemethere/49/orig 2025-09-07T07:55:56.6372298Z * 
[new branch] gh/seemethere/52/base -> origin/gh/seemethere/52/base 2025-09-07T07:55:56.6374024Z * [new branch] gh/seemethere/52/head -> origin/gh/seemethere/52/head 2025-09-07T07:55:56.6375974Z * [new branch] gh/seemethere/52/orig -> origin/gh/seemethere/52/orig 2025-09-07T07:55:56.6378277Z * [new branch] gh/seemethere/53/base -> origin/gh/seemethere/53/base 2025-09-07T07:55:56.6379723Z * [new branch] gh/seemethere/53/head -> origin/gh/seemethere/53/head 2025-09-07T07:55:56.6381221Z * [new branch] gh/seemethere/53/orig -> origin/gh/seemethere/53/orig 2025-09-07T07:55:56.6383462Z * [new branch] gh/seemethere/54/base -> origin/gh/seemethere/54/base 2025-09-07T07:55:56.6385359Z * [new branch] gh/seemethere/54/head -> origin/gh/seemethere/54/head 2025-09-07T07:55:56.6387086Z * [new branch] gh/seemethere/54/orig -> origin/gh/seemethere/54/orig 2025-09-07T07:55:56.6389230Z * [new branch] gh/seemethere/55/base -> origin/gh/seemethere/55/base 2025-09-07T07:55:56.6390734Z * [new branch] gh/seemethere/55/head -> origin/gh/seemethere/55/head 2025-09-07T07:55:56.6392211Z * [new branch] gh/seemethere/55/orig -> origin/gh/seemethere/55/orig 2025-09-07T07:55:56.6395195Z * [new branch] gh/seemethere/56/base -> origin/gh/seemethere/56/base 2025-09-07T07:55:56.6396726Z * [new branch] gh/seemethere/56/head -> origin/gh/seemethere/56/head 2025-09-07T07:55:56.6398464Z * [new branch] gh/seemethere/56/orig -> origin/gh/seemethere/56/orig 2025-09-07T07:55:56.6400753Z * [new branch] gh/seemethere/57/base -> origin/gh/seemethere/57/base 2025-09-07T07:55:56.6402333Z * [new branch] gh/seemethere/57/head -> origin/gh/seemethere/57/head 2025-09-07T07:55:56.6404065Z * [new branch] gh/seemethere/57/orig -> origin/gh/seemethere/57/orig 2025-09-07T07:55:56.6406366Z * [new branch] gh/seemethere/58/base -> origin/gh/seemethere/58/base 2025-09-07T07:55:56.6407826Z * [new branch] gh/seemethere/58/head -> origin/gh/seemethere/58/head 2025-09-07T07:55:56.6409386Z * [new branch] gh/seemethere/58/orig -> 
origin/gh/seemethere/58/orig 2025-09-07T07:55:56.6411630Z * [new branch] gh/seemethere/59/base -> origin/gh/seemethere/59/base 2025-09-07T07:55:56.6413171Z * [new branch] gh/seemethere/59/head -> origin/gh/seemethere/59/head 2025-09-07T07:55:56.6415126Z * [new branch] gh/seemethere/59/orig -> origin/gh/seemethere/59/orig 2025-09-07T07:55:56.6417279Z * [new branch] gh/seemethere/60/base -> origin/gh/seemethere/60/base 2025-09-07T07:55:56.6418832Z * [new branch] gh/seemethere/60/head -> origin/gh/seemethere/60/head 2025-09-07T07:55:56.6420450Z * [new branch] gh/seemethere/60/orig -> origin/gh/seemethere/60/orig 2025-09-07T07:55:56.6422664Z * [new branch] gh/seemethere/61/base -> origin/gh/seemethere/61/base 2025-09-07T07:55:56.6424509Z * [new branch] gh/seemethere/61/head -> origin/gh/seemethere/61/head 2025-09-07T07:55:56.6426086Z * [new branch] gh/seemethere/61/orig -> origin/gh/seemethere/61/orig 2025-09-07T07:55:56.6428324Z * [new branch] gh/seemethere/62/base -> origin/gh/seemethere/62/base 2025-09-07T07:55:56.6429890Z * [new branch] gh/seemethere/62/head -> origin/gh/seemethere/62/head 2025-09-07T07:55:56.6431482Z * [new branch] gh/seemethere/62/orig -> origin/gh/seemethere/62/orig 2025-09-07T07:55:56.6433861Z * [new branch] gh/seemethere/63/base -> origin/gh/seemethere/63/base 2025-09-07T07:55:56.6435641Z * [new branch] gh/seemethere/63/head -> origin/gh/seemethere/63/head 2025-09-07T07:55:56.6437258Z * [new branch] gh/seemethere/63/orig -> origin/gh/seemethere/63/orig 2025-09-07T07:55:56.6440302Z * [new branch] gh/shunting314/145/base -> origin/gh/shunting314/145/base 2025-09-07T07:55:56.6441978Z * [new branch] gh/shunting314/145/head -> origin/gh/shunting314/145/head 2025-09-07T07:55:56.6443946Z * [new branch] gh/shunting314/145/orig -> origin/gh/shunting314/145/orig 2025-09-07T07:55:56.6446436Z * [new branch] gh/shunting314/176/base -> origin/gh/shunting314/176/base 2025-09-07T07:55:56.6448060Z * [new branch] gh/shunting314/176/head -> 
origin/gh/shunting314/176/head 2025-09-07T07:55:56.6449694Z * [new branch] gh/shunting314/176/orig -> origin/gh/shunting314/176/orig 2025-09-07T07:55:56.6451987Z * [new branch] gh/shunting314/211/base -> origin/gh/shunting314/211/base 2025-09-07T07:55:56.6453537Z * [new branch] gh/shunting314/211/head -> origin/gh/shunting314/211/head 2025-09-07T07:55:56.6455442Z * [new branch] gh/shunting314/211/orig -> origin/gh/shunting314/211/orig 2025-09-07T07:55:56.6457567Z * [new branch] gh/shunting314/212/base -> origin/gh/shunting314/212/base 2025-09-07T07:55:56.6459114Z * [new branch] gh/shunting314/212/head -> origin/gh/shunting314/212/head 2025-09-07T07:55:56.6460672Z * [new branch] gh/shunting314/212/orig -> origin/gh/shunting314/212/orig 2025-09-07T07:55:56.6463301Z * [new branch] gh/shunting314/213/base -> origin/gh/shunting314/213/base 2025-09-07T07:55:56.6465320Z * [new branch] gh/shunting314/213/head -> origin/gh/shunting314/213/head 2025-09-07T07:55:56.6466848Z * [new branch] gh/shunting314/213/orig -> origin/gh/shunting314/213/orig 2025-09-07T07:55:56.6469153Z * [new branch] gh/shunting314/214/base -> origin/gh/shunting314/214/base 2025-09-07T07:55:56.6470685Z * [new branch] gh/shunting314/214/head -> origin/gh/shunting314/214/head 2025-09-07T07:55:56.6472276Z * [new branch] gh/shunting314/214/orig -> origin/gh/shunting314/214/orig 2025-09-07T07:55:56.6475023Z * [new branch] gh/shunting314/215/base -> origin/gh/shunting314/215/base 2025-09-07T07:55:56.6476544Z * [new branch] gh/shunting314/215/head -> origin/gh/shunting314/215/head 2025-09-07T07:55:56.6478239Z * [new branch] gh/shunting314/215/orig -> origin/gh/shunting314/215/orig 2025-09-07T07:55:56.6481176Z * [new branch] gh/shunting314/216/base -> origin/gh/shunting314/216/base 2025-09-07T07:55:56.6481871Z * [new branch] gh/shunting314/216/head -> origin/gh/shunting314/216/head 2025-09-07T07:55:56.6483488Z * [new branch] gh/shunting314/216/orig -> origin/gh/shunting314/216/orig 2025-09-07T07:55:56.6486173Z * 
[new branch] gh/shunting314/217/base -> origin/gh/shunting314/217/base 2025-09-07T07:55:56.6487729Z * [new branch] gh/shunting314/217/head -> origin/gh/shunting314/217/head 2025-09-07T07:55:56.6489503Z * [new branch] gh/shunting314/217/orig -> origin/gh/shunting314/217/orig 2025-09-07T07:55:56.6491910Z * [new branch] gh/shunting314/218/base -> origin/gh/shunting314/218/base 2025-09-07T07:55:56.6493459Z * [new branch] gh/shunting314/218/head -> origin/gh/shunting314/218/head 2025-09-07T07:55:56.6495294Z * [new branch] gh/shunting314/218/orig -> origin/gh/shunting314/218/orig 2025-09-07T07:55:56.6497405Z * [new branch] gh/shunting314/219/base -> origin/gh/shunting314/219/base 2025-09-07T07:55:56.6498911Z * [new branch] gh/shunting314/219/head -> origin/gh/shunting314/219/head 2025-09-07T07:55:56.6500476Z * [new branch] gh/shunting314/219/orig -> origin/gh/shunting314/219/orig 2025-09-07T07:55:56.6502924Z * [new branch] gh/shunting314/220/base -> origin/gh/shunting314/220/base 2025-09-07T07:55:56.6504988Z * [new branch] gh/shunting314/220/head -> origin/gh/shunting314/220/head 2025-09-07T07:55:56.6506537Z * [new branch] gh/shunting314/220/orig -> origin/gh/shunting314/220/orig 2025-09-07T07:55:56.6509046Z * [new branch] gh/shunting314/221/base -> origin/gh/shunting314/221/base 2025-09-07T07:55:56.6510407Z * [new branch] gh/shunting314/221/head -> origin/gh/shunting314/221/head 2025-09-07T07:55:56.6511922Z * [new branch] gh/shunting314/221/orig -> origin/gh/shunting314/221/orig 2025-09-07T07:55:56.6514448Z * [new branch] gh/shunting314/222/base -> origin/gh/shunting314/222/base 2025-09-07T07:55:56.6515934Z * [new branch] gh/shunting314/222/head -> origin/gh/shunting314/222/head 2025-09-07T07:55:56.6517598Z * [new branch] gh/shunting314/222/orig -> origin/gh/shunting314/222/orig 2025-09-07T07:55:56.6519707Z * [new branch] gh/shunting314/223/base -> origin/gh/shunting314/223/base 2025-09-07T07:55:56.6521270Z * [new branch] gh/shunting314/223/head -> 
origin/gh/shunting314/223/head 2025-09-07T07:55:56.6522842Z * [new branch] gh/shunting314/223/orig -> origin/gh/shunting314/223/orig 2025-09-07T07:55:56.6526326Z * [new branch] gh/silverguo/1/base -> origin/gh/silverguo/1/base 2025-09-07T07:55:56.6527856Z * [new branch] gh/silverguo/1/head -> origin/gh/silverguo/1/head 2025-09-07T07:55:56.6529933Z * [new branch] gh/silverguo/2/base -> origin/gh/silverguo/2/base 2025-09-07T07:55:56.6531522Z * [new branch] gh/silverguo/2/head -> origin/gh/silverguo/2/head 2025-09-07T07:55:56.6533596Z * [new branch] gh/silverguo/3/base -> origin/gh/silverguo/3/base 2025-09-07T07:55:56.6535446Z * [new branch] gh/silverguo/3/head -> origin/gh/silverguo/3/head 2025-09-07T07:55:56.6537481Z * [new branch] gh/silverguo/4/base -> origin/gh/silverguo/4/base 2025-09-07T07:55:56.6539046Z * [new branch] gh/silverguo/4/head -> origin/gh/silverguo/4/head 2025-09-07T07:55:56.6541985Z * [new branch] gh/sinhaanhsul/1/base -> origin/gh/sinhaanhsul/1/base 2025-09-07T07:55:56.6543535Z * [new branch] gh/sinhaanhsul/1/head -> origin/gh/sinhaanhsul/1/head 2025-09-07T07:55:56.6546811Z * [new branch] gh/skarjala/17/base -> origin/gh/skarjala/17/base 2025-09-07T07:55:56.6548260Z * [new branch] gh/skarjala/17/head -> origin/gh/skarjala/17/head 2025-09-07T07:55:56.6549838Z * [new branch] gh/skarjala/17/orig -> origin/gh/skarjala/17/orig 2025-09-07T07:55:56.6552137Z * [new branch] gh/skarjala/18/base -> origin/gh/skarjala/18/base 2025-09-07T07:55:56.6553694Z * [new branch] gh/skarjala/18/head -> origin/gh/skarjala/18/head 2025-09-07T07:55:56.6555647Z * [new branch] gh/skarjala/18/orig -> origin/gh/skarjala/18/orig 2025-09-07T07:55:56.6557863Z * [new branch] gh/skarjala/19/base -> origin/gh/skarjala/19/base 2025-09-07T07:55:56.6559475Z * [new branch] gh/skarjala/19/head -> origin/gh/skarjala/19/head 2025-09-07T07:55:56.6561039Z * [new branch] gh/skarjala/19/orig -> origin/gh/skarjala/19/orig 2025-09-07T07:55:56.6564144Z * [new branch] gh/slayton58/1/base -> 
origin/gh/slayton58/1/base 2025-09-07T07:55:56.6565839Z * [new branch] gh/slayton58/1/head -> origin/gh/slayton58/1/head 2025-09-07T07:55:56.6567341Z * [new branch] gh/slayton58/1/orig -> origin/gh/slayton58/1/orig 2025-09-07T07:55:56.6569526Z * [new branch] gh/slayton58/2/base -> origin/gh/slayton58/2/base 2025-09-07T07:55:56.6571086Z * [new branch] gh/slayton58/2/head -> origin/gh/slayton58/2/head 2025-09-07T07:55:56.6572607Z * [new branch] gh/slayton58/2/orig -> origin/gh/slayton58/2/orig 2025-09-07T07:55:56.6575098Z * [new branch] gh/slayton58/3/base -> origin/gh/slayton58/3/base 2025-09-07T07:55:56.6576937Z * [new branch] gh/slayton58/3/head -> origin/gh/slayton58/3/head 2025-09-07T07:55:56.6578351Z * [new branch] gh/slayton58/3/orig -> origin/gh/slayton58/3/orig 2025-09-07T07:55:56.6580525Z * [new branch] gh/slayton58/4/base -> origin/gh/slayton58/4/base 2025-09-07T07:55:56.6582058Z * [new branch] gh/slayton58/4/head -> origin/gh/slayton58/4/head 2025-09-07T07:55:56.6583575Z * [new branch] gh/slayton58/4/orig -> origin/gh/slayton58/4/orig 2025-09-07T07:55:56.6586326Z * [new branch] gh/slayton58/5/base -> origin/gh/slayton58/5/base 2025-09-07T07:55:56.6587843Z * [new branch] gh/slayton58/5/head -> origin/gh/slayton58/5/head 2025-09-07T07:55:56.6589436Z * [new branch] gh/slayton58/5/orig -> origin/gh/slayton58/5/orig 2025-09-07T07:55:56.6592477Z * [new branch] gh/soulitzer/269/base -> origin/gh/soulitzer/269/base 2025-09-07T07:55:56.6594158Z * [new branch] gh/soulitzer/269/head -> origin/gh/soulitzer/269/head 2025-09-07T07:55:56.6596045Z * [new branch] gh/soulitzer/269/orig -> origin/gh/soulitzer/269/orig 2025-09-07T07:55:56.6598457Z * [new branch] gh/soulitzer/276/base -> origin/gh/soulitzer/276/base 2025-09-07T07:55:56.6600082Z * [new branch] gh/soulitzer/276/head -> origin/gh/soulitzer/276/head 2025-09-07T07:55:56.6601618Z * [new branch] gh/soulitzer/276/orig -> origin/gh/soulitzer/276/orig 2025-09-07T07:55:56.6604334Z * [new branch] gh/soulitzer/287/base -> 
origin/gh/soulitzer/287/base 2025-09-07T07:55:56.6606064Z * [new branch] gh/soulitzer/287/head -> origin/gh/soulitzer/287/head 2025-09-07T07:55:56.6607636Z * [new branch] gh/soulitzer/287/orig -> origin/gh/soulitzer/287/orig 2025-09-07T07:55:56.6610015Z * [new branch] gh/soulitzer/296/base -> origin/gh/soulitzer/296/base 2025-09-07T07:55:56.6611564Z * [new branch] gh/soulitzer/296/head -> origin/gh/soulitzer/296/head 2025-09-07T07:55:56.6613075Z * [new branch] gh/soulitzer/296/orig -> origin/gh/soulitzer/296/orig 2025-09-07T07:55:56.6615751Z * [new branch] gh/soulitzer/299/base -> origin/gh/soulitzer/299/base 2025-09-07T07:55:56.6617394Z * [new branch] gh/soulitzer/299/head -> origin/gh/soulitzer/299/head 2025-09-07T07:55:56.6618911Z * [new branch] gh/soulitzer/299/orig -> origin/gh/soulitzer/299/orig 2025-09-07T07:55:56.6621128Z * [new branch] gh/soulitzer/300/base -> origin/gh/soulitzer/300/base 2025-09-07T07:55:56.6622863Z * [new branch] gh/soulitzer/300/head -> origin/gh/soulitzer/300/head 2025-09-07T07:55:56.6624659Z * [new branch] gh/soulitzer/300/orig -> origin/gh/soulitzer/300/orig 2025-09-07T07:55:56.6627039Z * [new branch] gh/soulitzer/301/base -> origin/gh/soulitzer/301/base 2025-09-07T07:55:56.6628614Z * [new branch] gh/soulitzer/301/head -> origin/gh/soulitzer/301/head 2025-09-07T07:55:56.6630213Z * [new branch] gh/soulitzer/301/orig -> origin/gh/soulitzer/301/orig 2025-09-07T07:55:56.6632405Z * [new branch] gh/soulitzer/313/base -> origin/gh/soulitzer/313/base 2025-09-07T07:55:56.6634245Z * [new branch] gh/soulitzer/313/head -> origin/gh/soulitzer/313/head 2025-09-07T07:55:56.6635984Z * [new branch] gh/soulitzer/313/orig -> origin/gh/soulitzer/313/orig 2025-09-07T07:55:56.6638337Z * [new branch] gh/soulitzer/319/base -> origin/gh/soulitzer/319/base 2025-09-07T07:55:56.6639895Z * [new branch] gh/soulitzer/319/head -> origin/gh/soulitzer/319/head 2025-09-07T07:55:56.6641472Z * [new branch] gh/soulitzer/319/orig -> origin/gh/soulitzer/319/orig 
2025-09-07T07:55:56.6644096Z * [new branch] gh/soulitzer/320/base -> origin/gh/soulitzer/320/base 2025-09-07T07:55:56.6645658Z * [new branch] gh/soulitzer/320/head -> origin/gh/soulitzer/320/head 2025-09-07T07:55:56.6647105Z * [new branch] gh/soulitzer/320/orig -> origin/gh/soulitzer/320/orig 2025-09-07T07:55:56.6649458Z * [new branch] gh/soulitzer/336/base -> origin/gh/soulitzer/336/base 2025-09-07T07:55:56.6650939Z * [new branch] gh/soulitzer/336/head -> origin/gh/soulitzer/336/head 2025-09-07T07:55:56.6652479Z * [new branch] gh/soulitzer/336/orig -> origin/gh/soulitzer/336/orig 2025-09-07T07:55:56.6655272Z * [new branch] gh/soulitzer/347/base -> origin/gh/soulitzer/347/base 2025-09-07T07:55:56.6656692Z * [new branch] gh/soulitzer/347/head -> origin/gh/soulitzer/347/head 2025-09-07T07:55:56.6658180Z * [new branch] gh/soulitzer/347/orig -> origin/gh/soulitzer/347/orig 2025-09-07T07:55:56.6660589Z * [new branch] gh/soulitzer/349/base -> origin/gh/soulitzer/349/base 2025-09-07T07:55:56.6662210Z * [new branch] gh/soulitzer/349/head -> origin/gh/soulitzer/349/head 2025-09-07T07:55:56.6663902Z * [new branch] gh/soulitzer/349/orig -> origin/gh/soulitzer/349/orig 2025-09-07T07:55:56.6666407Z * [new branch] gh/soulitzer/350/base -> origin/gh/soulitzer/350/base 2025-09-07T07:55:56.6667827Z * [new branch] gh/soulitzer/350/head -> origin/gh/soulitzer/350/head 2025-09-07T07:55:56.6669402Z * [new branch] gh/soulitzer/350/orig -> origin/gh/soulitzer/350/orig 2025-09-07T07:55:56.6671773Z * [new branch] gh/soulitzer/351/base -> origin/gh/soulitzer/351/base 2025-09-07T07:55:56.6673299Z * [new branch] gh/soulitzer/351/head -> origin/gh/soulitzer/351/head 2025-09-07T07:55:56.6675250Z * [new branch] gh/soulitzer/351/orig -> origin/gh/soulitzer/351/orig 2025-09-07T07:55:56.6677529Z * [new branch] gh/soulitzer/353/base -> origin/gh/soulitzer/353/base 2025-09-07T07:55:56.6679145Z * [new branch] gh/soulitzer/353/head -> origin/gh/soulitzer/353/head 2025-09-07T07:55:56.6680860Z * [new 
branch] gh/soulitzer/353/orig -> origin/gh/soulitzer/353/orig 2025-09-07T07:55:56.6683354Z * [new branch] gh/soulitzer/358/base -> origin/gh/soulitzer/358/base 2025-09-07T07:55:56.6685398Z * [new branch] gh/soulitzer/358/head -> origin/gh/soulitzer/358/head 2025-09-07T07:55:56.6686865Z * [new branch] gh/soulitzer/358/orig -> origin/gh/soulitzer/358/orig 2025-09-07T07:55:56.6692185Z * [new branch] gh/soulitzer/359/base -> origin/gh/soulitzer/359/base 2025-09-07T07:55:56.6693876Z * [new branch] gh/soulitzer/359/head -> origin/gh/soulitzer/359/head 2025-09-07T07:55:56.6695750Z * [new branch] gh/soulitzer/359/orig -> origin/gh/soulitzer/359/orig 2025-09-07T07:55:56.6697932Z * [new branch] gh/soulitzer/362/base -> origin/gh/soulitzer/362/base 2025-09-07T07:55:56.6699540Z * [new branch] gh/soulitzer/362/head -> origin/gh/soulitzer/362/head 2025-09-07T07:55:56.6701104Z * [new branch] gh/soulitzer/362/orig -> origin/gh/soulitzer/362/orig 2025-09-07T07:55:56.6703366Z * [new branch] gh/soulitzer/372/base -> origin/gh/soulitzer/372/base 2025-09-07T07:55:56.6705264Z * [new branch] gh/soulitzer/372/head -> origin/gh/soulitzer/372/head 2025-09-07T07:55:56.6706778Z * [new branch] gh/soulitzer/372/orig -> origin/gh/soulitzer/372/orig 2025-09-07T07:55:56.6708996Z * [new branch] gh/soulitzer/373/base -> origin/gh/soulitzer/373/base 2025-09-07T07:55:56.6710725Z * [new branch] gh/soulitzer/373/head -> origin/gh/soulitzer/373/head 2025-09-07T07:55:56.6712179Z * [new branch] gh/soulitzer/373/orig -> origin/gh/soulitzer/373/orig 2025-09-07T07:55:56.6715454Z * [new branch] gh/soulitzer/374/base -> origin/gh/soulitzer/374/base 2025-09-07T07:55:56.6717015Z * [new branch] gh/soulitzer/374/head -> origin/gh/soulitzer/374/head 2025-09-07T07:55:56.6718652Z * [new branch] gh/soulitzer/374/orig -> origin/gh/soulitzer/374/orig 2025-09-07T07:55:56.6720926Z * [new branch] gh/soulitzer/375/base -> origin/gh/soulitzer/375/base 2025-09-07T07:55:56.6722455Z * [new branch] gh/soulitzer/375/head -> 
origin/gh/soulitzer/375/head 2025-09-07T07:55:56.6724179Z * [new branch] gh/soulitzer/375/orig -> origin/gh/soulitzer/375/orig 2025-09-07T07:55:56.6726571Z * [new branch] gh/soulitzer/376/base -> origin/gh/soulitzer/376/base 2025-09-07T07:55:56.6728084Z * [new branch] gh/soulitzer/376/head -> origin/gh/soulitzer/376/head 2025-09-07T07:55:56.6729584Z * [new branch] gh/soulitzer/376/orig -> origin/gh/soulitzer/376/orig 2025-09-07T07:55:56.6731865Z * [new branch] gh/soulitzer/377/base -> origin/gh/soulitzer/377/base 2025-09-07T07:55:56.6733421Z * [new branch] gh/soulitzer/377/head -> origin/gh/soulitzer/377/head 2025-09-07T07:55:56.6735322Z * [new branch] gh/soulitzer/377/orig -> origin/gh/soulitzer/377/orig 2025-09-07T07:55:56.6737547Z * [new branch] gh/soulitzer/378/base -> origin/gh/soulitzer/378/base 2025-09-07T07:55:56.6739116Z * [new branch] gh/soulitzer/378/head -> origin/gh/soulitzer/378/head 2025-09-07T07:55:56.6740758Z * [new branch] gh/soulitzer/378/orig -> origin/gh/soulitzer/378/orig 2025-09-07T07:55:56.6742990Z * [new branch] gh/soulitzer/379/base -> origin/gh/soulitzer/379/base 2025-09-07T07:55:56.6744960Z * [new branch] gh/soulitzer/379/head -> origin/gh/soulitzer/379/head 2025-09-07T07:55:56.6746447Z * [new branch] gh/soulitzer/379/orig -> origin/gh/soulitzer/379/orig 2025-09-07T07:55:56.6749313Z * [new branch] gh/swolchok/728/next -> origin/gh/swolchok/728/next 2025-09-07T07:55:56.6751928Z * [new branch] gh/swolchok/767/base -> origin/gh/swolchok/767/base 2025-09-07T07:55:56.6753866Z * [new branch] gh/swolchok/767/head -> origin/gh/swolchok/767/head 2025-09-07T07:55:56.6755866Z * [new branch] gh/swolchok/767/orig -> origin/gh/swolchok/767/orig 2025-09-07T07:55:56.6758181Z * [new branch] gh/swolchok/768/base -> origin/gh/swolchok/768/base 2025-09-07T07:55:56.6759784Z * [new branch] gh/swolchok/768/head -> origin/gh/swolchok/768/head 2025-09-07T07:55:56.6761453Z * [new branch] gh/swolchok/768/orig -> origin/gh/swolchok/768/orig 
2025-09-07T07:55:56.6763992Z * [new branch] gh/swolchok/769/base -> origin/gh/swolchok/769/base 2025-09-07T07:55:56.6765957Z * [new branch] gh/swolchok/769/head -> origin/gh/swolchok/769/head 2025-09-07T07:55:56.6767545Z * [new branch] gh/swolchok/769/orig -> origin/gh/swolchok/769/orig 2025-09-07T07:55:56.6769817Z * [new branch] gh/swolchok/771/base -> origin/gh/swolchok/771/base 2025-09-07T07:55:56.6771528Z * [new branch] gh/swolchok/771/head -> origin/gh/swolchok/771/head 2025-09-07T07:55:56.6773072Z * [new branch] gh/swolchok/771/orig -> origin/gh/swolchok/771/orig 2025-09-07T07:55:56.6775702Z * [new branch] gh/swolchok/772/base -> origin/gh/swolchok/772/base 2025-09-07T07:55:56.6777244Z * [new branch] gh/swolchok/772/head -> origin/gh/swolchok/772/head 2025-09-07T07:55:56.6778988Z * [new branch] gh/swolchok/772/orig -> origin/gh/swolchok/772/orig 2025-09-07T07:55:56.6781397Z * [new branch] gh/swolchok/773/base -> origin/gh/swolchok/773/base 2025-09-07T07:55:56.6783014Z * [new branch] gh/swolchok/773/head -> origin/gh/swolchok/773/head 2025-09-07T07:55:56.6784907Z * [new branch] gh/swolchok/773/orig -> origin/gh/swolchok/773/orig 2025-09-07T07:55:56.6787201Z * [new branch] gh/swolchok/786/base -> origin/gh/swolchok/786/base 2025-09-07T07:55:56.6788687Z * [new branch] gh/swolchok/786/head -> origin/gh/swolchok/786/head 2025-09-07T07:55:56.6790320Z * [new branch] gh/swolchok/786/orig -> origin/gh/swolchok/786/orig 2025-09-07T07:55:56.6792549Z * [new branch] gh/swolchok/787/base -> origin/gh/swolchok/787/base 2025-09-07T07:55:56.6794292Z * [new branch] gh/swolchok/787/head -> origin/gh/swolchok/787/head 2025-09-07T07:55:56.6796018Z * [new branch] gh/swolchok/787/orig -> origin/gh/swolchok/787/orig 2025-09-07T07:55:56.6798412Z * [new branch] gh/swolchok/788/base -> origin/gh/swolchok/788/base 2025-09-07T07:55:56.6799954Z * [new branch] gh/swolchok/788/head -> origin/gh/swolchok/788/head 2025-09-07T07:55:56.6801556Z * [new branch] gh/swolchok/788/orig -> 
origin/gh/swolchok/788/orig 2025-09-07T07:55:56.6803956Z * [new branch] gh/swolchok/789/base -> origin/gh/swolchok/789/base 2025-09-07T07:55:56.6805684Z * [new branch] gh/swolchok/789/head -> origin/gh/swolchok/789/head 2025-09-07T07:55:56.6807189Z * [new branch] gh/swolchok/789/orig -> origin/gh/swolchok/789/orig 2025-09-07T07:55:56.6809462Z * [new branch] gh/swolchok/790/base -> origin/gh/swolchok/790/base 2025-09-07T07:55:56.6811101Z * [new branch] gh/swolchok/790/head -> origin/gh/swolchok/790/head 2025-09-07T07:55:56.6812656Z * [new branch] gh/swolchok/790/orig -> origin/gh/swolchok/790/orig 2025-09-07T07:55:56.6815363Z * [new branch] gh/swolchok/791/base -> origin/gh/swolchok/791/base 2025-09-07T07:55:56.6816835Z * [new branch] gh/swolchok/791/head -> origin/gh/swolchok/791/head 2025-09-07T07:55:56.6818419Z * [new branch] gh/swolchok/791/orig -> origin/gh/swolchok/791/orig 2025-09-07T07:55:56.6820776Z * [new branch] gh/swolchok/792/base -> origin/gh/swolchok/792/base 2025-09-07T07:55:56.6822270Z * [new branch] gh/swolchok/792/head -> origin/gh/swolchok/792/head 2025-09-07T07:55:56.6823900Z * [new branch] gh/swolchok/792/orig -> origin/gh/swolchok/792/orig 2025-09-07T07:55:56.6826420Z * [new branch] gh/swolchok/793/base -> origin/gh/swolchok/793/base 2025-09-07T07:55:56.6828051Z * [new branch] gh/swolchok/793/head -> origin/gh/swolchok/793/head 2025-09-07T07:55:56.6829570Z * [new branch] gh/swolchok/793/orig -> origin/gh/swolchok/793/orig 2025-09-07T07:55:56.6831916Z * [new branch] gh/swolchok/794/base -> origin/gh/swolchok/794/base 2025-09-07T07:55:56.6833469Z * [new branch] gh/swolchok/794/head -> origin/gh/swolchok/794/head 2025-09-07T07:55:56.6835297Z * [new branch] gh/swolchok/794/orig -> origin/gh/swolchok/794/orig 2025-09-07T07:55:56.6837750Z * [new branch] gh/swolchok/795/base -> origin/gh/swolchok/795/base 2025-09-07T07:55:56.6839341Z * [new branch] gh/swolchok/795/head -> origin/gh/swolchok/795/head 2025-09-07T07:55:56.6840889Z * [new branch] 
gh/swolchok/795/orig -> origin/gh/swolchok/795/orig 2025-09-07T07:55:56.6843218Z * [new branch] gh/swolchok/796/base -> origin/gh/swolchok/796/base 2025-09-07T07:55:56.6845355Z * [new branch] gh/swolchok/796/head -> origin/gh/swolchok/796/head 2025-09-07T07:55:56.6846751Z * [new branch] gh/swolchok/796/orig -> origin/gh/swolchok/796/orig 2025-09-07T07:55:56.6849226Z * [new branch] gh/swolchok/797/base -> origin/gh/swolchok/797/base 2025-09-07T07:55:56.6850782Z * [new branch] gh/swolchok/797/head -> origin/gh/swolchok/797/head 2025-09-07T07:55:56.6852287Z * [new branch] gh/swolchok/797/orig -> origin/gh/swolchok/797/orig 2025-09-07T07:55:56.6855026Z * [new branch] gh/swolchok/798/base -> origin/gh/swolchok/798/base 2025-09-07T07:55:56.6856504Z * [new branch] gh/swolchok/798/head -> origin/gh/swolchok/798/head 2025-09-07T07:55:56.6858093Z * [new branch] gh/swolchok/798/orig -> origin/gh/swolchok/798/orig 2025-09-07T07:55:56.6860576Z * [new branch] gh/swolchok/799/base -> origin/gh/swolchok/799/base 2025-09-07T07:55:56.6862064Z * [new branch] gh/swolchok/799/head -> origin/gh/swolchok/799/head 2025-09-07T07:55:56.6863816Z * [new branch] gh/swolchok/799/orig -> origin/gh/swolchok/799/orig 2025-09-07T07:55:56.6866525Z * [new branch] gh/swolchok/800/base -> origin/gh/swolchok/800/base 2025-09-07T07:55:56.6867948Z * [new branch] gh/swolchok/800/head -> origin/gh/swolchok/800/head 2025-09-07T07:55:56.6869556Z * [new branch] gh/swolchok/800/orig -> origin/gh/swolchok/800/orig 2025-09-07T07:55:56.6871952Z * [new branch] gh/swolchok/801/base -> origin/gh/swolchok/801/base 2025-09-07T07:55:56.6873467Z * [new branch] gh/swolchok/801/head -> origin/gh/swolchok/801/head 2025-09-07T07:55:56.6875627Z * [new branch] gh/swolchok/801/orig -> origin/gh/swolchok/801/orig 2025-09-07T07:55:56.6877968Z * [new branch] gh/swolchok/802/base -> origin/gh/swolchok/802/base 2025-09-07T07:55:56.6879409Z * [new branch] gh/swolchok/802/head -> origin/gh/swolchok/802/head 
2025-09-07T07:55:56.6881087Z * [new branch] gh/swolchok/802/orig -> origin/gh/swolchok/802/orig 2025-09-07T07:55:56.6883382Z * [new branch] gh/swolchok/803/base -> origin/gh/swolchok/803/base 2025-09-07T07:55:56.6885299Z * [new branch] gh/swolchok/803/head -> origin/gh/swolchok/803/head 2025-09-07T07:55:56.6886907Z * [new branch] gh/swolchok/803/orig -> origin/gh/swolchok/803/orig 2025-09-07T07:55:56.6889365Z * [new branch] gh/swolchok/804/base -> origin/gh/swolchok/804/base 2025-09-07T07:55:56.6890862Z * [new branch] gh/swolchok/804/head -> origin/gh/swolchok/804/head 2025-09-07T07:55:56.6892446Z * [new branch] gh/swolchok/804/orig -> origin/gh/swolchok/804/orig 2025-09-07T07:55:56.6895114Z * [new branch] gh/swolchok/805/base -> origin/gh/swolchok/805/base 2025-09-07T07:55:56.6896659Z * [new branch] gh/swolchok/805/head -> origin/gh/swolchok/805/head 2025-09-07T07:55:56.6898185Z * [new branch] gh/swolchok/805/orig -> origin/gh/swolchok/805/orig 2025-09-07T07:55:56.6900320Z * [new branch] gh/swolchok/806/base -> origin/gh/swolchok/806/base 2025-09-07T07:55:56.6901953Z * [new branch] gh/swolchok/806/head -> origin/gh/swolchok/806/head 2025-09-07T07:55:56.6903468Z * [new branch] gh/swolchok/806/orig -> origin/gh/swolchok/806/orig 2025-09-07T07:55:56.6906211Z * [new branch] gh/swolchok/807/base -> origin/gh/swolchok/807/base 2025-09-07T07:55:56.6907648Z * [new branch] gh/swolchok/807/head -> origin/gh/swolchok/807/head 2025-09-07T07:55:56.6909250Z * [new branch] gh/swolchok/807/orig -> origin/gh/swolchok/807/orig 2025-09-07T07:55:56.6911839Z * [new branch] gh/swolchok/808/base -> origin/gh/swolchok/808/base 2025-09-07T07:55:56.6913333Z * [new branch] gh/swolchok/808/head -> origin/gh/swolchok/808/head 2025-09-07T07:55:56.6915115Z * [new branch] gh/swolchok/808/orig -> origin/gh/swolchok/808/orig 2025-09-07T07:55:56.6917472Z * [new branch] gh/swolchok/809/base -> origin/gh/swolchok/809/base 2025-09-07T07:55:56.6919041Z * [new branch] gh/swolchok/809/head -> 
origin/gh/swolchok/809/head 2025-09-07T07:55:56.6920601Z * [new branch] gh/swolchok/809/orig -> origin/gh/swolchok/809/orig 2025-09-07T07:55:56.6922919Z * [new branch] gh/swolchok/810/base -> origin/gh/swolchok/810/base 2025-09-07T07:55:56.6924869Z * [new branch] gh/swolchok/810/head -> origin/gh/swolchok/810/head 2025-09-07T07:55:56.6926559Z * [new branch] gh/swolchok/810/orig -> origin/gh/swolchok/810/orig 2025-09-07T07:55:56.6928860Z * [new branch] gh/swolchok/811/base -> origin/gh/swolchok/811/base 2025-09-07T07:55:56.6930450Z * [new branch] gh/swolchok/811/head -> origin/gh/swolchok/811/head 2025-09-07T07:55:56.6932088Z * [new branch] gh/swolchok/811/orig -> origin/gh/swolchok/811/orig 2025-09-07T07:55:56.6934659Z * [new branch] gh/swolchok/812/base -> origin/gh/swolchok/812/base 2025-09-07T07:55:56.6936190Z * [new branch] gh/swolchok/812/head -> origin/gh/swolchok/812/head 2025-09-07T07:55:56.6937793Z * [new branch] gh/swolchok/812/orig -> origin/gh/swolchok/812/orig 2025-09-07T07:55:56.6940149Z * [new branch] gh/swolchok/813/base -> origin/gh/swolchok/813/base 2025-09-07T07:55:56.6941610Z * [new branch] gh/swolchok/813/head -> origin/gh/swolchok/813/head 2025-09-07T07:55:56.6943244Z * [new branch] gh/swolchok/813/orig -> origin/gh/swolchok/813/orig 2025-09-07T07:55:56.6946257Z * [new branch] gh/swolchok/814/base -> origin/gh/swolchok/814/base 2025-09-07T07:55:56.6947771Z * [new branch] gh/swolchok/814/head -> origin/gh/swolchok/814/head 2025-09-07T07:55:56.6949269Z * [new branch] gh/swolchok/814/orig -> origin/gh/swolchok/814/orig 2025-09-07T07:55:56.6951693Z * [new branch] gh/swolchok/815/base -> origin/gh/swolchok/815/base 2025-09-07T07:55:56.6953189Z * [new branch] gh/swolchok/815/head -> origin/gh/swolchok/815/head 2025-09-07T07:55:56.6955173Z * [new branch] gh/swolchok/815/orig -> origin/gh/swolchok/815/orig 2025-09-07T07:55:56.6957509Z * [new branch] gh/swolchok/816/base -> origin/gh/swolchok/816/base 2025-09-07T07:55:56.6959167Z * [new branch] 
gh/swolchok/816/head -> origin/gh/swolchok/816/head 2025-09-07T07:55:56.6960763Z * [new branch] gh/swolchok/816/orig -> origin/gh/swolchok/816/orig 2025-09-07T07:55:56.6963251Z * [new branch] gh/swolchok/817/base -> origin/gh/swolchok/817/base 2025-09-07T07:55:56.6965063Z * [new branch] gh/swolchok/817/head -> origin/gh/swolchok/817/head 2025-09-07T07:55:56.6966533Z * [new branch] gh/swolchok/817/orig -> origin/gh/swolchok/817/orig 2025-09-07T07:55:56.6968921Z * [new branch] gh/swolchok/818/base -> origin/gh/swolchok/818/base 2025-09-07T07:55:56.6970629Z * [new branch] gh/swolchok/818/head -> origin/gh/swolchok/818/head 2025-09-07T07:55:56.6972200Z * [new branch] gh/swolchok/818/orig -> origin/gh/swolchok/818/orig 2025-09-07T07:55:56.6975160Z * [new branch] gh/swolchok/819/base -> origin/gh/swolchok/819/base 2025-09-07T07:55:56.6976640Z * [new branch] gh/swolchok/819/head -> origin/gh/swolchok/819/head 2025-09-07T07:55:56.6978322Z * [new branch] gh/swolchok/819/orig -> origin/gh/swolchok/819/orig 2025-09-07T07:55:56.6980562Z * [new branch] gh/swolchok/820/base -> origin/gh/swolchok/820/base 2025-09-07T07:55:56.6982103Z * [new branch] gh/swolchok/820/head -> origin/gh/swolchok/820/head 2025-09-07T07:55:56.6983668Z * [new branch] gh/swolchok/820/orig -> origin/gh/swolchok/820/orig 2025-09-07T07:55:56.6986394Z * [new branch] gh/swolchok/821/base -> origin/gh/swolchok/821/base 2025-09-07T07:55:56.6987836Z * [new branch] gh/swolchok/821/head -> origin/gh/swolchok/821/head 2025-09-07T07:55:56.6989327Z * [new branch] gh/swolchok/821/orig -> origin/gh/swolchok/821/orig 2025-09-07T07:55:56.6991741Z * [new branch] gh/swolchok/822/base -> origin/gh/swolchok/822/base 2025-09-07T07:55:56.6993317Z * [new branch] gh/swolchok/822/head -> origin/gh/swolchok/822/head 2025-09-07T07:55:56.6995248Z * [new branch] gh/swolchok/822/orig -> origin/gh/swolchok/822/orig 2025-09-07T07:55:56.6997649Z * [new branch] gh/swolchok/823/base -> origin/gh/swolchok/823/base 
2025-09-07T07:55:56.6999155Z * [new branch] gh/swolchok/823/head -> origin/gh/swolchok/823/head 2025-09-07T07:55:56.7000652Z * [new branch] gh/swolchok/823/orig -> origin/gh/swolchok/823/orig 2025-09-07T07:55:56.7003075Z * [new branch] gh/swolchok/824/base -> origin/gh/swolchok/824/base 2025-09-07T07:55:56.7004966Z * [new branch] gh/swolchok/824/head -> origin/gh/swolchok/824/head 2025-09-07T07:55:56.7006499Z * [new branch] gh/swolchok/824/orig -> origin/gh/swolchok/824/orig 2025-09-07T07:55:56.7008762Z * [new branch] gh/swolchok/825/base -> origin/gh/swolchok/825/base 2025-09-07T07:55:56.7010338Z * [new branch] gh/swolchok/825/head -> origin/gh/swolchok/825/head 2025-09-07T07:55:56.7011901Z * [new branch] gh/swolchok/825/orig -> origin/gh/swolchok/825/orig 2025-09-07T07:55:56.7014637Z * [new branch] gh/swolchok/826/base -> origin/gh/swolchok/826/base 2025-09-07T07:55:56.7016096Z * [new branch] gh/swolchok/826/head -> origin/gh/swolchok/826/head 2025-09-07T07:55:56.7017564Z * [new branch] gh/swolchok/826/orig -> origin/gh/swolchok/826/orig 2025-09-07T07:55:56.7020033Z * [new branch] gh/swolchok/827/base -> origin/gh/swolchok/827/base 2025-09-07T07:55:56.7021647Z * [new branch] gh/swolchok/827/head -> origin/gh/swolchok/827/head 2025-09-07T07:55:56.7023082Z * [new branch] gh/swolchok/827/orig -> origin/gh/swolchok/827/orig 2025-09-07T07:55:56.7025853Z * [new branch] gh/swolchok/828/base -> origin/gh/swolchok/828/base 2025-09-07T07:55:56.7027348Z * [new branch] gh/swolchok/828/head -> origin/gh/swolchok/828/head 2025-09-07T07:55:56.7028909Z * [new branch] gh/swolchok/828/orig -> origin/gh/swolchok/828/orig 2025-09-07T07:55:56.7031098Z * [new branch] gh/swolchok/829/base -> origin/gh/swolchok/829/base 2025-09-07T07:55:56.7032663Z * [new branch] gh/swolchok/829/head -> origin/gh/swolchok/829/head 2025-09-07T07:55:56.7034465Z * [new branch] gh/swolchok/829/orig -> origin/gh/swolchok/829/orig 2025-09-07T07:55:56.7036990Z * [new branch] gh/swolchok/830/base -> 
origin/gh/swolchok/830/base 2025-09-07T07:55:56.7038501Z * [new branch] gh/swolchok/830/head -> origin/gh/swolchok/830/head 2025-09-07T07:55:56.7039973Z * [new branch] gh/swolchok/830/orig -> origin/gh/swolchok/830/orig 2025-09-07T07:55:56.7042187Z * [new branch] gh/swolchok/831/base -> origin/gh/swolchok/831/base 2025-09-07T07:55:56.7044118Z * [new branch] gh/swolchok/831/head -> origin/gh/swolchok/831/head 2025-09-07T07:55:56.7045764Z * [new branch] gh/swolchok/831/orig -> origin/gh/swolchok/831/orig 2025-09-07T07:55:56.7047807Z * [new branch] gh/swolchok/832/base -> origin/gh/swolchok/832/base 2025-09-07T07:55:56.7049407Z * [new branch] gh/swolchok/832/head -> origin/gh/swolchok/832/head 2025-09-07T07:55:56.7050942Z * [new branch] gh/swolchok/832/orig -> origin/gh/swolchok/832/orig 2025-09-07T07:55:56.7054173Z * [new branch] gh/syed-ahmed/3/base -> origin/gh/syed-ahmed/3/base 2025-09-07T07:55:56.7055921Z * [new branch] gh/syed-ahmed/3/head -> origin/gh/syed-ahmed/3/head 2025-09-07T07:55:56.7057491Z * [new branch] gh/syed-ahmed/3/orig -> origin/gh/syed-ahmed/3/orig 2025-09-07T07:55:56.7059748Z * [new branch] gh/syed-ahmed/4/base -> origin/gh/syed-ahmed/4/base 2025-09-07T07:55:56.7061283Z * [new branch] gh/syed-ahmed/4/head -> origin/gh/syed-ahmed/4/head 2025-09-07T07:55:56.7062795Z * [new branch] gh/syed-ahmed/4/orig -> origin/gh/syed-ahmed/4/orig 2025-09-07T07:55:56.7065769Z * [new branch] gh/syed-ahmed/5/base -> origin/gh/syed-ahmed/5/base 2025-09-07T07:55:56.7067245Z * [new branch] gh/syed-ahmed/5/head -> origin/gh/syed-ahmed/5/head 2025-09-07T07:55:56.7068835Z * [new branch] gh/syed-ahmed/5/orig -> origin/gh/syed-ahmed/5/orig 2025-09-07T07:55:56.7071771Z * [new branch] gh/teja-rao/4/base -> origin/gh/teja-rao/4/base 2025-09-07T07:55:56.7073389Z * [new branch] gh/teja-rao/4/head -> origin/gh/teja-rao/4/head 2025-09-07T07:55:56.7075327Z * [new branch] gh/teja-rao/4/orig -> origin/gh/teja-rao/4/orig 2025-09-07T07:55:56.7078348Z * [new branch] gh/tianyu-l/2/base 
-> origin/gh/tianyu-l/2/base 2025-09-07T07:55:56.7079903Z * [new branch] gh/tianyu-l/2/head -> origin/gh/tianyu-l/2/head 2025-09-07T07:55:56.7081414Z * [new branch] gh/tianyu-l/2/orig -> origin/gh/tianyu-l/2/orig 2025-09-07T07:55:56.7083690Z * [new branch] gh/tianyu-l/3/base -> origin/gh/tianyu-l/3/base 2025-09-07T07:55:56.7085568Z * [new branch] gh/tianyu-l/3/head -> origin/gh/tianyu-l/3/head 2025-09-07T07:55:56.7087005Z * [new branch] gh/tianyu-l/3/orig -> origin/gh/tianyu-l/3/orig 2025-09-07T07:55:56.7089214Z * [new branch] gh/tianyu-l/4/base -> origin/gh/tianyu-l/4/base 2025-09-07T07:55:56.7090808Z * [new branch] gh/tianyu-l/4/head -> origin/gh/tianyu-l/4/head 2025-09-07T07:55:56.7092322Z * [new branch] gh/tianyu-l/4/orig -> origin/gh/tianyu-l/4/orig 2025-09-07T07:55:56.7095760Z * [new branch] gh/tugsbayasgalan/1/base -> origin/gh/tugsbayasgalan/1/base 2025-09-07T07:55:56.7097191Z * [new branch] gh/tugsbayasgalan/1/head -> origin/gh/tugsbayasgalan/1/head 2025-09-07T07:55:56.7098875Z * [new branch] gh/tugsbayasgalan/1/orig -> origin/gh/tugsbayasgalan/1/orig 2025-09-07T07:55:56.7101366Z * [new branch] gh/tugsbayasgalan/10/base -> origin/gh/tugsbayasgalan/10/base 2025-09-07T07:55:56.7103009Z * [new branch] gh/tugsbayasgalan/10/head -> origin/gh/tugsbayasgalan/10/head 2025-09-07T07:55:56.7104925Z * [new branch] gh/tugsbayasgalan/10/orig -> origin/gh/tugsbayasgalan/10/orig 2025-09-07T07:55:56.7106996Z * [new branch] gh/tugsbayasgalan/11/base -> origin/gh/tugsbayasgalan/11/base 2025-09-07T07:55:56.7108618Z * [new branch] gh/tugsbayasgalan/11/head -> origin/gh/tugsbayasgalan/11/head 2025-09-07T07:55:56.7110112Z * [new branch] gh/tugsbayasgalan/11/orig -> origin/gh/tugsbayasgalan/11/orig 2025-09-07T07:55:56.7112730Z * [new branch] gh/tugsbayasgalan/12/base -> origin/gh/tugsbayasgalan/12/base 2025-09-07T07:55:56.7114254Z * [new branch] gh/tugsbayasgalan/12/head -> origin/gh/tugsbayasgalan/12/head 2025-09-07T07:55:56.7115965Z * [new branch] gh/tugsbayasgalan/12/orig -> 
origin/gh/tugsbayasgalan/12/orig 2025-09-07T07:55:56.7118297Z * [new branch] gh/tugsbayasgalan/13/base -> origin/gh/tugsbayasgalan/13/base 2025-09-07T07:55:56.7119835Z * [new branch] gh/tugsbayasgalan/13/head -> origin/gh/tugsbayasgalan/13/head 2025-09-07T07:55:56.7121402Z * [new branch] gh/tugsbayasgalan/13/orig -> origin/gh/tugsbayasgalan/13/orig 2025-09-07T07:55:56.7124033Z * [new branch] gh/tugsbayasgalan/14/base -> origin/gh/tugsbayasgalan/14/base 2025-09-07T07:55:56.7125645Z * [new branch] gh/tugsbayasgalan/14/head -> origin/gh/tugsbayasgalan/14/head 2025-09-07T07:55:56.7127132Z * [new branch] gh/tugsbayasgalan/14/orig -> origin/gh/tugsbayasgalan/14/orig 2025-09-07T07:55:56.7129851Z * [new branch] gh/tugsbayasgalan/15/base -> origin/gh/tugsbayasgalan/15/base 2025-09-07T07:55:56.7131099Z * [new branch] gh/tugsbayasgalan/15/head -> origin/gh/tugsbayasgalan/15/head 2025-09-07T07:55:56.7132568Z * [new branch] gh/tugsbayasgalan/15/orig -> origin/gh/tugsbayasgalan/15/orig 2025-09-07T07:55:56.7135272Z * [new branch] gh/tugsbayasgalan/2/base -> origin/gh/tugsbayasgalan/2/base 2025-09-07T07:55:56.7136753Z * [new branch] gh/tugsbayasgalan/2/head -> origin/gh/tugsbayasgalan/2/head 2025-09-07T07:55:56.7138269Z * [new branch] gh/tugsbayasgalan/2/orig -> origin/gh/tugsbayasgalan/2/orig 2025-09-07T07:55:56.7140372Z * [new branch] gh/tugsbayasgalan/3/base -> origin/gh/tugsbayasgalan/3/base 2025-09-07T07:55:56.7142165Z * [new branch] gh/tugsbayasgalan/3/head -> origin/gh/tugsbayasgalan/3/head 2025-09-07T07:55:56.7144175Z * [new branch] gh/tugsbayasgalan/3/orig -> origin/gh/tugsbayasgalan/3/orig 2025-09-07T07:55:56.7146987Z * [new branch] gh/tugsbayasgalan/4/base -> origin/gh/tugsbayasgalan/4/base 2025-09-07T07:55:56.7148515Z * [new branch] gh/tugsbayasgalan/4/head -> origin/gh/tugsbayasgalan/4/head 2025-09-07T07:55:56.7150108Z * [new branch] gh/tugsbayasgalan/4/orig -> origin/gh/tugsbayasgalan/4/orig 2025-09-07T07:55:56.7152514Z * [new branch] gh/tugsbayasgalan/5/base -> 
origin/gh/tugsbayasgalan/5/base 2025-09-07T07:55:56.7154382Z * [new branch] gh/tugsbayasgalan/5/head -> origin/gh/tugsbayasgalan/5/head 2025-09-07T07:55:56.7156020Z * [new branch] gh/tugsbayasgalan/5/orig -> origin/gh/tugsbayasgalan/5/orig 2025-09-07T07:55:56.7158257Z * [new branch] gh/tugsbayasgalan/6/base -> origin/gh/tugsbayasgalan/6/base 2025-09-07T07:55:56.7159707Z * [new branch] gh/tugsbayasgalan/6/head -> origin/gh/tugsbayasgalan/6/head 2025-09-07T07:55:56.7161545Z * [new branch] gh/tugsbayasgalan/6/orig -> origin/gh/tugsbayasgalan/6/orig 2025-09-07T07:55:56.7163987Z * [new branch] gh/tugsbayasgalan/7/base -> origin/gh/tugsbayasgalan/7/base 2025-09-07T07:55:56.7165806Z * [new branch] gh/tugsbayasgalan/7/head -> origin/gh/tugsbayasgalan/7/head 2025-09-07T07:55:56.7167337Z * [new branch] gh/tugsbayasgalan/7/orig -> origin/gh/tugsbayasgalan/7/orig 2025-09-07T07:55:56.7169666Z * [new branch] gh/tugsbayasgalan/8/base -> origin/gh/tugsbayasgalan/8/base 2025-09-07T07:55:56.7171158Z * [new branch] gh/tugsbayasgalan/8/head -> origin/gh/tugsbayasgalan/8/head 2025-09-07T07:55:56.7172739Z * [new branch] gh/tugsbayasgalan/8/orig -> origin/gh/tugsbayasgalan/8/orig 2025-09-07T07:55:56.7175367Z * [new branch] gh/tugsbayasgalan/9/base -> origin/gh/tugsbayasgalan/9/base 2025-09-07T07:55:56.7177026Z * [new branch] gh/tugsbayasgalan/9/head -> origin/gh/tugsbayasgalan/9/head 2025-09-07T07:55:56.7178377Z * [new branch] gh/tugsbayasgalan/9/orig -> origin/gh/tugsbayasgalan/9/orig 2025-09-07T07:55:56.7181156Z * [new branch] gh/v0i0/1/base -> origin/gh/v0i0/1/base 2025-09-07T07:55:56.7182821Z * [new branch] gh/v0i0/1/head -> origin/gh/v0i0/1/head 2025-09-07T07:55:56.7184774Z * [new branch] gh/v0i0/1/orig -> origin/gh/v0i0/1/orig 2025-09-07T07:55:56.7186913Z * [new branch] gh/v0i0/4/base -> origin/gh/v0i0/4/base 2025-09-07T07:55:56.7188526Z * [new branch] gh/v0i0/4/head -> origin/gh/v0i0/4/head 2025-09-07T07:55:56.7190036Z * [new branch] gh/v0i0/4/orig -> origin/gh/v0i0/4/orig 
2025-09-07T07:55:56.7192227Z * [new branch] gh/v0i0/6/base -> origin/gh/v0i0/6/base 2025-09-07T07:55:56.7194022Z * [new branch] gh/v0i0/6/head -> origin/gh/v0i0/6/head 2025-09-07T07:55:56.7195713Z * [new branch] gh/v0i0/6/orig -> origin/gh/v0i0/6/orig 2025-09-07T07:55:56.7198125Z * [new branch] gh/v0i0/7/base -> origin/gh/v0i0/7/base 2025-09-07T07:55:56.7199703Z * [new branch] gh/v0i0/7/head -> origin/gh/v0i0/7/head 2025-09-07T07:55:56.7201378Z * [new branch] gh/v0i0/7/orig -> origin/gh/v0i0/7/orig 2025-09-07T07:55:56.7203531Z * [new branch] gh/v0i0/8/base -> origin/gh/v0i0/8/base 2025-09-07T07:55:56.7205311Z * [new branch] gh/v0i0/8/head -> origin/gh/v0i0/8/head 2025-09-07T07:55:56.7206907Z * [new branch] gh/v0i0/8/orig -> origin/gh/v0i0/8/orig 2025-09-07T07:55:56.7209143Z * [new branch] gh/v0i0/9/base -> origin/gh/v0i0/9/base 2025-09-07T07:55:56.7210680Z * [new branch] gh/v0i0/9/head -> origin/gh/v0i0/9/head 2025-09-07T07:55:56.7212166Z * [new branch] gh/v0i0/9/orig -> origin/gh/v0i0/9/orig 2025-09-07T07:55:56.7215431Z * [new branch] gh/vkuzo/1/next -> origin/gh/vkuzo/1/next 2025-09-07T07:55:56.7217659Z * [new branch] gh/vkuzo/2/next -> origin/gh/vkuzo/2/next 2025-09-07T07:55:56.7219791Z * [new branch] gh/vkuzo/3/next -> origin/gh/vkuzo/3/next 2025-09-07T07:55:56.7222075Z * [new branch] gh/vkuzo/4/base -> origin/gh/vkuzo/4/base 2025-09-07T07:55:56.7223896Z * [new branch] gh/vkuzo/4/head -> origin/gh/vkuzo/4/head 2025-09-07T07:55:56.7225795Z * [new branch] gh/vkuzo/4/orig -> origin/gh/vkuzo/4/orig 2025-09-07T07:55:56.7228142Z * [new branch] gh/vkuzo/5/base -> origin/gh/vkuzo/5/base 2025-09-07T07:55:56.7229827Z * [new branch] gh/vkuzo/5/head -> origin/gh/vkuzo/5/head 2025-09-07T07:55:56.7231428Z * [new branch] gh/vkuzo/5/orig -> origin/gh/vkuzo/5/orig 2025-09-07T07:55:56.7234069Z * [new branch] gh/vkuzo/6/base -> origin/gh/vkuzo/6/base 2025-09-07T07:55:56.7235689Z * [new branch] gh/vkuzo/6/head -> origin/gh/vkuzo/6/head 2025-09-07T07:55:56.7237332Z * [new branch] 
gh/vkuzo/6/orig -> origin/gh/vkuzo/6/orig 2025-09-07T07:55:56.7239520Z * [new branch] gh/vkuzo/7/base -> origin/gh/vkuzo/7/base 2025-09-07T07:55:56.7241248Z * [new branch] gh/vkuzo/7/head -> origin/gh/vkuzo/7/head 2025-09-07T07:55:56.7242802Z * [new branch] gh/vkuzo/7/orig -> origin/gh/vkuzo/7/orig 2025-09-07T07:55:56.7246112Z * [new branch] gh/wconstab/419/base -> origin/gh/wconstab/419/base 2025-09-07T07:55:56.7247688Z * [new branch] gh/wconstab/419/head -> origin/gh/wconstab/419/head 2025-09-07T07:55:56.7249153Z * [new branch] gh/wconstab/419/orig -> origin/gh/wconstab/419/orig 2025-09-07T07:55:56.7251520Z * [new branch] gh/wconstab/424/base -> origin/gh/wconstab/424/base 2025-09-07T07:55:56.7253020Z * [new branch] gh/wconstab/424/head -> origin/gh/wconstab/424/head 2025-09-07T07:55:56.7254931Z * [new branch] gh/wconstab/424/orig -> origin/gh/wconstab/424/orig 2025-09-07T07:55:56.7257075Z * [new branch] gh/wconstab/435/base -> origin/gh/wconstab/435/base 2025-09-07T07:55:56.7258895Z * [new branch] gh/wconstab/435/head -> origin/gh/wconstab/435/head 2025-09-07T07:55:56.7260515Z * [new branch] gh/wconstab/435/orig -> origin/gh/wconstab/435/orig 2025-09-07T07:55:56.7262884Z * [new branch] gh/wconstab/438/base -> origin/gh/wconstab/438/base 2025-09-07T07:55:56.7264694Z * [new branch] gh/wconstab/438/head -> origin/gh/wconstab/438/head 2025-09-07T07:55:56.7266227Z * [new branch] gh/wconstab/438/orig -> origin/gh/wconstab/438/orig 2025-09-07T07:55:56.7268456Z * [new branch] gh/wconstab/440/base -> origin/gh/wconstab/440/base 2025-09-07T07:55:56.7270093Z * [new branch] gh/wconstab/440/head -> origin/gh/wconstab/440/head 2025-09-07T07:55:56.7271790Z * [new branch] gh/wconstab/440/orig -> origin/gh/wconstab/440/orig 2025-09-07T07:55:56.7274495Z * [new branch] gh/wconstab/441/base -> origin/gh/wconstab/441/base 2025-09-07T07:55:56.7276042Z * [new branch] gh/wconstab/441/head -> origin/gh/wconstab/441/head 2025-09-07T07:55:56.7277762Z * [new branch] gh/wconstab/441/orig -> 
origin/gh/wconstab/441/orig 2025-09-07T07:55:56.7280231Z * [new branch] gh/wconstab/442/base -> origin/gh/wconstab/442/base 2025-09-07T07:55:56.7281903Z * [new branch] gh/wconstab/442/head -> origin/gh/wconstab/442/head 2025-09-07T07:55:56.7283469Z * [new branch] gh/wconstab/442/orig -> origin/gh/wconstab/442/orig 2025-09-07T07:55:56.7286126Z * [new branch] gh/wconstab/443/base -> origin/gh/wconstab/443/base 2025-09-07T07:55:56.7287629Z * [new branch] gh/wconstab/443/head -> origin/gh/wconstab/443/head 2025-09-07T07:55:56.7289179Z * [new branch] gh/wconstab/443/orig -> origin/gh/wconstab/443/orig 2025-09-07T07:55:56.7291434Z * [new branch] gh/wconstab/444/base -> origin/gh/wconstab/444/base 2025-09-07T07:55:56.7292988Z * [new branch] gh/wconstab/444/head -> origin/gh/wconstab/444/head 2025-09-07T07:55:56.7294992Z * [new branch] gh/wconstab/444/orig -> origin/gh/wconstab/444/orig 2025-09-07T07:55:56.7297209Z * [new branch] gh/wconstab/445/base -> origin/gh/wconstab/445/base 2025-09-07T07:55:56.7298751Z * [new branch] gh/wconstab/445/head -> origin/gh/wconstab/445/head 2025-09-07T07:55:56.7300394Z * [new branch] gh/wconstab/445/orig -> origin/gh/wconstab/445/orig 2025-09-07T07:55:56.7303100Z * [new branch] gh/wconstab/446/base -> origin/gh/wconstab/446/base 2025-09-07T07:55:56.7305218Z * [new branch] gh/wconstab/446/head -> origin/gh/wconstab/446/head 2025-09-07T07:55:56.7307099Z * [new branch] gh/wconstab/446/orig -> origin/gh/wconstab/446/orig 2025-09-07T07:55:56.7309387Z * [new branch] gh/wconstab/447/base -> origin/gh/wconstab/447/base 2025-09-07T07:55:56.7310993Z * [new branch] gh/wconstab/447/head -> origin/gh/wconstab/447/head 2025-09-07T07:55:56.7312578Z * [new branch] gh/wconstab/447/orig -> origin/gh/wconstab/447/orig 2025-09-07T07:55:56.7315941Z * [new branch] gh/weifengpy/27/base -> origin/gh/weifengpy/27/base 2025-09-07T07:55:56.7317373Z * [new branch] gh/weifengpy/27/head -> origin/gh/weifengpy/27/head 2025-09-07T07:55:56.7319004Z * [new branch] 
gh/weifengpy/27/orig -> origin/gh/weifengpy/27/orig 2025-09-07T07:55:56.7321320Z * [new branch] gh/weifengpy/30/base -> origin/gh/weifengpy/30/base 2025-09-07T07:55:56.7322849Z * [new branch] gh/weifengpy/30/head -> origin/gh/weifengpy/30/head 2025-09-07T07:55:56.7324693Z * [new branch] gh/weifengpy/30/orig -> origin/gh/weifengpy/30/orig 2025-09-07T07:55:56.7327702Z * [new branch] gh/williamwen42/196/base -> origin/gh/williamwen42/196/base 2025-09-07T07:55:56.7329295Z * [new branch] gh/williamwen42/196/head -> origin/gh/williamwen42/196/head 2025-09-07T07:55:56.7330990Z * [new branch] gh/williamwen42/196/orig -> origin/gh/williamwen42/196/orig 2025-09-07T07:55:56.7334108Z * [new branch] gh/williamwen42/250/base -> origin/gh/williamwen42/250/base 2025-09-07T07:55:56.7336161Z * [new branch] gh/williamwen42/250/head -> origin/gh/williamwen42/250/head 2025-09-07T07:55:56.7336587Z * [new branch] gh/williamwen42/250/orig -> origin/gh/williamwen42/250/orig 2025-09-07T07:55:56.7339126Z * [new branch] gh/williamwen42/258/base -> origin/gh/williamwen42/258/base 2025-09-07T07:55:56.7340807Z * [new branch] gh/williamwen42/258/head -> origin/gh/williamwen42/258/head 2025-09-07T07:55:56.7342340Z * [new branch] gh/williamwen42/258/orig -> origin/gh/williamwen42/258/orig 2025-09-07T07:55:56.7344889Z * [new branch] gh/williamwen42/266/base -> origin/gh/williamwen42/266/base 2025-09-07T07:55:56.7346527Z * [new branch] gh/williamwen42/266/head -> origin/gh/williamwen42/266/head 2025-09-07T07:55:56.7348013Z * [new branch] gh/williamwen42/266/orig -> origin/gh/williamwen42/266/orig 2025-09-07T07:55:56.7350320Z * [new branch] gh/williamwen42/267/base -> origin/gh/williamwen42/267/base 2025-09-07T07:55:56.7351943Z * [new branch] gh/williamwen42/267/head -> origin/gh/williamwen42/267/head 2025-09-07T07:55:56.7353598Z * [new branch] gh/williamwen42/267/orig -> origin/gh/williamwen42/267/orig 2025-09-07T07:55:56.7356500Z * [new branch] gh/williamwen42/270/base -> 
origin/gh/williamwen42/270/base 2025-09-07T07:55:56.7358165Z * [new branch] gh/williamwen42/270/head -> origin/gh/williamwen42/270/head 2025-09-07T07:55:56.7359775Z * [new branch] gh/williamwen42/270/orig -> origin/gh/williamwen42/270/orig 2025-09-07T07:55:56.7362129Z * [new branch] gh/williamwen42/271/base -> origin/gh/williamwen42/271/base 2025-09-07T07:55:56.7363906Z * [new branch] gh/williamwen42/271/head -> origin/gh/williamwen42/271/head 2025-09-07T07:55:56.7365639Z * [new branch] gh/williamwen42/271/orig -> origin/gh/williamwen42/271/orig 2025-09-07T07:55:56.7367835Z * [new branch] gh/williamwen42/272/base -> origin/gh/williamwen42/272/base 2025-09-07T07:55:56.7369408Z * [new branch] gh/williamwen42/272/head -> origin/gh/williamwen42/272/head 2025-09-07T07:55:56.7371009Z * [new branch] gh/williamwen42/272/orig -> origin/gh/williamwen42/272/orig 2025-09-07T07:55:56.7373221Z * [new branch] gh/williamwen42/274/base -> origin/gh/williamwen42/274/base 2025-09-07T07:55:56.7375179Z * [new branch] gh/williamwen42/274/head -> origin/gh/williamwen42/274/head 2025-09-07T07:55:56.7376732Z * [new branch] gh/williamwen42/274/orig -> origin/gh/williamwen42/274/orig 2025-09-07T07:55:56.7378992Z * [new branch] gh/williamwen42/275/base -> origin/gh/williamwen42/275/base 2025-09-07T07:55:56.7380713Z * [new branch] gh/williamwen42/275/head -> origin/gh/williamwen42/275/head 2025-09-07T07:55:56.7382800Z * [new branch] gh/williamwen42/276/base -> origin/gh/williamwen42/276/base 2025-09-07T07:55:56.7384706Z * [new branch] gh/williamwen42/276/head -> origin/gh/williamwen42/276/head 2025-09-07T07:55:56.7386311Z * [new branch] gh/williamwen42/276/orig -> origin/gh/williamwen42/276/orig 2025-09-07T07:55:56.7388670Z * [new branch] gh/williamwen42/277/base -> origin/gh/williamwen42/277/base 2025-09-07T07:55:56.7390228Z * [new branch] gh/williamwen42/277/head -> origin/gh/williamwen42/277/head 2025-09-07T07:55:56.7391738Z * [new branch] gh/williamwen42/277/orig -> 
origin/gh/williamwen42/277/orig 2025-09-07T07:55:56.7394156Z * [new branch] gh/williamwen42/278/base -> origin/gh/williamwen42/278/base 2025-09-07T07:55:56.7396276Z * [new branch] gh/williamwen42/278/head -> origin/gh/williamwen42/278/head 2025-09-07T07:55:56.7397912Z * [new branch] gh/williamwen42/278/orig -> origin/gh/williamwen42/278/orig 2025-09-07T07:55:56.7400163Z * [new branch] gh/williamwen42/279/base -> origin/gh/williamwen42/279/base 2025-09-07T07:55:56.7406737Z * [new branch] gh/williamwen42/279/head -> origin/gh/williamwen42/279/head 2025-09-07T07:55:56.7407238Z * [new branch] gh/williamwen42/279/orig -> origin/gh/williamwen42/279/orig 2025-09-07T07:55:56.7407459Z * [new branch] gh/williamwen42/280/base -> origin/gh/williamwen42/280/base 2025-09-07T07:55:56.7407675Z * [new branch] gh/williamwen42/280/head -> origin/gh/williamwen42/280/head 2025-09-07T07:55:56.7409238Z * [new branch] gh/williamwen42/280/orig -> origin/gh/williamwen42/280/orig 2025-09-07T07:55:56.7411520Z * [new branch] gh/williamwen42/281/base -> origin/gh/williamwen42/281/base 2025-09-07T07:55:56.7413002Z * [new branch] gh/williamwen42/281/head -> origin/gh/williamwen42/281/head 2025-09-07T07:55:56.7414934Z * [new branch] gh/williamwen42/281/orig -> origin/gh/williamwen42/281/orig 2025-09-07T07:55:56.7417022Z * [new branch] gh/williamwen42/282/base -> origin/gh/williamwen42/282/base 2025-09-07T07:55:56.7418585Z * [new branch] gh/williamwen42/282/head -> origin/gh/williamwen42/282/head 2025-09-07T07:55:56.7420167Z * [new branch] gh/williamwen42/282/orig -> origin/gh/williamwen42/282/orig 2025-09-07T07:55:56.7422538Z * [new branch] gh/williamwen42/283/base -> origin/gh/williamwen42/283/base 2025-09-07T07:55:56.7424523Z * [new branch] gh/williamwen42/283/head -> origin/gh/williamwen42/283/head 2025-09-07T07:55:56.7426056Z * [new branch] gh/williamwen42/283/orig -> origin/gh/williamwen42/283/orig 2025-09-07T07:55:56.7428609Z * [new branch] gh/williamwen42/284/base -> 
origin/gh/williamwen42/284/base 2025-09-07T07:55:56.7430118Z * [new branch] gh/williamwen42/284/head -> origin/gh/williamwen42/284/head 2025-09-07T07:55:56.7431658Z * [new branch] gh/williamwen42/284/orig -> origin/gh/williamwen42/284/orig 2025-09-07T07:55:56.7433983Z * [new branch] gh/williamwen42/285/base -> origin/gh/williamwen42/285/base 2025-09-07T07:55:56.7435785Z * [new branch] gh/williamwen42/285/head -> origin/gh/williamwen42/285/head 2025-09-07T07:55:56.7437323Z * [new branch] gh/williamwen42/285/orig -> origin/gh/williamwen42/285/orig 2025-09-07T07:55:56.7439437Z * [new branch] gh/williamwen42/286/base -> origin/gh/williamwen42/286/base 2025-09-07T07:55:56.7441006Z * [new branch] gh/williamwen42/286/head -> origin/gh/williamwen42/286/head 2025-09-07T07:55:56.7442505Z * [new branch] gh/williamwen42/286/orig -> origin/gh/williamwen42/286/orig 2025-09-07T07:55:56.7445449Z * [new branch] gh/williamwen42/287/base -> origin/gh/williamwen42/287/base 2025-09-07T07:55:56.7446862Z * [new branch] gh/williamwen42/287/head -> origin/gh/williamwen42/287/head 2025-09-07T07:55:56.7448394Z * [new branch] gh/williamwen42/287/orig -> origin/gh/williamwen42/287/orig 2025-09-07T07:55:56.7450989Z * [new branch] gh/williamwen42/288/base -> origin/gh/williamwen42/288/base 2025-09-07T07:55:56.7452654Z * [new branch] gh/williamwen42/288/head -> origin/gh/williamwen42/288/head 2025-09-07T07:55:56.7454296Z * [new branch] gh/williamwen42/288/orig -> origin/gh/williamwen42/288/orig 2025-09-07T07:55:56.7456751Z * [new branch] gh/williamwen42/289/base -> origin/gh/williamwen42/289/base 2025-09-07T07:55:56.7458336Z * [new branch] gh/williamwen42/289/head -> origin/gh/williamwen42/289/head 2025-09-07T07:55:56.7459934Z * [new branch] gh/williamwen42/289/orig -> origin/gh/williamwen42/289/orig 2025-09-07T07:55:56.7462999Z * [new branch] gh/wychi/1/base -> origin/gh/wychi/1/base 2025-09-07T07:55:56.7464866Z * [new branch] gh/wychi/1/head -> origin/gh/wychi/1/head 
2025-09-07T07:55:56.7466479Z * [new branch] gh/wychi/1/orig -> origin/gh/wychi/1/orig 2025-09-07T07:55:56.7469391Z * [new branch] gh/xmfan/169/base -> origin/gh/xmfan/169/base 2025-09-07T07:55:56.7470943Z * [new branch] gh/xmfan/169/head -> origin/gh/xmfan/169/head 2025-09-07T07:55:56.7473062Z * [new branch] gh/xmfan/170/base -> origin/gh/xmfan/170/base 2025-09-07T07:55:56.7475026Z * [new branch] gh/xmfan/170/head -> origin/gh/xmfan/170/head 2025-09-07T07:55:56.7477379Z * [new branch] gh/xmfan/18/base -> origin/gh/xmfan/18/base 2025-09-07T07:55:56.7478986Z * [new branch] gh/xmfan/18/head -> origin/gh/xmfan/18/head 2025-09-07T07:55:56.7481207Z * [new branch] gh/xmfan/229/base -> origin/gh/xmfan/229/base 2025-09-07T07:55:56.7482739Z * [new branch] gh/xmfan/229/head -> origin/gh/xmfan/229/head 2025-09-07T07:55:56.7484597Z * [new branch] gh/xmfan/229/orig -> origin/gh/xmfan/229/orig 2025-09-07T07:55:56.7486852Z * [new branch] gh/xmfan/237/base -> origin/gh/xmfan/237/base 2025-09-07T07:55:56.7488479Z * [new branch] gh/xmfan/237/head -> origin/gh/xmfan/237/head 2025-09-07T07:55:56.7490047Z * [new branch] gh/xmfan/237/orig -> origin/gh/xmfan/237/orig 2025-09-07T07:55:56.7492174Z * [new branch] gh/xmfan/244/base -> origin/gh/xmfan/244/base 2025-09-07T07:55:56.7493828Z * [new branch] gh/xmfan/244/head -> origin/gh/xmfan/244/head 2025-09-07T07:55:56.7495638Z * [new branch] gh/xmfan/244/orig -> origin/gh/xmfan/244/orig 2025-09-07T07:55:56.7497800Z * [new branch] gh/xmfan/246/base -> origin/gh/xmfan/246/base 2025-09-07T07:55:56.7499437Z * [new branch] gh/xmfan/246/head -> origin/gh/xmfan/246/head 2025-09-07T07:55:56.7500886Z * [new branch] gh/xmfan/246/orig -> origin/gh/xmfan/246/orig 2025-09-07T07:55:56.7503076Z * [new branch] gh/xmfan/253/base -> origin/gh/xmfan/253/base 2025-09-07T07:55:56.7505049Z * [new branch] gh/xmfan/253/head -> origin/gh/xmfan/253/head 2025-09-07T07:55:56.7506581Z * [new branch] gh/xmfan/253/orig -> origin/gh/xmfan/253/orig 
2025-09-07T07:55:56.7508826Z * [new branch] gh/xmfan/254/base -> origin/gh/xmfan/254/base 2025-09-07T07:55:56.7510425Z * [new branch] gh/xmfan/254/head -> origin/gh/xmfan/254/head 2025-09-07T07:55:56.7512028Z * [new branch] gh/xmfan/254/orig -> origin/gh/xmfan/254/orig 2025-09-07T07:55:56.7514805Z * [new branch] gh/xmfan/260/base -> origin/gh/xmfan/260/base 2025-09-07T07:55:56.7516220Z * [new branch] gh/xmfan/260/head -> origin/gh/xmfan/260/head 2025-09-07T07:55:56.7517873Z * [new branch] gh/xmfan/260/orig -> origin/gh/xmfan/260/orig 2025-09-07T07:55:56.7520083Z * [new branch] gh/xmfan/262/base -> origin/gh/xmfan/262/base 2025-09-07T07:55:56.7521571Z * [new branch] gh/xmfan/262/head -> origin/gh/xmfan/262/head 2025-09-07T07:55:56.7523077Z * [new branch] gh/xmfan/262/orig -> origin/gh/xmfan/262/orig 2025-09-07T07:55:56.7525750Z * [new branch] gh/xmfan/263/base -> origin/gh/xmfan/263/base 2025-09-07T07:55:56.7527268Z * [new branch] gh/xmfan/263/head -> origin/gh/xmfan/263/head 2025-09-07T07:55:56.7528810Z * [new branch] gh/xmfan/263/orig -> origin/gh/xmfan/263/orig 2025-09-07T07:55:56.7531093Z * [new branch] gh/xmfan/264/base -> origin/gh/xmfan/264/base 2025-09-07T07:55:56.7532646Z * [new branch] gh/xmfan/264/head -> origin/gh/xmfan/264/head 2025-09-07T07:55:56.7534472Z * [new branch] gh/xmfan/264/orig -> origin/gh/xmfan/264/orig 2025-09-07T07:55:56.7536722Z * [new branch] gh/xmfan/274/base -> origin/gh/xmfan/274/base 2025-09-07T07:55:56.7538177Z * [new branch] gh/xmfan/274/head -> origin/gh/xmfan/274/head 2025-09-07T07:55:56.7539779Z * [new branch] gh/xmfan/274/orig -> origin/gh/xmfan/274/orig 2025-09-07T07:55:56.7542004Z * [new branch] gh/xmfan/276/base -> origin/gh/xmfan/276/base 2025-09-07T07:55:56.7543567Z * [new branch] gh/xmfan/276/head -> origin/gh/xmfan/276/head 2025-09-07T07:55:56.7545687Z * [new branch] gh/xmfan/276/orig -> origin/gh/xmfan/276/orig 2025-09-07T07:55:56.7547908Z * [new branch] gh/xmfan/277/base -> origin/gh/xmfan/277/base 
2025-09-07T07:55:56.7549406Z * [new branch] gh/xmfan/277/head -> origin/gh/xmfan/277/head 2025-09-07T07:55:56.7550927Z * [new branch] gh/xmfan/277/orig -> origin/gh/xmfan/277/orig 2025-09-07T07:55:56.7553171Z * [new branch] gh/xmfan/278/base -> origin/gh/xmfan/278/base 2025-09-07T07:55:56.7555085Z * [new branch] gh/xmfan/278/head -> origin/gh/xmfan/278/head 2025-09-07T07:55:56.7556644Z * [new branch] gh/xmfan/278/orig -> origin/gh/xmfan/278/orig 2025-09-07T07:55:56.7559045Z * [new branch] gh/xmfan/279/base -> origin/gh/xmfan/279/base 2025-09-07T07:55:56.7560602Z * [new branch] gh/xmfan/279/head -> origin/gh/xmfan/279/head 2025-09-07T07:55:56.7562159Z * [new branch] gh/xmfan/279/orig -> origin/gh/xmfan/279/orig 2025-09-07T07:55:56.7564726Z * [new branch] gh/xmfan/280/base -> origin/gh/xmfan/280/base 2025-09-07T07:55:56.7566303Z * [new branch] gh/xmfan/280/head -> origin/gh/xmfan/280/head 2025-09-07T07:55:56.7567733Z * [new branch] gh/xmfan/280/orig -> origin/gh/xmfan/280/orig 2025-09-07T07:55:56.7569988Z * [new branch] gh/xmfan/281/base -> origin/gh/xmfan/281/base 2025-09-07T07:55:56.7571570Z * [new branch] gh/xmfan/281/head -> origin/gh/xmfan/281/head 2025-09-07T07:55:56.7573143Z * [new branch] gh/xmfan/281/orig -> origin/gh/xmfan/281/orig 2025-09-07T07:55:56.7575806Z * [new branch] gh/xmfan/282/base -> origin/gh/xmfan/282/base 2025-09-07T07:55:56.7577365Z * [new branch] gh/xmfan/282/head -> origin/gh/xmfan/282/head 2025-09-07T07:55:56.7579555Z * [new branch] gh/xmfan/283/base -> origin/gh/xmfan/283/base 2025-09-07T07:55:56.7581353Z * [new branch] gh/xmfan/283/head -> origin/gh/xmfan/283/head 2025-09-07T07:55:56.7582694Z * [new branch] gh/xmfan/283/orig -> origin/gh/xmfan/283/orig 2025-09-07T07:55:56.7585912Z * [new branch] gh/xuanzhang816/14/base -> origin/gh/xuanzhang816/14/base 2025-09-07T07:55:56.7590565Z * [new branch] gh/xuanzhang816/14/head -> origin/gh/xuanzhang816/14/head 2025-09-07T07:55:56.7592115Z * [new branch] gh/xuanzhang816/14/orig -> 
origin/gh/xuanzhang816/14/orig 2025-09-07T07:55:56.7594636Z * [new branch] gh/xuanzhang816/19/base -> origin/gh/xuanzhang816/19/base 2025-09-07T07:55:56.7596210Z * [new branch] gh/xuanzhang816/19/head -> origin/gh/xuanzhang816/19/head 2025-09-07T07:55:56.7597948Z * [new branch] gh/xuanzhang816/19/orig -> origin/gh/xuanzhang816/19/orig 2025-09-07T07:55:56.7600164Z * [new branch] gh/xuanzhang816/22/base -> origin/gh/xuanzhang816/22/base 2025-09-07T07:55:56.7601689Z * [new branch] gh/xuanzhang816/22/head -> origin/gh/xuanzhang816/22/head 2025-09-07T07:55:56.7603316Z * [new branch] gh/xuanzhang816/22/orig -> origin/gh/xuanzhang816/22/orig 2025-09-07T07:55:56.7605922Z * [new branch] gh/xuanzhang816/23/base -> origin/gh/xuanzhang816/23/base 2025-09-07T07:55:56.7607431Z * [new branch] gh/xuanzhang816/23/head -> origin/gh/xuanzhang816/23/head 2025-09-07T07:55:56.7608990Z * [new branch] gh/xuanzhang816/23/orig -> origin/gh/xuanzhang816/23/orig 2025-09-07T07:55:56.7611237Z * [new branch] gh/xuanzhang816/24/base -> origin/gh/xuanzhang816/24/base 2025-09-07T07:55:56.7612695Z * [new branch] gh/xuanzhang816/24/head -> origin/gh/xuanzhang816/24/head 2025-09-07T07:55:56.7614513Z * [new branch] gh/xuanzhang816/24/orig -> origin/gh/xuanzhang816/24/orig 2025-09-07T07:55:56.7616774Z * [new branch] gh/xuanzhang816/25/base -> origin/gh/xuanzhang816/25/base 2025-09-07T07:55:56.7618350Z * [new branch] gh/xuanzhang816/25/head -> origin/gh/xuanzhang816/25/head 2025-09-07T07:55:56.7619888Z * [new branch] gh/xuanzhang816/25/orig -> origin/gh/xuanzhang816/25/orig 2025-09-07T07:55:56.7622184Z * [new branch] gh/xuanzhang816/26/base -> origin/gh/xuanzhang816/26/base 2025-09-07T07:55:56.7623846Z * [new branch] gh/xuanzhang816/26/head -> origin/gh/xuanzhang816/26/head 2025-09-07T07:55:56.7625662Z * [new branch] gh/xuanzhang816/26/orig -> origin/gh/xuanzhang816/26/orig 2025-09-07T07:55:56.7628484Z * [new branch] gh/yanbing-j/11/base -> origin/gh/yanbing-j/11/base 2025-09-07T07:55:56.7630060Z * [new 
branch] gh/yanbing-j/11/head -> origin/gh/yanbing-j/11/head 2025-09-07T07:55:56.7631656Z * [new branch] gh/yanbing-j/11/orig -> origin/gh/yanbing-j/11/orig 2025-09-07T07:55:56.7633955Z * [new branch] gh/yanbing-j/12/base -> origin/gh/yanbing-j/12/base 2025-09-07T07:55:56.7635775Z * [new branch] gh/yanbing-j/12/head -> origin/gh/yanbing-j/12/head 2025-09-07T07:55:56.7637327Z * [new branch] gh/yanbing-j/12/orig -> origin/gh/yanbing-j/12/orig 2025-09-07T07:55:56.7639757Z * [new branch] gh/yanbing-j/13/base -> origin/gh/yanbing-j/13/base 2025-09-07T07:55:56.7641275Z * [new branch] gh/yanbing-j/13/head -> origin/gh/yanbing-j/13/head 2025-09-07T07:55:56.7643034Z * [new branch] gh/yanbing-j/13/orig -> origin/gh/yanbing-j/13/orig 2025-09-07T07:55:56.7645672Z * [new branch] gh/yanbing-j/14/base -> origin/gh/yanbing-j/14/base 2025-09-07T07:55:56.7647241Z * [new branch] gh/yanbing-j/14/head -> origin/gh/yanbing-j/14/head 2025-09-07T07:55:56.7648897Z * [new branch] gh/yanbing-j/14/orig -> origin/gh/yanbing-j/14/orig 2025-09-07T07:55:56.7650924Z * [new branch] gh/yanbing-j/15/base -> origin/gh/yanbing-j/15/base 2025-09-07T07:55:56.7652432Z * [new branch] gh/yanbing-j/15/head -> origin/gh/yanbing-j/15/head 2025-09-07T07:55:56.7654149Z * [new branch] gh/yanbing-j/15/orig -> origin/gh/yanbing-j/15/orig 2025-09-07T07:55:56.7656470Z * [new branch] gh/yanbing-j/18/base -> origin/gh/yanbing-j/18/base 2025-09-07T07:55:56.7658037Z * [new branch] gh/yanbing-j/18/head -> origin/gh/yanbing-j/18/head 2025-09-07T07:55:56.7659591Z * [new branch] gh/yanbing-j/18/orig -> origin/gh/yanbing-j/18/orig 2025-09-07T07:55:56.7661882Z * [new branch] gh/yanbing-j/19/base -> origin/gh/yanbing-j/19/base 2025-09-07T07:55:56.7663430Z * [new branch] gh/yanbing-j/19/head -> origin/gh/yanbing-j/19/head 2025-09-07T07:55:56.7665295Z * [new branch] gh/yanbing-j/19/orig -> origin/gh/yanbing-j/19/orig 2025-09-07T07:55:56.7667593Z * [new branch] gh/yanbing-j/20/base -> origin/gh/yanbing-j/20/base 
2025-09-07T07:55:56.7669165Z * [new branch] gh/yanbing-j/20/head -> origin/gh/yanbing-j/20/head 2025-09-07T07:55:56.7670721Z * [new branch] gh/yanbing-j/20/orig -> origin/gh/yanbing-j/20/orig 2025-09-07T07:55:56.7673012Z * [new branch] gh/yanbing-j/21/base -> origin/gh/yanbing-j/21/base 2025-09-07T07:55:56.7674927Z * [new branch] gh/yanbing-j/21/head -> origin/gh/yanbing-j/21/head 2025-09-07T07:55:56.7677251Z * [new branch] gh/yanbing-j/22/base -> origin/gh/yanbing-j/22/base 2025-09-07T07:55:56.7678887Z * [new branch] gh/yanbing-j/22/head -> origin/gh/yanbing-j/22/head 2025-09-07T07:55:56.7680342Z * [new branch] gh/yanbing-j/22/orig -> origin/gh/yanbing-j/22/orig 2025-09-07T07:55:56.7682583Z * [new branch] gh/yanbing-j/23/base -> origin/gh/yanbing-j/23/base 2025-09-07T07:55:56.7684305Z * [new branch] gh/yanbing-j/23/head -> origin/gh/yanbing-j/23/head 2025-09-07T07:55:56.7685977Z * [new branch] gh/yanbing-j/23/orig -> origin/gh/yanbing-j/23/orig 2025-09-07T07:55:56.7688205Z * [new branch] gh/yanbing-j/24/base -> origin/gh/yanbing-j/24/base 2025-09-07T07:55:56.7689725Z * [new branch] gh/yanbing-j/24/head -> origin/gh/yanbing-j/24/head 2025-09-07T07:55:56.7691274Z * [new branch] gh/yanbing-j/24/orig -> origin/gh/yanbing-j/24/orig 2025-09-07T07:55:56.7693496Z * [new branch] gh/yanbing-j/25/base -> origin/gh/yanbing-j/25/base 2025-09-07T07:55:56.7695412Z * [new branch] gh/yanbing-j/25/head -> origin/gh/yanbing-j/25/head 2025-09-07T07:55:56.7696924Z * [new branch] gh/yanbing-j/25/orig -> origin/gh/yanbing-j/25/orig 2025-09-07T07:55:56.7699151Z * [new branch] gh/yanbing-j/26/base -> origin/gh/yanbing-j/26/base 2025-09-07T07:55:56.7700699Z * [new branch] gh/yanbing-j/26/head -> origin/gh/yanbing-j/26/head 2025-09-07T07:55:56.7702231Z * [new branch] gh/yanbing-j/26/orig -> origin/gh/yanbing-j/26/orig 2025-09-07T07:55:56.7704806Z * [new branch] gh/yanbing-j/36/base -> origin/gh/yanbing-j/36/base 2025-09-07T07:55:56.7706329Z * [new branch] gh/yanbing-j/36/head -> 
origin/gh/yanbing-j/36/head 2025-09-07T07:55:56.7707889Z * [new branch] gh/yanbing-j/36/orig -> origin/gh/yanbing-j/36/orig 2025-09-07T07:55:56.7710123Z * [new branch] gh/yanbing-j/37/base -> origin/gh/yanbing-j/37/base 2025-09-07T07:55:56.7711654Z * [new branch] gh/yanbing-j/37/head -> origin/gh/yanbing-j/37/head 2025-09-07T07:55:56.7713327Z * [new branch] gh/yanbing-j/37/orig -> origin/gh/yanbing-j/37/orig 2025-09-07T07:55:56.7716676Z * [new branch] gh/yangw-dev/12/base -> origin/gh/yangw-dev/12/base 2025-09-07T07:55:56.7718252Z * [new branch] gh/yangw-dev/12/head -> origin/gh/yangw-dev/12/head 2025-09-07T07:55:56.7719768Z * [new branch] gh/yangw-dev/12/orig -> origin/gh/yangw-dev/12/orig 2025-09-07T07:55:56.7721994Z * [new branch] gh/yangw-dev/13/base -> origin/gh/yangw-dev/13/base 2025-09-07T07:55:56.7723525Z * [new branch] gh/yangw-dev/13/head -> origin/gh/yangw-dev/13/head 2025-09-07T07:55:56.7725439Z * [new branch] gh/yangw-dev/13/orig -> origin/gh/yangw-dev/13/orig 2025-09-07T07:55:56.7727546Z * [new branch] gh/yangw-dev/14/base -> origin/gh/yangw-dev/14/base 2025-09-07T07:55:56.7729133Z * [new branch] gh/yangw-dev/14/head -> origin/gh/yangw-dev/14/head 2025-09-07T07:55:56.7730708Z * [new branch] gh/yangw-dev/14/orig -> origin/gh/yangw-dev/14/orig 2025-09-07T07:55:56.7732983Z * [new branch] gh/yangw-dev/15/base -> origin/gh/yangw-dev/15/base 2025-09-07T07:55:56.7734912Z * [new branch] gh/yangw-dev/15/head -> origin/gh/yangw-dev/15/head 2025-09-07T07:55:56.7736606Z * [new branch] gh/yangw-dev/15/orig -> origin/gh/yangw-dev/15/orig 2025-09-07T07:55:56.7738978Z * [new branch] gh/yangw-dev/16/base -> origin/gh/yangw-dev/16/base 2025-09-07T07:55:56.7740531Z * [new branch] gh/yangw-dev/16/head -> origin/gh/yangw-dev/16/head 2025-09-07T07:55:56.7742054Z * [new branch] gh/yangw-dev/16/orig -> origin/gh/yangw-dev/16/orig 2025-09-07T07:55:56.7744581Z * [new branch] gh/yangw-dev/17/base -> origin/gh/yangw-dev/17/base 2025-09-07T07:55:56.7746138Z * [new branch] 
gh/yangw-dev/17/head -> origin/gh/yangw-dev/17/head 2025-09-07T07:55:56.7747578Z * [new branch] gh/yangw-dev/17/orig -> origin/gh/yangw-dev/17/orig 2025-09-07T07:55:56.7749790Z * [new branch] gh/yangw-dev/18/base -> origin/gh/yangw-dev/18/base 2025-09-07T07:55:56.7751451Z * [new branch] gh/yangw-dev/18/head -> origin/gh/yangw-dev/18/head 2025-09-07T07:55:56.7752918Z * [new branch] gh/yangw-dev/18/orig -> origin/gh/yangw-dev/18/orig 2025-09-07T07:55:56.7755439Z * [new branch] gh/yangw-dev/19/base -> origin/gh/yangw-dev/19/base 2025-09-07T07:55:56.7756971Z * [new branch] gh/yangw-dev/19/head -> origin/gh/yangw-dev/19/head 2025-09-07T07:55:56.7758597Z * [new branch] gh/yangw-dev/19/orig -> origin/gh/yangw-dev/19/orig 2025-09-07T07:55:56.7760815Z * [new branch] gh/yangw-dev/20/base -> origin/gh/yangw-dev/20/base 2025-09-07T07:55:56.7762388Z * [new branch] gh/yangw-dev/20/head -> origin/gh/yangw-dev/20/head 2025-09-07T07:55:56.7763997Z * [new branch] gh/yangw-dev/20/orig -> origin/gh/yangw-dev/20/orig 2025-09-07T07:55:56.7766459Z * [new branch] gh/yangw-dev/21/base -> origin/gh/yangw-dev/21/base 2025-09-07T07:55:56.7767999Z * [new branch] gh/yangw-dev/21/head -> origin/gh/yangw-dev/21/head 2025-09-07T07:55:56.7769639Z * [new branch] gh/yangw-dev/21/orig -> origin/gh/yangw-dev/21/orig 2025-09-07T07:55:56.7771843Z * [new branch] gh/yangw-dev/22/base -> origin/gh/yangw-dev/22/base 2025-09-07T07:55:56.7773413Z * [new branch] gh/yangw-dev/22/head -> origin/gh/yangw-dev/22/head 2025-09-07T07:55:56.7775288Z * [new branch] gh/yangw-dev/22/orig -> origin/gh/yangw-dev/22/orig 2025-09-07T07:55:56.7777329Z * [new branch] gh/yangw-dev/23/base -> origin/gh/yangw-dev/23/base 2025-09-07T07:55:56.7779089Z * [new branch] gh/yangw-dev/23/head -> origin/gh/yangw-dev/23/head 2025-09-07T07:55:56.7780446Z * [new branch] gh/yangw-dev/23/orig -> origin/gh/yangw-dev/23/orig 2025-09-07T07:55:56.7782722Z * [new branch] gh/yangw-dev/24/base -> origin/gh/yangw-dev/24/base 
2025-09-07T07:55:56.7784599Z * [new branch] gh/yangw-dev/24/head -> origin/gh/yangw-dev/24/head 2025-09-07T07:55:56.7786130Z * [new branch] gh/yangw-dev/24/orig -> origin/gh/yangw-dev/24/orig 2025-09-07T07:55:56.7788492Z * [new branch] gh/yangw-dev/25/base -> origin/gh/yangw-dev/25/base 2025-09-07T07:55:56.7790017Z * [new branch] gh/yangw-dev/25/head -> origin/gh/yangw-dev/25/head 2025-09-07T07:55:56.7794146Z * [new branch] gh/yangw-dev/25/orig -> origin/gh/yangw-dev/25/orig 2025-09-07T07:55:56.7796769Z * [new branch] gh/yangw-dev/26/base -> origin/gh/yangw-dev/26/base 2025-09-07T07:55:56.7798372Z * [new branch] gh/yangw-dev/26/head -> origin/gh/yangw-dev/26/head 2025-09-07T07:55:56.7799879Z * [new branch] gh/yangw-dev/26/orig -> origin/gh/yangw-dev/26/orig 2025-09-07T07:55:56.7802177Z * [new branch] gh/yangw-dev/27/base -> origin/gh/yangw-dev/27/base 2025-09-07T07:55:56.7803898Z * [new branch] gh/yangw-dev/27/head -> origin/gh/yangw-dev/27/head 2025-09-07T07:55:56.7805769Z * [new branch] gh/yangw-dev/27/orig -> origin/gh/yangw-dev/27/orig 2025-09-07T07:55:56.7808632Z * [new branch] gh/ydwu4/233/base -> origin/gh/ydwu4/233/base 2025-09-07T07:55:56.7810242Z * [new branch] gh/ydwu4/233/head -> origin/gh/ydwu4/233/head 2025-09-07T07:55:56.7811800Z * [new branch] gh/ydwu4/233/orig -> origin/gh/ydwu4/233/orig 2025-09-07T07:55:56.7814555Z * [new branch] gh/ydwu4/246/base -> origin/gh/ydwu4/246/base 2025-09-07T07:55:56.7816117Z * [new branch] gh/ydwu4/246/head -> origin/gh/ydwu4/246/head 2025-09-07T07:55:56.7817637Z * [new branch] gh/ydwu4/246/orig -> origin/gh/ydwu4/246/orig 2025-09-07T07:55:56.7819948Z * [new branch] gh/ydwu4/253/base -> origin/gh/ydwu4/253/base 2025-09-07T07:55:56.7821621Z * [new branch] gh/ydwu4/253/head -> origin/gh/ydwu4/253/head 2025-09-07T07:55:56.7823134Z * [new branch] gh/ydwu4/253/orig -> origin/gh/ydwu4/253/orig 2025-09-07T07:55:56.7825796Z * [new branch] gh/ydwu4/255/base -> origin/gh/ydwu4/255/base 2025-09-07T07:55:56.7827245Z * [new branch] 
gh/ydwu4/255/head -> origin/gh/ydwu4/255/head 2025-09-07T07:55:56.7828810Z * [new branch] gh/ydwu4/255/orig -> origin/gh/ydwu4/255/orig 2025-09-07T07:55:56.7831176Z * [new branch] gh/ydwu4/259/base -> origin/gh/ydwu4/259/base 2025-09-07T07:55:56.7833003Z * [new branch] gh/ydwu4/259/head -> origin/gh/ydwu4/259/head 2025-09-07T07:55:56.7834919Z * [new branch] gh/ydwu4/259/orig -> origin/gh/ydwu4/259/orig 2025-09-07T07:55:56.7837179Z * [new branch] gh/ydwu4/262/base -> origin/gh/ydwu4/262/base 2025-09-07T07:55:56.7838828Z * [new branch] gh/ydwu4/262/head -> origin/gh/ydwu4/262/head 2025-09-07T07:55:56.7840507Z * [new branch] gh/ydwu4/262/orig -> origin/gh/ydwu4/262/orig 2025-09-07T07:55:56.7842722Z * [new branch] gh/ydwu4/263/base -> origin/gh/ydwu4/263/base 2025-09-07T07:55:56.7844597Z * [new branch] gh/ydwu4/263/head -> origin/gh/ydwu4/263/head 2025-09-07T07:55:56.7846152Z * [new branch] gh/ydwu4/263/orig -> origin/gh/ydwu4/263/orig 2025-09-07T07:55:56.7848584Z * [new branch] gh/ydwu4/269/base -> origin/gh/ydwu4/269/base 2025-09-07T07:55:56.7850197Z * [new branch] gh/ydwu4/269/head -> origin/gh/ydwu4/269/head 2025-09-07T07:55:56.7851597Z * [new branch] gh/ydwu4/269/orig -> origin/gh/ydwu4/269/orig 2025-09-07T07:55:56.7854077Z * [new branch] gh/ydwu4/270/base -> origin/gh/ydwu4/270/base 2025-09-07T07:55:56.7856057Z * [new branch] gh/ydwu4/270/head -> origin/gh/ydwu4/270/head 2025-09-07T07:55:56.7857540Z * [new branch] gh/ydwu4/270/orig -> origin/gh/ydwu4/270/orig 2025-09-07T07:55:56.7859793Z * [new branch] gh/ydwu4/272/base -> origin/gh/ydwu4/272/base 2025-09-07T07:55:56.7861545Z * [new branch] gh/ydwu4/272/head -> origin/gh/ydwu4/272/head 2025-09-07T07:55:56.7863157Z * [new branch] gh/ydwu4/272/orig -> origin/gh/ydwu4/272/orig 2025-09-07T07:55:56.7865782Z * [new branch] gh/ydwu4/275/base -> origin/gh/ydwu4/275/base 2025-09-07T07:55:56.7867428Z * [new branch] gh/ydwu4/275/head -> origin/gh/ydwu4/275/head 2025-09-07T07:55:56.7868941Z * [new branch] gh/ydwu4/275/orig 
-> origin/gh/ydwu4/275/orig 2025-09-07T07:55:56.7871116Z * [new branch] gh/ydwu4/276/base -> origin/gh/ydwu4/276/base 2025-09-07T07:55:56.7872655Z * [new branch] gh/ydwu4/276/head -> origin/gh/ydwu4/276/head 2025-09-07T07:55:56.7874555Z * [new branch] gh/ydwu4/276/orig -> origin/gh/ydwu4/276/orig 2025-09-07T07:55:56.7877175Z * [new branch] gh/ydwu4/279/base -> origin/gh/ydwu4/279/base 2025-09-07T07:55:56.7878869Z * [new branch] gh/ydwu4/279/head -> origin/gh/ydwu4/279/head 2025-09-07T07:55:56.7880374Z * [new branch] gh/ydwu4/279/orig -> origin/gh/ydwu4/279/orig 2025-09-07T07:55:56.7883108Z * [new branch] gh/ydwu4/283/base -> origin/gh/ydwu4/283/base 2025-09-07T07:55:56.7885245Z * [new branch] gh/ydwu4/283/head -> origin/gh/ydwu4/283/head 2025-09-07T07:55:56.7886794Z * [new branch] gh/ydwu4/283/orig -> origin/gh/ydwu4/283/orig 2025-09-07T07:55:56.7889280Z * [new branch] gh/ydwu4/289/base -> origin/gh/ydwu4/289/base 2025-09-07T07:55:56.7890817Z * [new branch] gh/ydwu4/289/head -> origin/gh/ydwu4/289/head 2025-09-07T07:55:56.7892332Z * [new branch] gh/ydwu4/289/orig -> origin/gh/ydwu4/289/orig 2025-09-07T07:55:56.7895190Z * [new branch] gh/ydwu4/290/base -> origin/gh/ydwu4/290/base 2025-09-07T07:55:56.7896766Z * [new branch] gh/ydwu4/290/head -> origin/gh/ydwu4/290/head 2025-09-07T07:55:56.7898297Z * [new branch] gh/ydwu4/290/orig -> origin/gh/ydwu4/290/orig 2025-09-07T07:55:56.7900716Z * [new branch] gh/ydwu4/291/base -> origin/gh/ydwu4/291/base 2025-09-07T07:55:56.7902316Z * [new branch] gh/ydwu4/291/head -> origin/gh/ydwu4/291/head 2025-09-07T07:55:56.7904053Z * [new branch] gh/ydwu4/291/orig -> origin/gh/ydwu4/291/orig 2025-09-07T07:55:56.7906733Z * [new branch] gh/ydwu4/292/base -> origin/gh/ydwu4/292/base 2025-09-07T07:55:56.7908203Z * [new branch] gh/ydwu4/292/head -> origin/gh/ydwu4/292/head 2025-09-07T07:55:56.7909691Z * [new branch] gh/ydwu4/292/orig -> origin/gh/ydwu4/292/orig 2025-09-07T07:55:56.7911894Z * [new branch] gh/ydwu4/293/base -> 
origin/gh/ydwu4/293/base 2025-09-07T07:55:56.7913470Z * [new branch] gh/ydwu4/293/head -> origin/gh/ydwu4/293/head 2025-09-07T07:55:56.7915343Z * [new branch] gh/ydwu4/293/orig -> origin/gh/ydwu4/293/orig 2025-09-07T07:55:56.7917976Z * [new branch] gh/ydwu4/294/base -> origin/gh/ydwu4/294/base 2025-09-07T07:55:56.7919728Z * [new branch] gh/ydwu4/294/head -> origin/gh/ydwu4/294/head 2025-09-07T07:55:56.7921082Z * [new branch] gh/ydwu4/294/orig -> origin/gh/ydwu4/294/orig 2025-09-07T07:55:56.7923359Z * [new branch] gh/ydwu4/295/base -> origin/gh/ydwu4/295/base 2025-09-07T07:55:56.7925480Z * [new branch] gh/ydwu4/295/head -> origin/gh/ydwu4/295/head 2025-09-07T07:55:56.7926903Z * [new branch] gh/ydwu4/295/orig -> origin/gh/ydwu4/295/orig 2025-09-07T07:55:56.7929314Z * [new branch] gh/ydwu4/296/base -> origin/gh/ydwu4/296/base 2025-09-07T07:55:56.7930804Z * [new branch] gh/ydwu4/296/head -> origin/gh/ydwu4/296/head 2025-09-07T07:55:56.7932320Z * [new branch] gh/ydwu4/296/orig -> origin/gh/ydwu4/296/orig 2025-09-07T07:55:56.7935732Z * [new branch] gh/ydwu4/300/base -> origin/gh/ydwu4/300/base 2025-09-07T07:55:56.7937759Z * [new branch] gh/ydwu4/300/head -> origin/gh/ydwu4/300/head 2025-09-07T07:55:56.7939446Z * [new branch] gh/ydwu4/300/orig -> origin/gh/ydwu4/300/orig 2025-09-07T07:55:56.7942058Z * [new branch] gh/ydwu4/301/base -> origin/gh/ydwu4/301/base 2025-09-07T07:55:56.7943578Z * [new branch] gh/ydwu4/301/head -> origin/gh/ydwu4/301/head 2025-09-07T07:55:56.7945533Z * [new branch] gh/ydwu4/301/orig -> origin/gh/ydwu4/301/orig 2025-09-07T07:55:56.7947873Z * [new branch] gh/ydwu4/302/base -> origin/gh/ydwu4/302/base 2025-09-07T07:55:56.7949381Z * [new branch] gh/ydwu4/302/head -> origin/gh/ydwu4/302/head 2025-09-07T07:55:56.7950946Z * [new branch] gh/ydwu4/302/orig -> origin/gh/ydwu4/302/orig 2025-09-07T07:55:56.7953124Z * [new branch] gh/ydwu4/303/base -> origin/gh/ydwu4/303/base 2025-09-07T07:55:56.7955129Z * [new branch] gh/ydwu4/303/head -> 
origin/gh/ydwu4/303/head 2025-09-07T07:55:56.7956595Z * [new branch] gh/ydwu4/303/orig -> origin/gh/ydwu4/303/orig 2025-09-07T07:55:56.7959038Z * [new branch] gh/ydwu4/304/base -> origin/gh/ydwu4/304/base 2025-09-07T07:55:56.7960618Z * [new branch] gh/ydwu4/304/head -> origin/gh/ydwu4/304/head 2025-09-07T07:55:56.7962134Z * [new branch] gh/ydwu4/304/orig -> origin/gh/ydwu4/304/orig 2025-09-07T07:55:56.7965046Z * [new branch] gh/ydwu4/305/base -> origin/gh/ydwu4/305/base 2025-09-07T07:55:56.7966617Z * [new branch] gh/ydwu4/305/head -> origin/gh/ydwu4/305/head 2025-09-07T07:55:56.7968155Z * [new branch] gh/ydwu4/305/orig -> origin/gh/ydwu4/305/orig 2025-09-07T07:55:56.7970790Z * [new branch] gh/ydwu4/306/base -> origin/gh/ydwu4/306/base 2025-09-07T07:55:56.7972480Z * [new branch] gh/ydwu4/306/head -> origin/gh/ydwu4/306/head 2025-09-07T07:55:56.7974160Z * [new branch] gh/ydwu4/306/orig -> origin/gh/ydwu4/306/orig 2025-09-07T07:55:56.7976759Z * [new branch] gh/ydwu4/307/base -> origin/gh/ydwu4/307/base 2025-09-07T07:55:56.7978221Z * [new branch] gh/ydwu4/307/head -> origin/gh/ydwu4/307/head 2025-09-07T07:55:56.7979889Z * [new branch] gh/ydwu4/307/orig -> origin/gh/ydwu4/307/orig 2025-09-07T07:55:56.7982380Z * [new branch] gh/ydwu4/308/base -> origin/gh/ydwu4/308/base 2025-09-07T07:55:56.7984017Z * [new branch] gh/ydwu4/308/head -> origin/gh/ydwu4/308/head 2025-09-07T07:55:56.7985980Z * [new branch] gh/ydwu4/308/orig -> origin/gh/ydwu4/308/orig 2025-09-07T07:55:56.7988199Z * [new branch] gh/ydwu4/309/base -> origin/gh/ydwu4/309/base 2025-09-07T07:55:56.7989907Z * [new branch] gh/ydwu4/309/head -> origin/gh/ydwu4/309/head 2025-09-07T07:55:56.7991384Z * [new branch] gh/ydwu4/309/orig -> origin/gh/ydwu4/309/orig 2025-09-07T07:55:56.7993948Z * [new branch] gh/ydwu4/310/base -> origin/gh/ydwu4/310/base 2025-09-07T07:55:56.7995912Z * [new branch] gh/ydwu4/310/head -> origin/gh/ydwu4/310/head 2025-09-07T07:55:56.7997430Z * [new branch] gh/ydwu4/310/orig -> 
origin/gh/ydwu4/310/orig 2025-09-07T07:55:56.7999884Z * [new branch] gh/ydwu4/311/base -> origin/gh/ydwu4/311/base 2025-09-07T07:55:56.8001512Z * [new branch] gh/ydwu4/311/head -> origin/gh/ydwu4/311/head 2025-09-07T07:55:56.8003056Z * [new branch] gh/ydwu4/311/orig -> origin/gh/ydwu4/311/orig 2025-09-07T07:55:56.8005932Z * [new branch] gh/ydwu4/312/base -> origin/gh/ydwu4/312/base 2025-09-07T07:55:56.8007390Z * [new branch] gh/ydwu4/312/head -> origin/gh/ydwu4/312/head 2025-09-07T07:55:56.8008977Z * [new branch] gh/ydwu4/312/orig -> origin/gh/ydwu4/312/orig 2025-09-07T07:55:56.8011508Z * [new branch] gh/ydwu4/313/base -> origin/gh/ydwu4/313/base 2025-09-07T07:55:56.8013223Z * [new branch] gh/ydwu4/313/head -> origin/gh/ydwu4/313/head 2025-09-07T07:55:56.8015141Z * [new branch] gh/ydwu4/313/orig -> origin/gh/ydwu4/313/orig 2025-09-07T07:55:56.8017648Z * [new branch] gh/ydwu4/314/base -> origin/gh/ydwu4/314/base 2025-09-07T07:55:56.8019336Z * [new branch] gh/ydwu4/314/head -> origin/gh/ydwu4/314/head 2025-09-07T07:55:56.8020933Z * [new branch] gh/ydwu4/314/orig -> origin/gh/ydwu4/314/orig 2025-09-07T07:55:56.8023328Z * [new branch] gh/ydwu4/315/base -> origin/gh/ydwu4/315/base 2025-09-07T07:55:56.8025263Z * [new branch] gh/ydwu4/315/head -> origin/gh/ydwu4/315/head 2025-09-07T07:55:56.8026918Z * [new branch] gh/ydwu4/315/orig -> origin/gh/ydwu4/315/orig 2025-09-07T07:55:56.8029509Z * [new branch] gh/ydwu4/316/base -> origin/gh/ydwu4/316/base 2025-09-07T07:55:56.8031123Z * [new branch] gh/ydwu4/316/head -> origin/gh/ydwu4/316/head 2025-09-07T07:55:56.8032767Z * [new branch] gh/ydwu4/316/orig -> origin/gh/ydwu4/316/orig 2025-09-07T07:55:56.8035599Z * [new branch] gh/ydwu4/317/base -> origin/gh/ydwu4/317/base 2025-09-07T07:55:56.8037006Z * [new branch] gh/ydwu4/317/head -> origin/gh/ydwu4/317/head 2025-09-07T07:55:56.8038712Z * [new branch] gh/ydwu4/317/orig -> origin/gh/ydwu4/317/orig 2025-09-07T07:55:56.8041207Z * [new branch] gh/ydwu4/318/base -> 
origin/gh/ydwu4/318/base 2025-09-07T07:55:56.8042812Z * [new branch] gh/ydwu4/318/head -> origin/gh/ydwu4/318/head 2025-09-07T07:55:56.8044739Z * [new branch] gh/ydwu4/318/orig -> origin/gh/ydwu4/318/orig 2025-09-07T07:55:56.8046993Z * [new branch] gh/ydwu4/319/base -> origin/gh/ydwu4/319/base 2025-09-07T07:55:56.8048569Z * [new branch] gh/ydwu4/319/head -> origin/gh/ydwu4/319/head 2025-09-07T07:55:56.8050098Z * [new branch] gh/ydwu4/319/orig -> origin/gh/ydwu4/319/orig 2025-09-07T07:55:56.8052658Z * [new branch] gh/ydwu4/320/base -> origin/gh/ydwu4/320/base 2025-09-07T07:55:56.8054314Z * [new branch] gh/ydwu4/320/head -> origin/gh/ydwu4/320/head 2025-09-07T07:55:56.8056118Z * [new branch] gh/ydwu4/320/orig -> origin/gh/ydwu4/320/orig 2025-09-07T07:55:56.8058595Z * [new branch] gh/ydwu4/321/base -> origin/gh/ydwu4/321/base 2025-09-07T07:55:56.8060015Z * [new branch] gh/ydwu4/321/head -> origin/gh/ydwu4/321/head 2025-09-07T07:55:56.8061603Z * [new branch] gh/ydwu4/321/orig -> origin/gh/ydwu4/321/orig 2025-09-07T07:55:56.8064190Z * [new branch] gh/ydwu4/322/base -> origin/gh/ydwu4/322/base 2025-09-07T07:55:56.8065922Z * [new branch] gh/ydwu4/322/head -> origin/gh/ydwu4/322/head 2025-09-07T07:55:56.8067407Z * [new branch] gh/ydwu4/322/orig -> origin/gh/ydwu4/322/orig 2025-09-07T07:55:56.8069858Z * [new branch] gh/ydwu4/323/base -> origin/gh/ydwu4/323/base 2025-09-07T07:55:56.8071395Z * [new branch] gh/ydwu4/323/head -> origin/gh/ydwu4/323/head 2025-09-07T07:55:56.8072901Z * [new branch] gh/ydwu4/323/orig -> origin/gh/ydwu4/323/orig 2025-09-07T07:55:56.8075706Z * [new branch] gh/ydwu4/324/base -> origin/gh/ydwu4/324/base 2025-09-07T07:55:56.8077294Z * [new branch] gh/ydwu4/324/head -> origin/gh/ydwu4/324/head 2025-09-07T07:55:56.8078822Z * [new branch] gh/ydwu4/324/orig -> origin/gh/ydwu4/324/orig 2025-09-07T07:55:56.8082097Z * [new branch] gh/yf225/133/base -> origin/gh/yf225/133/base 2025-09-07T07:55:56.8083665Z * [new branch] gh/yf225/133/head -> 
origin/gh/yf225/133/head 2025-09-07T07:55:56.8086714Z * [new branch] gh/yf225/171/base -> origin/gh/yf225/171/base 2025-09-07T07:55:56.8088258Z * [new branch] gh/yf225/171/head -> origin/gh/yf225/171/head 2025-09-07T07:55:56.8089809Z * [new branch] gh/yf225/171/orig -> origin/gh/yf225/171/orig 2025-09-07T07:55:56.8092149Z * [new branch] gh/yf225/172/base -> origin/gh/yf225/172/base 2025-09-07T07:55:56.8093667Z * [new branch] gh/yf225/172/head -> origin/gh/yf225/172/head 2025-09-07T07:55:56.8095551Z * [new branch] gh/yf225/172/orig -> origin/gh/yf225/172/orig 2025-09-07T07:55:56.8097980Z * [new branch] gh/yf225/93/base -> origin/gh/yf225/93/base 2025-09-07T07:55:56.8099487Z * [new branch] gh/yf225/93/head -> origin/gh/yf225/93/head 2025-09-07T07:55:56.8103263Z * [new branch] gh/yifuwang/152/base -> origin/gh/yifuwang/152/base 2025-09-07T07:55:56.8105335Z * [new branch] gh/yifuwang/152/head -> origin/gh/yifuwang/152/head 2025-09-07T07:55:56.8106826Z * [new branch] gh/yifuwang/152/orig -> origin/gh/yifuwang/152/orig 2025-09-07T07:55:56.8109333Z * [new branch] gh/yifuwang/195/base -> origin/gh/yifuwang/195/base 2025-09-07T07:55:56.8110937Z * [new branch] gh/yifuwang/195/head -> origin/gh/yifuwang/195/head 2025-09-07T07:55:56.8112429Z * [new branch] gh/yifuwang/195/orig -> origin/gh/yifuwang/195/orig 2025-09-07T07:55:56.8115957Z * [new branch] gh/yiming0416/1/base -> origin/gh/yiming0416/1/base 2025-09-07T07:55:56.8117545Z * [new branch] gh/yiming0416/1/head -> origin/gh/yiming0416/1/head 2025-09-07T07:55:56.8119863Z * [new branch] gh/yiming0416/2/base -> origin/gh/yiming0416/2/base 2025-09-07T07:55:56.8121344Z * [new branch] gh/yiming0416/2/head -> origin/gh/yiming0416/2/head 2025-09-07T07:55:56.8124642Z * [new branch] gh/ysiraichi/79/base -> origin/gh/ysiraichi/79/base 2025-09-07T07:55:56.8126272Z * [new branch] gh/ysiraichi/79/head -> origin/gh/ysiraichi/79/head 2025-09-07T07:55:56.8128161Z * [new branch] gh/ysiraichi/79/orig -> origin/gh/ysiraichi/79/orig 
2025-09-07T07:55:56.8130551Z * [new branch] gh/ysiraichi/88/base -> origin/gh/ysiraichi/88/base 2025-09-07T07:55:56.8132260Z * [new branch] gh/ysiraichi/88/head -> origin/gh/ysiraichi/88/head 2025-09-07T07:55:56.8133670Z * [new branch] gh/ysiraichi/88/orig -> origin/gh/ysiraichi/88/orig 2025-09-07T07:55:56.8137268Z * [new branch] gh/zhxchen17/25/base -> origin/gh/zhxchen17/25/base 2025-09-07T07:55:56.8138803Z * [new branch] gh/zhxchen17/25/head -> origin/gh/zhxchen17/25/head 2025-09-07T07:55:56.8140363Z * [new branch] gh/zhxchen17/25/orig -> origin/gh/zhxchen17/25/orig 2025-09-07T07:55:56.8142785Z * [new branch] gh/zhxchen17/31/base -> origin/gh/zhxchen17/31/base 2025-09-07T07:55:56.8144698Z * [new branch] gh/zhxchen17/31/head -> origin/gh/zhxchen17/31/head 2025-09-07T07:55:56.8146248Z * [new branch] gh/zhxchen17/31/orig -> origin/gh/zhxchen17/31/orig 2025-09-07T07:55:56.8148609Z * [new branch] gh/zhxchen17/34/base -> origin/gh/zhxchen17/34/base 2025-09-07T07:55:56.8150206Z * [new branch] gh/zhxchen17/34/head -> origin/gh/zhxchen17/34/head 2025-09-07T07:55:56.8152476Z * [new branch] gh/zhxchen17/35/base -> origin/gh/zhxchen17/35/base 2025-09-07T07:55:56.8154097Z * [new branch] gh/zhxchen17/35/head -> origin/gh/zhxchen17/35/head 2025-09-07T07:55:56.8157021Z * [new branch] gh/zhxchen17/37/base -> origin/gh/zhxchen17/37/base 2025-09-07T07:55:56.8158666Z * [new branch] gh/zhxchen17/37/head -> origin/gh/zhxchen17/37/head 2025-09-07T07:55:56.8160277Z * [new branch] gh/zhxchen17/37/orig -> origin/gh/zhxchen17/37/orig 2025-09-07T07:55:56.8162763Z * [new branch] gh/zhxchen17/38/base -> origin/gh/zhxchen17/38/base 2025-09-07T07:55:56.8164653Z * [new branch] gh/zhxchen17/38/head -> origin/gh/zhxchen17/38/head 2025-09-07T07:55:56.8166285Z * [new branch] gh/zhxchen17/38/orig -> origin/gh/zhxchen17/38/orig 2025-09-07T07:55:56.8168553Z * [new branch] gh/zhxchen17/39/base -> origin/gh/zhxchen17/39/base 2025-09-07T07:55:56.8170123Z * [new branch] gh/zhxchen17/39/head -> 
origin/gh/zhxchen17/39/head 2025-09-07T07:55:56.8171672Z * [new branch] gh/zhxchen17/39/orig -> origin/gh/zhxchen17/39/orig 2025-09-07T07:55:56.8174330Z * [new branch] gh/zhxchen17/40/base -> origin/gh/zhxchen17/40/base 2025-09-07T07:55:56.8176103Z * [new branch] gh/zhxchen17/40/head -> origin/gh/zhxchen17/40/head 2025-09-07T07:55:56.8177771Z * [new branch] gh/zhxchen17/40/orig -> origin/gh/zhxchen17/40/orig 2025-09-07T07:55:56.8180163Z * [new branch] gh/zhxchen17/41/base -> origin/gh/zhxchen17/41/base 2025-09-07T07:55:56.8181856Z * [new branch] gh/zhxchen17/41/head -> origin/gh/zhxchen17/41/head 2025-09-07T07:55:56.8183848Z * [new branch] gh/zhxchen17/41/orig -> origin/gh/zhxchen17/41/orig 2025-09-07T07:55:56.8186663Z * [new branch] gh/zhxchen17/42/base -> origin/gh/zhxchen17/42/base 2025-09-07T07:55:56.8188375Z * [new branch] gh/zhxchen17/42/head -> origin/gh/zhxchen17/42/head 2025-09-07T07:55:56.8190128Z * [new branch] gh/zhxchen17/42/orig -> origin/gh/zhxchen17/42/orig 2025-09-07T07:55:56.8192553Z * [new branch] gh/zhxchen17/43/base -> origin/gh/zhxchen17/43/base 2025-09-07T07:55:56.8194512Z * [new branch] gh/zhxchen17/43/head -> origin/gh/zhxchen17/43/head 2025-09-07T07:55:56.8196152Z * [new branch] gh/zhxchen17/43/orig -> origin/gh/zhxchen17/43/orig 2025-09-07T07:55:56.8198972Z * [new branch] gh/zhxchen17/44/base -> origin/gh/zhxchen17/44/base 2025-09-07T07:55:56.8200437Z * [new branch] gh/zhxchen17/44/head -> origin/gh/zhxchen17/44/head 2025-09-07T07:55:56.8202208Z * [new branch] gh/zhxchen17/44/orig -> origin/gh/zhxchen17/44/orig 2025-09-07T07:55:56.8204667Z * [new branch] gh/zhxchen17/45/base -> origin/gh/zhxchen17/45/base 2025-09-07T07:55:56.8206410Z * [new branch] gh/zhxchen17/45/head -> origin/gh/zhxchen17/45/head 2025-09-07T07:55:56.8207946Z * [new branch] gh/zhxchen17/45/orig -> origin/gh/zhxchen17/45/orig 2025-09-07T07:55:56.8211069Z * [new branch] gh/zklaus/10/base -> origin/gh/zklaus/10/base 2025-09-07T07:55:56.8212676Z * [new branch] 
gh/zklaus/10/head -> origin/gh/zklaus/10/head 2025-09-07T07:55:56.8214487Z * [new branch] gh/zklaus/10/orig -> origin/gh/zklaus/10/orig 2025-09-07T07:55:56.8216983Z * [new branch] gh/zklaus/11/base -> origin/gh/zklaus/11/base 2025-09-07T07:55:56.8218511Z * [new branch] gh/zklaus/11/head -> origin/gh/zklaus/11/head 2025-09-07T07:55:56.8220060Z * [new branch] gh/zklaus/11/orig -> origin/gh/zklaus/11/orig 2025-09-07T07:55:56.8222454Z * [new branch] gh/zklaus/12/base -> origin/gh/zklaus/12/base 2025-09-07T07:55:56.8224106Z * [new branch] gh/zklaus/12/head -> origin/gh/zklaus/12/head 2025-09-07T07:55:56.8225886Z * [new branch] gh/zklaus/12/orig -> origin/gh/zklaus/12/orig 2025-09-07T07:55:56.8228445Z * [new branch] gh/zklaus/14/base -> origin/gh/zklaus/14/base 2025-09-07T07:55:56.8230003Z * [new branch] gh/zklaus/14/head -> origin/gh/zklaus/14/head 2025-09-07T07:55:56.8231587Z * [new branch] gh/zklaus/14/orig -> origin/gh/zklaus/14/orig 2025-09-07T07:55:56.8234133Z * [new branch] gh/zklaus/15/base -> origin/gh/zklaus/15/base 2025-09-07T07:55:56.8236018Z * [new branch] gh/zklaus/15/head -> origin/gh/zklaus/15/head 2025-09-07T07:55:56.8237580Z * [new branch] gh/zklaus/15/orig -> origin/gh/zklaus/15/orig 2025-09-07T07:55:56.8240035Z * [new branch] gh/zklaus/16/base -> origin/gh/zklaus/16/base 2025-09-07T07:55:56.8241642Z * [new branch] gh/zklaus/16/head -> origin/gh/zklaus/16/head 2025-09-07T07:55:56.8243129Z * [new branch] gh/zklaus/16/orig -> origin/gh/zklaus/16/orig 2025-09-07T07:55:56.8245988Z * [new branch] gh/zklaus/17/base -> origin/gh/zklaus/17/base 2025-09-07T07:55:56.8247506Z * [new branch] gh/zklaus/17/head -> origin/gh/zklaus/17/head 2025-09-07T07:55:56.8248994Z * [new branch] gh/zklaus/17/orig -> origin/gh/zklaus/17/orig 2025-09-07T07:55:56.8251273Z * [new branch] gh/zklaus/18/base -> origin/gh/zklaus/18/base 2025-09-07T07:55:56.8255735Z * [new branch] gh/zklaus/18/head -> origin/gh/zklaus/18/head 2025-09-07T07:55:56.8257228Z * [new branch] gh/zklaus/18/orig 
-> origin/gh/zklaus/18/orig 2025-09-07T07:55:56.8259761Z * [new branch] gh/zklaus/19/base -> origin/gh/zklaus/19/base 2025-09-07T07:55:56.8261475Z * [new branch] gh/zklaus/19/head -> origin/gh/zklaus/19/head 2025-09-07T07:55:56.8263019Z * [new branch] gh/zklaus/19/orig -> origin/gh/zklaus/19/orig 2025-09-07T07:55:56.8265727Z * [new branch] gh/zklaus/20/base -> origin/gh/zklaus/20/base 2025-09-07T07:55:56.8267233Z * [new branch] gh/zklaus/20/head -> origin/gh/zklaus/20/head 2025-09-07T07:55:56.8268832Z * [new branch] gh/zklaus/20/orig -> origin/gh/zklaus/20/orig 2025-09-07T07:55:56.8271148Z * [new branch] gh/zklaus/7/base -> origin/gh/zklaus/7/base 2025-09-07T07:55:56.8272779Z * [new branch] gh/zklaus/7/head -> origin/gh/zklaus/7/head 2025-09-07T07:55:56.8274781Z * [new branch] gh/zklaus/7/orig -> origin/gh/zklaus/7/orig 2025-09-07T07:55:56.8277051Z * [new branch] gh/zklaus/9/base -> origin/gh/zklaus/9/base 2025-09-07T07:55:56.8278703Z * [new branch] gh/zklaus/9/head -> origin/gh/zklaus/9/head 2025-09-07T07:55:56.8280224Z * [new branch] gh/zklaus/9/orig -> origin/gh/zklaus/9/orig 2025-09-07T07:55:56.8283390Z * [new branch] gh/zou3519/1175/base -> origin/gh/zou3519/1175/base 2025-09-07T07:55:56.8285341Z * [new branch] gh/zou3519/1175/head -> origin/gh/zou3519/1175/head 2025-09-07T07:55:56.8286840Z * [new branch] gh/zou3519/1175/orig -> origin/gh/zou3519/1175/orig 2025-09-07T07:55:56.8289152Z * [new branch] gh/zou3519/1177/base -> origin/gh/zou3519/1177/base 2025-09-07T07:55:56.8290741Z * [new branch] gh/zou3519/1177/head -> origin/gh/zou3519/1177/head 2025-09-07T07:55:56.8292315Z * [new branch] gh/zou3519/1177/orig -> origin/gh/zou3519/1177/orig 2025-09-07T07:55:56.8295154Z * [new branch] gh/zou3519/1191/base -> origin/gh/zou3519/1191/base 2025-09-07T07:55:56.8296879Z * [new branch] gh/zou3519/1191/head -> origin/gh/zou3519/1191/head 2025-09-07T07:55:56.8298437Z * [new branch] gh/zou3519/1191/orig -> origin/gh/zou3519/1191/orig 2025-09-07T07:55:56.8301013Z * [new 
branch] gh/zou3519/1192/base -> origin/gh/zou3519/1192/base 2025-09-07T07:55:56.8302593Z * [new branch] gh/zou3519/1192/head -> origin/gh/zou3519/1192/head 2025-09-07T07:55:56.8304274Z * [new branch] gh/zou3519/1192/orig -> origin/gh/zou3519/1192/orig 2025-09-07T07:55:56.8306606Z * [new branch] gh/zou3519/1193/base -> origin/gh/zou3519/1193/base 2025-09-07T07:55:56.8308298Z * [new branch] gh/zou3519/1193/head -> origin/gh/zou3519/1193/head 2025-09-07T07:55:56.8309836Z * [new branch] gh/zou3519/1193/orig -> origin/gh/zou3519/1193/orig 2025-09-07T07:55:56.8312065Z * [new branch] gh/zou3519/1194/base -> origin/gh/zou3519/1194/base 2025-09-07T07:55:56.8313829Z * [new branch] gh/zou3519/1194/head -> origin/gh/zou3519/1194/head 2025-09-07T07:55:56.8315724Z * [new branch] gh/zou3519/1194/orig -> origin/gh/zou3519/1194/orig 2025-09-07T07:55:56.8318153Z * [new branch] gh/zou3519/1195/base -> origin/gh/zou3519/1195/base 2025-09-07T07:55:56.8319774Z * [new branch] gh/zou3519/1195/head -> origin/gh/zou3519/1195/head 2025-09-07T07:55:56.8321372Z * [new branch] gh/zou3519/1195/orig -> origin/gh/zou3519/1195/orig 2025-09-07T07:55:56.8323662Z * [new branch] gh/zou3519/1196/base -> origin/gh/zou3519/1196/base 2025-09-07T07:55:56.8325698Z * [new branch] gh/zou3519/1196/head -> origin/gh/zou3519/1196/head 2025-09-07T07:55:56.8327401Z * [new branch] gh/zou3519/1196/orig -> origin/gh/zou3519/1196/orig 2025-09-07T07:55:56.8329682Z * [new branch] gh/zou3519/1197/base -> origin/gh/zou3519/1197/base 2025-09-07T07:55:56.8331334Z * [new branch] gh/zou3519/1197/head -> origin/gh/zou3519/1197/head 2025-09-07T07:55:56.8332921Z * [new branch] gh/zou3519/1197/orig -> origin/gh/zou3519/1197/orig 2025-09-07T07:55:56.8336575Z * [new branch] gh/zpcore/1/base -> origin/gh/zpcore/1/base 2025-09-07T07:55:56.8338148Z * [new branch] gh/zpcore/1/head -> origin/gh/zpcore/1/head 2025-09-07T07:55:56.8340614Z * [new branch] gh/zpcore/10/base -> origin/gh/zpcore/10/base 2025-09-07T07:55:56.8342032Z * [new 
branch] gh/zpcore/10/head -> origin/gh/zpcore/10/head 2025-09-07T07:55:56.8343909Z * [new branch] gh/zpcore/10/orig -> origin/gh/zpcore/10/orig 2025-09-07T07:55:56.8346590Z * [new branch] gh/zpcore/11/base -> origin/gh/zpcore/11/base 2025-09-07T07:55:56.8348020Z * [new branch] gh/zpcore/11/head -> origin/gh/zpcore/11/head 2025-09-07T07:55:56.8349583Z * [new branch] gh/zpcore/11/orig -> origin/gh/zpcore/11/orig 2025-09-07T07:55:56.8352132Z * [new branch] gh/zpcore/12/base -> origin/gh/zpcore/12/base 2025-09-07T07:55:56.8354060Z * [new branch] gh/zpcore/12/head -> origin/gh/zpcore/12/head 2025-09-07T07:55:56.8355903Z * [new branch] gh/zpcore/12/orig -> origin/gh/zpcore/12/orig 2025-09-07T07:55:56.8358409Z * [new branch] gh/zpcore/13/base -> origin/gh/zpcore/13/base 2025-09-07T07:55:56.8360101Z * [new branch] gh/zpcore/13/head -> origin/gh/zpcore/13/head 2025-09-07T07:55:56.8361620Z * [new branch] gh/zpcore/13/orig -> origin/gh/zpcore/13/orig 2025-09-07T07:55:56.8364193Z * [new branch] gh/zpcore/14/base -> origin/gh/zpcore/14/base 2025-09-07T07:55:56.8365890Z * [new branch] gh/zpcore/14/head -> origin/gh/zpcore/14/head 2025-09-07T07:55:56.8368181Z * [new branch] gh/zpcore/2/base -> origin/gh/zpcore/2/base 2025-09-07T07:55:56.8369784Z * [new branch] gh/zpcore/2/head -> origin/gh/zpcore/2/head 2025-09-07T07:55:56.8372052Z * [new branch] gh/zpcore/3/base -> origin/gh/zpcore/3/base 2025-09-07T07:55:56.8373658Z * [new branch] gh/zpcore/3/head -> origin/gh/zpcore/3/head 2025-09-07T07:55:56.8376254Z * [new branch] gh/zpcore/4/base -> origin/gh/zpcore/4/base 2025-09-07T07:55:56.8377745Z * [new branch] gh/zpcore/4/head -> origin/gh/zpcore/4/head 2025-09-07T07:55:56.8380010Z * [new branch] gh/zpcore/5/base -> origin/gh/zpcore/5/base 2025-09-07T07:55:56.8381517Z * [new branch] gh/zpcore/5/head -> origin/gh/zpcore/5/head 2025-09-07T07:55:56.8383910Z * [new branch] gh/zpcore/6/base -> origin/gh/zpcore/6/base 2025-09-07T07:55:56.8385611Z * [new branch] gh/zpcore/6/head -> 
origin/gh/zpcore/6/head 2025-09-07T07:55:56.8387730Z * [new branch] gh/zpcore/7/base -> origin/gh/zpcore/7/base 2025-09-07T07:55:56.8389203Z * [new branch] gh/zpcore/7/head -> origin/gh/zpcore/7/head 2025-09-07T07:55:56.8391588Z * [new branch] gh/zpcore/8/base -> origin/gh/zpcore/8/base 2025-09-07T07:55:56.8393086Z * [new branch] gh/zpcore/8/head -> origin/gh/zpcore/8/head 2025-09-07T07:55:56.8395617Z * [new branch] google-main -> origin/google-main 2025-09-07T07:55:56.8398204Z * [new branch] guangyey/external_stream -> origin/guangyey/external_stream 2025-09-07T07:55:56.8399622Z * [new branch] guangyey/host_alloc -> origin/guangyey/host_alloc 2025-09-07T07:55:56.8401064Z * [new branch] guangyey/reimport -> origin/guangyey/reimport 2025-09-07T07:55:56.8402602Z * [new branch] guangyey/test_2025 -> origin/guangyey/test_2025 2025-09-07T07:55:56.8405700Z * [new branch] guilhermeleobas/cherry-pick-55d87d9dfd9 -> origin/guilhermeleobas/cherry-pick-55d87d9dfd9 2025-09-07T07:55:56.8408062Z * [new branch] haozhe/bf16-dynamic-shape -> origin/haozhe/bf16-dynamic-shape 2025-09-07T07:55:56.8409850Z * [new branch] hc_baseline -> origin/hc_baseline 2025-09-07T07:55:56.8411770Z * [new branch] hf_update -> origin/hf_update 2025-09-07T07:55:56.8413468Z * [new branch] hhh_decomp_mul -> origin/hhh_decomp_mul 2025-09-07T07:55:56.8415776Z * [new branch] hhh_rand -> origin/hhh_rand 2025-09-07T07:55:56.8418081Z * [new branch] hoy/mmsplitk -> origin/hoy/mmsplitk 2025-09-07T07:55:56.8419588Z * [new branch] hoy/triton-PR3973 -> origin/hoy/triton-PR3973 2025-09-07T07:55:56.8421230Z * [new branch] hoy/triton-coalescing-baseline -> origin/hoy/triton-coalescing-baseline 2025-09-07T07:55:56.8422676Z * [new branch] hoy/triton-coalescing-new -> origin/hoy/triton-coalescing-new 2025-09-07T07:55:56.8424386Z * [new branch] hoy/triton-coalescing-vec -> origin/hoy/triton-coalescing-vec 2025-09-07T07:55:56.8426414Z * [new branch] inductordecompfix -> origin/inductordecompfix 2025-09-07T07:55:56.8428270Z 
* [new branch] inline -> origin/inline 2025-09-07T07:55:56.8430032Z * [new branch] inlining -> origin/inlining 2025-09-07T07:55:56.8431946Z * [new branch] inlining-ezyang -> origin/inlining-ezyang 2025-09-07T07:55:56.8434036Z * [new branch] install-torchao-0.13.0 -> origin/install-torchao-0.13.0 2025-09-07T07:55:56.8436009Z * [new branch] int8_sdpa -> origin/int8_sdpa 2025-09-07T07:55:56.8437854Z * [new branch] invoke-subgraph -> origin/invoke-subgraph 2025-09-07T07:55:56.8439825Z * [new branch] issue#58739 -> origin/issue#58739 2025-09-07T07:55:56.8442485Z * [new branch] jcaip/test-cusparselt-version-0.6.2 -> origin/jcaip/test-cusparselt-version-0.6.2 2025-09-07T07:55:56.8444130Z * [new branch] jcaip/update-cusparselt-0.6.2 -> origin/jcaip/update-cusparselt-0.6.2 2025-09-07T07:55:56.8446787Z * [new branch] jeanschmidt/disable_rocm_build_tests -> origin/jeanschmidt/disable_rocm_build_tests 2025-09-07T07:55:56.8448555Z * [new branch] jithunnair-amd-patch-1 -> origin/jithunnair-amd-patch-1 2025-09-07T07:55:56.8450391Z * [new branch] jithunnair-amd-patch-2 -> origin/jithunnair-amd-patch-2 2025-09-07T07:55:56.8452779Z * [new branch] justinchu/attention-tests -> origin/justinchu/attention-tests 2025-09-07T07:55:56.8454586Z * [new branch] justinchu/native-qdq -> origin/justinchu/native-qdq 2025-09-07T07:55:56.8456202Z * [new branch] justinchu/ort-122 -> origin/justinchu/ort-122 2025-09-07T07:55:56.8458775Z * [new branch] justinchuby/dynamo-true -> origin/justinchuby/dynamo-true 2025-09-07T07:55:56.8461190Z * [new branch] kainan666/xlf_debug -> origin/kainan666/xlf_debug 2025-09-07T07:55:56.8463004Z * [new branch] kainan_test -> origin/kainan_test 2025-09-07T07:55:56.8465121Z * [new branch] learnablebias -> origin/learnablebias 2025-09-07T07:55:56.8467635Z * [new branch] leslie/test_group_gemm_epilogues -> origin/leslie/test_group_gemm_epilogues 2025-09-07T07:55:56.8470101Z * [new branch] lessw2020/fix_cutlass_cache_error -> origin/lessw2020/fix_cutlass_cache_error 
2025-09-07T07:55:56.8472541Z * [new branch] liaoxuan/shm_all_reduce -> origin/liaoxuan/shm_all_reduce 2025-09-07T07:55:56.8474278Z * [new branch] liaoxuan/test_fa_disable_softmax -> origin/liaoxuan/test_fa_disable_softmax 2025-09-07T07:55:56.8475902Z * [new branch] liaoxuan/test_int8_sdpa -> origin/liaoxuan/test_int8_sdpa 2025-09-07T07:55:56.8477797Z * [new branch] lintbuilddocker -> origin/lintbuilddocker 2025-09-07T07:55:56.8479530Z * [new branch] llama4-stable -> origin/llama4-stable 2025-09-07T07:55:56.8481463Z * [new branch] logdetfix -> origin/logdetfix 2025-09-07T07:55:56.8485186Z * [new branch] lts/release/1.8 -> origin/lts/release/1.8 2025-09-07T07:55:56.8487934Z * [new branch] lucaskabela/#94773 -> origin/lucaskabela/#94773 2025-09-07T07:55:56.8489347Z * [new branch] lucaskabela/flop_counter -> origin/lucaskabela/flop_counter 2025-09-07T07:55:56.8490899Z * [new branch] lucaskabela/func_under_decomp -> origin/lucaskabela/func_under_decomp 2025-09-07T07:55:56.8492468Z * [new branch] lucaskabela/functional_in_dynamo -> origin/lucaskabela/functional_in_dynamo 2025-09-07T07:55:56.8494192Z * [new branch] lucaskabela/install_params_as_graph_attr -> origin/lucaskabela/install_params_as_graph_attr 2025-09-07T07:55:56.8495814Z * [new branch] lucaskabela/issue_120648 -> origin/lucaskabela/issue_120648 2025-09-07T07:55:56.8497328Z * [new branch] lucaskabela/misc_typing_dynamo -> origin/lucaskabela/misc_typing_dynamo 2025-09-07T07:55:56.8498853Z * [new branch] lucaskabela/parameters_as_graph_attr -> origin/lucaskabela/parameters_as_graph_attr 2025-09-07T07:55:56.8500382Z * [new branch] lucaskabela/remove_aot_dispatcher_metadata -> origin/lucaskabela/remove_aot_dispatcher_metadata 2025-09-07T07:55:56.8501804Z * [new branch] lucaskabela/rnn_decomp -> origin/lucaskabela/rnn_decomp 2025-09-07T07:55:56.8503262Z * [new branch] lucaskabela/typing_backends -> origin/lucaskabela/typing_backends 2025-09-07T07:55:56.8505174Z * [new branch] lucaskabela/typing_symbolic_convert -> 
origin/lucaskabela/typing_symbolic_convert 2025-09-07T07:55:56.8506660Z * [new branch] lucaskabela/typing_utils_improvements -> origin/lucaskabela/typing_utils_improvements 2025-09-07T07:55:56.8508432Z * [new branch] main -> origin/main 2025-09-07T07:55:56.8510421Z * [new branch] main-enable-b200-distributed-tests -> origin/main-enable-b200-distributed-tests 2025-09-07T07:55:56.8512268Z * [new branch] malfet-patch-1 -> origin/malfet-patch-1 2025-09-07T07:55:56.8514198Z * [new branch] malfet-patch-12 -> origin/malfet-patch-12 2025-09-07T07:55:56.8516385Z * [new branch] malfet-patch-14 -> origin/malfet-patch-14 2025-09-07T07:55:56.8518454Z * [new branch] malfet-patch-6 -> origin/malfet-patch-6 2025-09-07T07:55:56.8520305Z * [new branch] malfet-patch-8 -> origin/malfet-patch-8 2025-09-07T07:55:56.8523231Z * [new branch] malfet/be-move-more-settings-to-checkout-pytorch -> origin/malfet/be-move-more-settings-to-checkout-pytorch 2025-09-07T07:55:56.8525037Z * [new branch] malfet/delete-upsteam-cuda -> origin/malfet/delete-upsteam-cuda 2025-09-07T07:55:56.8526533Z * [new branch] malfet/mps-implement-col2im -> origin/malfet/mps-implement-col2im 2025-09-07T07:55:56.8529073Z * [new branch] manuel/test-ops-common-allow-mps -> origin/manuel/test-ops-common-allow-mps 2025-09-07T07:55:56.8530827Z * [new branch] metascroy-patch-1 -> origin/metascroy-patch-1 2025-09-07T07:55:56.8533339Z * [new branch] mlazos/S429861-debug -> origin/mlazos/S429861-debug 2025-09-07T07:55:56.8535104Z * [new branch] mlazos/aa -> origin/mlazos/aa 2025-09-07T07:55:56.8536571Z * [new branch] mlazos/arg-renames -> origin/mlazos/arg-renames 2025-09-07T07:55:56.8538135Z * [new branch] mlazos/backup-test-branch -> origin/mlazos/backup-test-branch 2025-09-07T07:55:56.8539625Z * [new branch] mlazos/bad-cudagraphs -> origin/mlazos/bad-cudagraphs 2025-09-07T07:55:56.8541189Z * [new branch] mlazos/baseline -> origin/mlazos/baseline 2025-09-07T07:55:56.8542721Z * [new branch] mlazos/baseline-graph-breaks -> 
origin/mlazos/baseline-graph-breaks 2025-09-07T07:55:56.8544279Z * [new branch] mlazos/beta-tensor -> origin/mlazos/beta-tensor 2025-09-07T07:55:56.8546159Z * [new branch] mlazos/better-msg -> origin/mlazos/better-msg 2025-09-07T07:55:56.8547517Z * [new branch] mlazos/buffers -> origin/mlazos/buffers 2025-09-07T07:55:56.8548888Z * [new branch] mlazos/buffers2 -> origin/mlazos/buffers2 2025-09-07T07:55:56.8550474Z * [new branch] mlazos/buffers3 -> origin/mlazos/buffers3 2025-09-07T07:55:56.8552353Z * [new branch] mlazos/ck2 -> origin/mlazos/ck2 2025-09-07T07:55:56.8554110Z * [new branch] mlazos/combokernels -> origin/mlazos/combokernels 2025-09-07T07:55:56.8555841Z * [new branch] mlazos/ctx-cleanup -> origin/mlazos/ctx-cleanup 2025-09-07T07:55:56.8557231Z * [new branch] mlazos/cuda-cmd-log -> origin/mlazos/cuda-cmd-log 2025-09-07T07:55:56.8559027Z * [new branch] mlazos/cudagraph-tests -> origin/mlazos/cudagraph-tests 2025-09-07T07:55:56.8560625Z * [new branch] mlazos/cudagraphs-measurement -> origin/mlazos/cudagraphs-measurement 2025-09-07T07:55:56.8562228Z * [new branch] mlazos/cutlass-test -> origin/mlazos/cutlass-test 2025-09-07T07:55:56.8563870Z * [new branch] mlazos/cutlass-topo-bug -> origin/mlazos/cutlass-topo-bug 2025-09-07T07:55:56.8565757Z * [new branch] mlazos/data-gather -> origin/mlazos/data-gather 2025-09-07T07:55:56.8567205Z * [new branch] mlazos/data-ptrs2 -> origin/mlazos/data-ptrs2 2025-09-07T07:55:56.8568765Z * [new branch] mlazos/data-ptrs3 -> origin/mlazos/data-ptrs3 2025-09-07T07:55:56.8570321Z * [new branch] mlazos/dataclass-proxy -> origin/mlazos/dataclass-proxy 2025-09-07T07:55:56.8571888Z * [new branch] mlazos/dc-attrs -> origin/mlazos/dc-attrs 2025-09-07T07:55:56.8573399Z * [new branch] mlazos/dc-helion -> origin/mlazos/dc-helion 2025-09-07T07:55:56.8575156Z * [new branch] mlazos/dict-fix -> origin/mlazos/dict-fix 2025-09-07T07:55:56.8576701Z * [new branch] mlazos/disable-closures -> origin/mlazos/disable-closures 
2025-09-07T07:55:56.8578236Z * [new branch] mlazos/disable-tf -> origin/mlazos/disable-tf 2025-09-07T07:55:56.8579751Z * [new branch] mlazos/dupe-fix -> origin/mlazos/dupe-fix 2025-09-07T07:55:56.8581316Z * [new branch] mlazos/dyn-batch -> origin/mlazos/dyn-batch 2025-09-07T07:55:56.8582940Z * [new branch] mlazos/evt -> origin/mlazos/evt 2025-09-07T07:55:56.8584930Z * [new branch] mlazos/exp_disable -> origin/mlazos/exp_disable 2025-09-07T07:55:56.8586451Z * [new branch] mlazos/extract-examples -> origin/mlazos/extract-examples 2025-09-07T07:55:56.8587987Z * [new branch] mlazos/foreach-op -> origin/mlazos/foreach-op 2025-09-07T07:55:56.8589526Z * [new branch] mlazos/fp8 -> origin/mlazos/fp8 2025-09-07T07:55:56.8591113Z * [new branch] mlazos/fp8-bias -> origin/mlazos/fp8-bias 2025-09-07T07:55:56.8592723Z * [new branch] mlazos/fp8-bias-fusion -> origin/mlazos/fp8-bias-fusion 2025-09-07T07:55:56.8594241Z * [new branch] mlazos/fp8-fixes -> origin/mlazos/fp8-fixes 2025-09-07T07:55:56.8596175Z * [new branch] mlazos/freezing -> origin/mlazos/freezing 2025-09-07T07:55:56.8597918Z * [new branch] mlazos/h-comp -> origin/mlazos/h-comp 2025-09-07T07:55:56.8599577Z * [new branch] mlazos/h-comp2 -> origin/mlazos/h-comp2 2025-09-07T07:55:56.8601190Z * [new branch] mlazos/hash-hop -> origin/mlazos/hash-hop 2025-09-07T07:55:56.8602821Z * [new branch] mlazos/hc -> origin/mlazos/hc 2025-09-07T07:55:56.8604972Z * [new branch] mlazos/hc-cycles -> origin/mlazos/hc-cycles 2025-09-07T07:55:56.8606424Z * [new branch] mlazos/hc-fixes -> origin/mlazos/hc-fixes 2025-09-07T07:55:56.8608114Z * [new branch] mlazos/hc-fixes3 -> origin/mlazos/hc-fixes3 2025-09-07T07:55:56.8609859Z * [new branch] mlazos/hc-fixes4 -> origin/mlazos/hc-fixes4 2025-09-07T07:55:56.8611424Z * [new branch] mlazos/hc-hf -> origin/mlazos/hc-hf 2025-09-07T07:55:56.8612968Z * [new branch] mlazos/hc-mut -> origin/mlazos/hc-mut 2025-09-07T07:55:56.8614932Z * [new branch] mlazos/hc10 -> origin/mlazos/hc10 
2025-09-07T07:55:56.8616601Z * [new branch] mlazos/hc11 -> origin/mlazos/hc11 2025-09-07T07:55:56.8618166Z * [new branch] mlazos/hc12 -> origin/mlazos/hc12 2025-09-07T07:55:56.8619737Z * [new branch] mlazos/hc13 -> origin/mlazos/hc13 2025-09-07T07:55:56.8621354Z * [new branch] mlazos/hc14 -> origin/mlazos/hc14 2025-09-07T07:55:56.8622949Z * [new branch] mlazos/hc15 -> origin/mlazos/hc15 2025-09-07T07:55:56.8624989Z * [new branch] mlazos/hc2 -> origin/mlazos/hc2 2025-09-07T07:55:56.8626609Z * [new branch] mlazos/hc4 -> origin/mlazos/hc4 2025-09-07T07:55:56.8628258Z * [new branch] mlazos/hc5 -> origin/mlazos/hc5 2025-09-07T07:55:56.8629823Z * [new branch] mlazos/hc6 -> origin/mlazos/hc6 2025-09-07T07:55:56.8631459Z * [new branch] mlazos/hc7 -> origin/mlazos/hc7 2025-09-07T07:55:56.8632979Z * [new branch] mlazos/hc8 -> origin/mlazos/hc8 2025-09-07T07:55:56.8635085Z * [new branch] mlazos/hc9 -> origin/mlazos/hc9 2025-09-07T07:55:56.8636763Z * [new branch] mlazos/hc_baseline2 -> origin/mlazos/hc_baseline2 2025-09-07T07:55:56.8638562Z * [new branch] mlazos/init-per-param -> origin/mlazos/init-per-param 2025-09-07T07:55:56.8640201Z * [new branch] mlazos/init_per_param -> origin/mlazos/init_per_param 2025-09-07T07:55:56.8641867Z * [new branch] mlazos/less-guards -> origin/mlazos/less-guards 2025-09-07T07:55:56.8643565Z * [new branch] mlazos/lr-composibility -> origin/mlazos/lr-composibility 2025-09-07T07:55:56.8645417Z * [new branch] mlazos/main -> origin/mlazos/main 2025-09-07T07:55:56.8647162Z * [new branch] mlazos/main-test-enablement -> origin/mlazos/main-test-enablement 2025-09-07T07:55:56.8648771Z * [new branch] mlazos/main2 -> origin/mlazos/main2 2025-09-07T07:55:56.8650540Z * [new branch] mlazos/mark-static-update -> origin/mlazos/mark-static-update 2025-09-07T07:55:56.8652177Z * [new branch] mlazos/mcg -> origin/mlazos/mcg 2025-09-07T07:55:56.8653997Z * [new branch] mlazos/mcg2 -> origin/mlazos/mcg2 2025-09-07T07:55:56.8655926Z * [new branch] mlazos/meta-guards -> 
origin/mlazos/meta-guards 2025-09-07T07:55:56.8658139Z * [new branch] mlazos/mlazos/ck2 -> origin/mlazos/mlazos/ck2 2025-09-07T07:55:56.8659792Z * [new branch] mlazos/mlazos/foreach-map-adam -> origin/mlazos/mlazos/foreach-map-adam 2025-09-07T07:55:56.8661353Z * [new branch] mlazos/mlazos/tf-mode-backup -> origin/mlazos/mlazos/tf-mode-backup 2025-09-07T07:55:56.8663033Z * [new branch] mlazos/mod-fix -> origin/mlazos/mod-fix 2025-09-07T07:55:56.8665198Z * [new branch] mlazos/mode-fix -> origin/mlazos/mode-fix 2025-09-07T07:55:56.8667011Z * [new branch] mlazos/more-tests -> origin/mlazos/more-tests 2025-09-07T07:55:56.8668607Z * [new branch] mlazos/no-cpp -> origin/mlazos/no-cpp 2025-09-07T07:55:56.8670391Z * [new branch] mlazos/no-init-group-handling -> origin/mlazos/no-init-group-handling 2025-09-07T07:55:56.8672045Z * [new branch] mlazos/offsets -> origin/mlazos/offsets 2025-09-07T07:55:56.8673969Z * [new branch] mlazos/opt-bench-exp2 -> origin/mlazos/opt-bench-exp2 2025-09-07T07:55:56.8675869Z * [new branch] mlazos/opt-incr -> origin/mlazos/opt-incr 2025-09-07T07:55:56.8677549Z * [new branch] mlazos/proxy-ctors -> origin/mlazos/proxy-ctors 2025-09-07T07:55:56.8679350Z * [new branch] mlazos/quant-fix -> origin/mlazos/quant-fix 2025-09-07T07:55:56.8681075Z * [new branch] mlazos/resnet-fix -> origin/mlazos/resnet-fix 2025-09-07T07:55:56.8682815Z * [new branch] mlazos/revert-inline -> origin/mlazos/revert-inline 2025-09-07T07:55:56.8684865Z * [new branch] mlazos/rm-buf-names -> origin/mlazos/rm-buf-names 2025-09-07T07:55:56.8686424Z * [new branch] mlazos/rm-code -> origin/mlazos/rm-code 2025-09-07T07:55:56.8688082Z * [new branch] mlazos/rm-spam -> origin/mlazos/rm-spam 2025-09-07T07:55:56.8689904Z * [new branch] mlazos/rtp -> origin/mlazos/rtp 2025-09-07T07:55:56.8691646Z * [new branch] mlazos/static-idx-dbg -> origin/mlazos/static-idx-dbg 2025-09-07T07:55:56.8693342Z * [new branch] mlazos/static-inputs-log -> origin/mlazos/static-inputs-log 
2025-09-07T07:55:56.8695582Z * [new branch] mlazos/sub-param-fix -> origin/mlazos/sub-param-fix 2025-09-07T07:55:56.8697294Z * [new branch] mlazos/td-fix2 -> origin/mlazos/td-fix2 2025-09-07T07:55:56.8699067Z * [new branch] mlazos/tensor-hasattr2 -> origin/mlazos/tensor-hasattr2 2025-09-07T07:55:56.8700756Z * [new branch] mlazos/test -> origin/mlazos/test 2025-09-07T07:55:56.8702537Z * [new branch] mlazos/tf-mode -> origin/mlazos/tf-mode 2025-09-07T07:55:56.8704613Z * [new branch] mlazos/tf-mode-backup2 -> origin/mlazos/tf-mode-backup2 2025-09-07T07:55:56.8706407Z * [new branch] mlazos/tf-mode-reland -> origin/mlazos/tf-mode-reland 2025-09-07T07:55:56.8708235Z * [new branch] mlazos/tf-mode-reland2 -> origin/mlazos/tf-mode-reland2 2025-09-07T07:55:56.8710045Z * [new branch] mlazos/tf-mode-reland3 -> origin/mlazos/tf-mode-reland3 2025-09-07T07:55:56.8711789Z * [new branch] mlazos/topo-fix -> origin/mlazos/topo-fix 2025-09-07T07:55:56.8713557Z * [new branch] mlazos/triton-no-epi -> origin/mlazos/triton-no-epi 2025-09-07T07:55:56.8715736Z * [new branch] mlazos/tune-proto -> origin/mlazos/tune-proto 2025-09-07T07:55:56.8717522Z * [new branch] mlazos/tuple-fixes -> origin/mlazos/tuple-fixes 2025-09-07T07:55:56.8719259Z * [new branch] mlazos/tuple-fixes2 -> origin/mlazos/tuple-fixes2 2025-09-07T07:55:56.8721010Z * [new branch] mlazos/tuple-handling -> origin/mlazos/tuple-handling 2025-09-07T07:55:56.8722866Z * [new branch] mlazos/user-streams -> origin/mlazos/user-streams 2025-09-07T07:55:56.8724986Z * [new branch] mlazos/vary-beta -> origin/mlazos/vary-beta 2025-09-07T07:55:56.8726792Z * [new branch] mlazos/vary-beta2 -> origin/mlazos/vary-beta2 2025-09-07T07:55:56.8728524Z * [new branch] mlazos/weird-perf1 -> origin/mlazos/weird-perf1 2025-09-07T07:55:56.8730328Z * [new branch] mm_out_dtype_compile -> origin/mm_out_dtype_compile 2025-09-07T07:55:56.8732332Z * [new branch] modify-setupvllm -> origin/modify-setupvllm 2025-09-07T07:55:56.8734150Z * [new branch] module-shim 
-> origin/module-shim 2025-09-07T07:55:56.8736221Z * [new branch] move-theme-out-docker -> origin/move-theme-out-docker 2025-09-07T07:55:56.8738724Z * [new branch] msaroufim/be1 -> origin/msaroufim/be1 2025-09-07T07:55:56.8740341Z * [new branch] msaroufim/cn_path -> origin/msaroufim/cn_path 2025-09-07T07:55:56.8741927Z * [new branch] msaroufim/dtensorfusedadam -> origin/msaroufim/dtensorfusedadam 2025-09-07T07:55:56.8743445Z * [new branch] msaroufim/reduce -> origin/msaroufim/reduce 2025-09-07T07:55:56.8746258Z * [new branch] mtia/basic-cmake -> origin/mtia/basic-cmake 2025-09-07T07:55:56.8748054Z * [new branch] muon_dev -> origin/muon_dev 2025-09-07T07:55:56.8749894Z * [new branch] muon_dev_1 -> origin/muon_dev_1 2025-09-07T07:55:56.8751680Z * [new branch] nativert_num_outputs -> origin/nativert_num_outputs 2025-09-07T07:55:56.8753676Z * [new branch] nativert_numoutputs -> origin/nativert_numoutputs 2025-09-07T07:55:56.8755838Z * [new branch] new-modifiy-setupvllm -> origin/new-modifiy-setupvllm 2025-09-07T07:55:56.8757903Z * [new branch] new-setupvllm -> origin/new-setupvllm 2025-09-07T07:55:56.8759787Z * [new branch] new_zeros_dtype -> origin/new_zeros_dtype 2025-09-07T07:55:56.8761625Z * [new branch] newtest-base -> origin/newtest-base 2025-09-07T07:55:56.8764233Z * [new branch] ngimel/cat_perf1 -> origin/ngimel/cat_perf1 2025-09-07T07:55:56.8765940Z * [new branch] ngimel/einsum_fix -> origin/ngimel/einsum_fix 2025-09-07T07:55:56.8767407Z * [new branch] ngimel/error_index_list -> origin/ngimel/error_index_list 2025-09-07T07:55:56.8768930Z * [new branch] ngimel/fabric_check -> origin/ngimel/fabric_check 2025-09-07T07:55:56.8770394Z * [new branch] ngimel/fabric_fix -> origin/ngimel/fabric_fix 2025-09-07T07:55:56.8771931Z * [new branch] ngimel/fix_driver_init_error -> origin/ngimel/fix_driver_init_error 2025-09-07T07:55:56.8773284Z * [new branch] ngimel/fix_nccl_segment_seg -> origin/ngimel/fix_nccl_segment_seg 2025-09-07T07:55:56.8775094Z * [new branch] 
ngimel/gg_new -> origin/ngimel/gg_new 2025-09-07T07:55:56.8776613Z * [new branch] ngimel/modeguard -> origin/ngimel/modeguard 2025-09-07T07:55:56.8778127Z * [new branch] ngimel/multicast_fix -> origin/ngimel/multicast_fix 2025-09-07T07:55:56.8779661Z * [new branch] ngimel/rocm_handle_type -> origin/ngimel/rocm_handle_type 2025-09-07T07:55:56.8781189Z * [new branch] ngimel/symm_handle_fabric -> origin/ngimel/symm_handle_fabric 2025-09-07T07:55:56.8782678Z * [new branch] ngimel/unbind_multimem -> origin/ngimel/unbind_multimem 2025-09-07T07:55:56.8784771Z * [new branch] nightly -> origin/nightly 2025-09-07T07:55:56.8786886Z * [new branch] nmacchioni-patch-10 -> origin/nmacchioni-patch-10 2025-09-07T07:55:56.8788882Z * [new branch] nmacchioni-patch-7 -> origin/nmacchioni-patch-7 2025-09-07T07:55:56.8790937Z * [new branch] nmacchioni-patch-8 -> origin/nmacchioni-patch-8 2025-09-07T07:55:56.8792973Z * [new branch] nmacchioni-patch-9 -> origin/nmacchioni-patch-9 2025-09-07T07:55:56.8795729Z * [new branch] nullplay/fuse_matmul -> origin/nullplay/fuse_matmul 2025-09-07T07:55:56.8797925Z * [new branch] nullplay_fuse_matmul -> origin/nullplay_fuse_matmul 2025-09-07T07:55:56.8799747Z * [new branch] one-off -> origin/one-off 2025-09-07T07:55:56.8803010Z * [new branch] orig/release/1.10 -> origin/orig/release/1.10 2025-09-07T07:55:56.8804892Z * [new branch] orig/release/1.11 -> origin/orig/release/1.11 2025-09-07T07:55:56.8806485Z * [new branch] orig/release/1.12 -> origin/orig/release/1.12 2025-09-07T07:55:56.8808169Z * [new branch] orig/release/1.13 -> origin/orig/release/1.13 2025-09-07T07:55:56.8809855Z * [new branch] orig/release/1.6 -> origin/orig/release/1.6 2025-09-07T07:55:56.8811878Z * [new branch] orig/release/1.7 -> origin/orig/release/1.7 2025-09-07T07:55:56.8814829Z * [new branch] orig/release/1.8 -> origin/orig/release/1.8 2025-09-07T07:55:56.8815149Z * [new branch] orig/release/1.9 -> origin/orig/release/1.9 2025-09-07T07:55:56.8816884Z * [new branch] 
orig/release/2.0 -> origin/orig/release/2.0 2025-09-07T07:55:56.8818563Z * [new branch] orig/release/2.1 -> origin/orig/release/2.1 2025-09-07T07:55:56.8820178Z * [new branch] orig/release/2.2 -> origin/orig/release/2.2 2025-09-07T07:55:56.8821586Z * [new branch] orig/release/2.3 -> origin/orig/release/2.3 2025-09-07T07:55:56.8823327Z * [new branch] orig/release/2.4 -> origin/orig/release/2.4 2025-09-07T07:55:56.8825203Z * [new branch] orig/release/2.5 -> origin/orig/release/2.5 2025-09-07T07:55:56.8826655Z * [new branch] orig/release/2.6 -> origin/orig/release/2.6 2025-09-07T07:55:56.8828231Z * [new branch] orig/release/2.7 -> origin/orig/release/2.7 2025-09-07T07:55:56.8830003Z * [new branch] orig/release/2.8 -> origin/orig/release/2.8 2025-09-07T07:55:56.8832520Z * [new branch] oulgen/fx_graph -> origin/oulgen/fx_graph 2025-09-07T07:55:56.8834657Z * [new branch] padded-tensor -> origin/padded-tensor 2025-09-07T07:55:56.8836613Z * [new branch] pca2 -> origin/pca2 2025-09-07T07:55:56.8838727Z * [new branch] pianpwk-patch-1 -> origin/pianpwk-patch-1 2025-09-07T07:55:56.8841429Z * [new branch] pianpwk/backed_size_oblivious_export -> origin/pianpwk/backed_size_oblivious_export 2025-09-07T07:55:56.8842864Z * [new branch] pianpwk/invalidate_fake_memo -> origin/pianpwk/invalidate_fake_memo 2025-09-07T07:55:56.8844562Z * [new branch] pianpwk/max_1_strides -> origin/pianpwk/max_1_strides 2025-09-07T07:55:56.8846145Z * [new branch] pianpwk/maybe_guard_rel -> origin/pianpwk/maybe_guard_rel 2025-09-07T07:55:56.8847588Z * [new branch] pianpwk/nonzero_memo -> origin/pianpwk/nonzero_memo 2025-09-07T07:55:56.8849088Z * [new branch] pianpwk/oblivious_reshape_view_better -> origin/pianpwk/oblivious_reshape_view_better 2025-09-07T07:55:56.8850534Z * [new branch] pianpwk/oblivious_slice_forward -> origin/pianpwk/oblivious_slice_forward 2025-09-07T07:55:56.8851959Z * [new branch] pianpwk/oblivious_where -> origin/pianpwk/oblivious_where 2025-09-07T07:55:56.8853404Z * [new branch] 
pianpwk/param_static_pgo -> origin/pianpwk/param_static_pgo 2025-09-07T07:55:56.8855240Z * [new branch] pianpwk/pre_forward_hook -> origin/pianpwk/pre_forward_hook 2025-09-07T07:55:56.8856851Z * [new branch] pianpwk/remove_guard_fail_break -> origin/pianpwk/remove_guard_fail_break 2025-09-07T07:55:56.8858313Z * [new branch] pianpwk/slice_fresh_symbols -> origin/pianpwk/slice_fresh_symbols 2025-09-07T07:55:56.8859965Z * [new branch] pianpwk/sym_tokens_draft -> origin/pianpwk/sym_tokens_draft 2025-09-07T07:55:56.8861485Z * [new branch] pianpwk/test_pointwise_guard_or_false -> origin/pianpwk/test_pointwise_guard_or_false 2025-09-07T07:55:56.8862900Z * [new branch] pianpwk/test_slice_fake_impl -> origin/pianpwk/test_slice_fake_impl 2025-09-07T07:55:56.8864808Z * [new branch] pianpwk/totally_draft_sym_wrap -> origin/pianpwk/totally_draft_sym_wrap 2025-09-07T07:55:56.8866343Z * [new branch] pianpwk/unbacked_channels_last -> origin/pianpwk/unbacked_channels_last 2025-09-07T07:55:56.8867905Z * [new branch] pianpwk/unbacked_safe_conv1d -> origin/pianpwk/unbacked_safe_conv1d 2025-09-07T07:55:56.8869395Z * [new branch] pianpwk/unbacked_sdpa_flash -> origin/pianpwk/unbacked_sdpa_flash 2025-09-07T07:55:56.8870976Z * [new branch] pianpwk/unbacked_should_swap -> origin/pianpwk/unbacked_should_swap 2025-09-07T07:55:56.8872503Z * [new branch] pianpwk/unbacked_should_swap_2 -> origin/pianpwk/unbacked_should_swap_2 2025-09-07T07:55:56.8874177Z * [new branch] pianpwk/unbacked_slice_binding -> origin/pianpwk/unbacked_slice_binding 2025-09-07T07:55:56.8875994Z * [new branch] pianpwk/unbacked_slice_forward -> origin/pianpwk/unbacked_slice_forward 2025-09-07T07:55:56.8877678Z * [new branch] pianpwk/user_symints -> origin/pianpwk/user_symints 2025-09-07T07:55:56.8879317Z * [new branch] pianpwk/wan21_reshape -> origin/pianpwk/wan21_reshape 2025-09-07T07:55:56.8880934Z * [new branch] pianpwk/whitelist_optimizer -> origin/pianpwk/whitelist_optimizer 2025-09-07T07:55:56.8882905Z * [new branch] 
pin-torchao -> origin/pin-torchao 2025-09-07T07:55:56.8886105Z * [new branch] piz/fall_back_missing_0716 -> origin/piz/fall_back_missing_0716 2025-09-07T07:55:56.8887638Z * [new branch] piz/improve_scatter_0808 -> origin/piz/improve_scatter_0808 2025-09-07T07:55:56.8889449Z * [new branch] pool-separate -> origin/pool-separate 2025-09-07T07:55:56.8891347Z * [new branch] pr-156087 -> origin/pr-156087 2025-09-07T07:55:56.8894103Z * [new branch] pr/131860 -> origin/pr/131860 2025-09-07T07:55:56.8896250Z * [new branch] predispatch_to -> origin/predispatch_to 2025-09-07T07:55:56.8898039Z * [new branch] pt-opt-cuda3 -> origin/pt-opt-cuda3 2025-09-07T07:55:56.8899924Z * [new branch] pyobjectslot -> origin/pyobjectslot 2025-09-07T07:55:56.8902146Z * [new branch] python_compiled_autograd -> origin/python_compiled_autograd 2025-09-07T07:55:56.8905474Z * [new branch] qchip/export-D54134695 -> origin/qchip/export-D54134695 2025-09-07T07:55:56.8907320Z * [new branch] quint-bits -> origin/quint-bits 2025-09-07T07:55:56.8909936Z * [new branch] release/1.10 -> origin/release/1.10 2025-09-07T07:55:56.8911532Z * [new branch] release/1.11 -> origin/release/1.11 2025-09-07T07:55:56.8913150Z * [new branch] release/1.12 -> origin/release/1.12 2025-09-07T07:55:56.8915175Z * [new branch] release/1.13 -> origin/release/1.13 2025-09-07T07:55:56.8916646Z * [new branch] release/1.4 -> origin/release/1.4 2025-09-07T07:55:56.8918124Z * [new branch] release/1.4.1 -> origin/release/1.4.1 2025-09-07T07:55:56.8919686Z * [new branch] release/1.5 -> origin/release/1.5 2025-09-07T07:55:56.8921341Z * [new branch] release/1.6 -> origin/release/1.6 2025-09-07T07:55:56.8923157Z * [new branch] release/1.7 -> origin/release/1.7 2025-09-07T07:55:56.8925338Z * [new branch] release/1.8 -> origin/release/1.8 2025-09-07T07:55:56.8926643Z * [new branch] release/1.9 -> origin/release/1.9 2025-09-07T07:55:56.8928201Z * [new branch] release/2.0 -> origin/release/2.0 2025-09-07T07:55:56.8929902Z * [new branch] 
release/2.1 -> origin/release/2.1 2025-09-07T07:55:56.8931453Z * [new branch] release/2.2 -> origin/release/2.2 2025-09-07T07:55:56.8933098Z * [new branch] release/2.3 -> origin/release/2.3 2025-09-07T07:55:56.8935131Z * [new branch] release/2.4 -> origin/release/2.4 2025-09-07T07:55:56.8936722Z * [new branch] release/2.5 -> origin/release/2.5 2025-09-07T07:55:56.8938269Z * [new branch] release/2.6 -> origin/release/2.6 2025-09-07T07:55:56.8939868Z * [new branch] release/2.7 -> origin/release/2.7 2025-09-07T07:55:56.8941468Z * [new branch] release/2.8 -> origin/release/2.8 2025-09-07T07:55:56.8943458Z * [new branch] release_notes -> origin/release_notes 2025-09-07T07:55:56.8945850Z * [new branch] remove-actionable-label -> origin/remove-actionable-label 2025-09-07T07:55:56.8947759Z * [new branch] remove-ao -> origin/remove-ao 2025-09-07T07:55:56.8949752Z * [new branch] removedeprecatedvllmtest -> origin/removedeprecatedvllmtest 2025-09-07T07:55:56.8952147Z * [new branch] replace-pytorch-labs-20250812-195836 -> origin/replace-pytorch-labs-20250812-195836 2025-09-07T07:55:56.8953995Z * [new branch] replace-pytorch-labs-20250812-200248 -> origin/replace-pytorch-labs-20250812-200248 2025-09-07T07:55:56.8955924Z * [new branch] replace-pytorch-labs-20250812-200324 -> origin/replace-pytorch-labs-20250812-200324 2025-09-07T07:55:56.8958011Z * [new branch] replace-pytorch-labs-20250812-204020 -> origin/replace-pytorch-labs-20250812-204020 2025-09-07T07:55:56.8959800Z * [new branch] replace-pytorch-labs-20250812-204125 -> origin/replace-pytorch-labs-20250812-204125 2025-09-07T07:55:56.8961675Z * [new branch] replace-pytorch-labs-20250812-205624 -> origin/replace-pytorch-labs-20250812-205624 2025-09-07T07:55:56.8965840Z * [new branch] revert-131069-gh/krzysztofjordan/1/head -> origin/revert-131069-gh/krzysztofjordan/1/head 2025-09-07T07:55:56.8969815Z * [new branch] revert-131469-gh/andrewor14/51/head -> origin/revert-131469-gh/andrewor14/51/head 2025-09-07T07:55:56.8973868Z 
* [new branch] revert-156870-gh/skarjala/3/head -> origin/revert-156870-gh/skarjala/3/head 2025-09-07T07:55:56.8977020Z * [new branch] revert-157914-cherry-pick-157503-by-pytorch_bot_bot_ -> origin/revert-157914-cherry-pick-157503-by-pytorch_bot_bot_ 2025-09-07T07:55:56.8978702Z * [new branch] rocm-monitoring -> origin/rocm-monitoring 2025-09-07T07:55:56.8981218Z * [new branch] ruisi/relax_memory -> origin/ruisi/relax_memory 2025-09-07T07:55:56.8983586Z * [new branch] run-torchbench-smoke-test-h100 -> origin/run-torchbench-smoke-test-h100 2025-09-07T07:55:56.8986554Z * [new branch] ryanguo99/cleanup-dynamo-expected-failures -> origin/ryanguo99/cleanup-dynamo-expected-failures 2025-09-07T07:55:56.8987973Z * [new branch] ryanguo99/fix-closure-var -> origin/ryanguo99/fix-closure-var 2025-09-07T07:55:56.8990379Z * [new branch] rzou/faketensor_bench -> origin/rzou/faketensor_bench 2025-09-07T07:55:56.8991922Z * [new branch] rzou/njt -> origin/rzou/njt 2025-09-07T07:55:56.8993494Z * [new branch] rzou/pca -> origin/rzou/pca 2025-09-07T07:55:56.8995345Z * [new branch] rzou/realprop -> origin/rzou/realprop 2025-09-07T07:55:56.8997130Z * [new branch] rzou/setup_context -> origin/rzou/setup_context 2025-09-07T07:55:56.8999696Z * [new branch] sanchitintel/refactor_aten_int8_woq_gemm -> origin/sanchitintel/refactor_aten_int8_woq_gemm 2025-09-07T07:55:56.9001396Z * [new branch] sanchitintel/weird_thing_with_test_cpu_select_algorithm -> origin/sanchitintel/weird_thing_with_test_cpu_select_algorithm 2025-09-07T07:55:56.9002966Z * [new branch] sapling-pr-archive-SS-JIA -> origin/sapling-pr-archive-SS-JIA 2025-09-07T07:55:56.9005037Z * [new branch] save -> origin/save 2025-09-07T07:55:56.9007502Z * [new branch] sdym/2.5.1 -> origin/sdym/2.5.1 2025-09-07T07:55:56.9009357Z * [new branch] seemethere-patch-1 -> origin/seemethere-patch-1 2025-09-07T07:55:56.9011005Z * [new branch] setupvllm -> origin/setupvllm 2025-09-07T07:55:56.9012752Z * [new branch] share_and_pin_fork -> 
origin/share_and_pin_fork 2025-09-07T07:55:56.9015486Z * [new branch] shengf/fx-xform-perf -> origin/shengf/fx-xform-perf 2025-09-07T07:55:56.9017252Z * [new branch] shikaili_fp8_allgather -> origin/shikaili_fp8_allgather 2025-09-07T07:55:56.9019052Z * [new branch] shoumikhin-patch-1 -> origin/shoumikhin-patch-1 2025-09-07T07:55:56.9020862Z * [new branch] shoumikhin-patch-12 -> origin/shoumikhin-patch-12 2025-09-07T07:55:56.9022533Z * [new branch] simplify-fq-per-channel -> origin/simplify-fq-per-channel 2025-09-07T07:55:56.9024496Z * [new branch] solve-accuracy-fix -> origin/solve-accuracy-fix 2025-09-07T07:55:56.9026875Z * [new branch] soulitzer/stash-tls-ac -> origin/soulitzer/stash-tls-ac 2025-09-07T07:55:56.9029532Z * [new branch] sqzhang/flight4 -> origin/sqzhang/flight4 2025-09-07T07:55:56.9031118Z * [new branch] sqzhang/flight4plus -> origin/sqzhang/flight4plus 2025-09-07T07:55:56.9033509Z * [new branch] sraikund/record_funct_test -> origin/sraikund/record_funct_test 2025-09-07T07:55:56.9036342Z * [new branch] sraikund16/test -> origin/sraikund16/test 2025-09-07T07:55:56.9038241Z * [new branch] stablize-compilation-time -> origin/stablize-compilation-time 2025-09-07T07:55:56.9040009Z * [new branch] standalone-templates -> origin/standalone-templates 2025-09-07T07:55:56.9041829Z * [new branch] standalone_package_weights -> origin/standalone_package_weights 2025-09-07T07:55:56.9043488Z * [new branch] starterTaskUpdate -> origin/starterTaskUpdate 2025-09-07T07:55:56.9045479Z * [new branch] subgraph_fuse -> origin/subgraph_fuse 2025-09-07T07:55:56.9047285Z * [new branch] support-uv-in-collect_env -> origin/support-uv-in-collect_env 2025-09-07T07:55:56.9048994Z * [new branch] sve-poc -> origin/sve-poc 2025-09-07T07:55:56.9050759Z * [new branch] svekars-patch-1 -> origin/svekars-patch-1 2025-09-07T07:55:56.9052561Z * [new branch] switch-bn -> origin/switch-bn 2025-09-07T07:55:56.9054638Z * [new branch] sympy-bottleneck-repro -> origin/sympy-bottleneck-repro 
2025-09-07T07:55:56.9057194Z * [new branch] tenpercent/ck_rocm_ci_v3 -> origin/tenpercent/ck_rocm_ci_v3 2025-09-07T07:55:56.9058930Z * [new branch] tensordict_integration -> origin/tensordict_integration 2025-09-07T07:55:56.9060652Z * [new branch] test-7054 -> origin/test-7054 2025-09-07T07:55:56.9062495Z * [new branch] test-move-conda-builds -> origin/test-move-conda-builds 2025-09-07T07:55:56.9064862Z * [new branch] test-myst-markdown-docstring -> origin/test-myst-markdown-docstring 2025-09-07T07:55:56.9066340Z * [new branch] test-old -> origin/test-old 2025-09-07T07:55:56.9068126Z * [new branch] test-vec-migration-internally -> origin/test-vec-migration-internally 2025-09-07T07:55:56.9070493Z * [new branch] test/bmm_heur -> origin/test/bmm_heur 2025-09-07T07:55:56.9071980Z * [new branch] test/inductor -> origin/test/inductor 2025-09-07T07:55:56.9074871Z * [new branch] tianren/flex_paged_attn_fix -> origin/tianren/flex_paged_attn_fix 2025-09-07T07:55:56.9076416Z * [new branch] tianren/flex_paged_attn_fix_temp -> origin/tianren/flex_paged_attn_fix_temp 2025-09-07T07:55:56.9078168Z * [new branch] tianren/test -> origin/tianren/test 2025-09-07T07:55:56.9079944Z * [new branch] tidy_performance_cyy -> origin/tidy_performance_cyy 2025-09-07T07:55:56.9081764Z * [new branch] torchtitan_ep -> origin/torchtitan_ep 2025-09-07T07:55:56.9083599Z * [new branch] trace_fsdp_torchtune_lora -> origin/trace_fsdp_torchtune_lora 2025-09-07T07:55:56.9085679Z * [new branch] traceable_fsdp_unit_tests -> origin/traceable_fsdp_unit_tests 2025-09-07T07:55:56.9087363Z * [new branch] tree_loop_vec_base -> origin/tree_loop_vec_base 2025-09-07T07:55:56.9089208Z * [new branch] tree_vec_base -> origin/tree_vec_base 2025-09-07T07:55:56.9090946Z * [new branch] triton-update -> origin/triton-update 2025-09-07T07:55:56.9092682Z * [new branch] triton_kernel -> origin/triton_kernel 2025-09-07T07:55:56.9094565Z * [new branch] triton_kernel_perf -> origin/triton_kernel_perf 2025-09-07T07:55:56.9096374Z 
* [new branch] tt_pkg_1908 -> origin/tt_pkg_1908 2025-09-07T07:55:56.9098187Z * [new branch] tweak-transformer-dependabot -> origin/tweak-transformer-dependabot 2025-09-07T07:55:56.9099798Z * [new branch] type_dec -> origin/type_dec 2025-09-07T07:55:56.9101705Z * [new branch] udate-sphinx-dependancies -> origin/udate-sphinx-dependancies 2025-09-07T07:55:56.9104571Z * [new branch] update-audio-commit-hash/16818882925-1712-1 -> origin/update-audio-commit-hash/16818882925-1712-1 2025-09-07T07:55:56.9106088Z * [new branch] update-audio-commit-hash/16895560422-1720-1 -> origin/update-audio-commit-hash/16895560422-1720-1 2025-09-07T07:55:56.9107545Z * [new branch] update-audio-commit-hash/16924174496-1738-1 -> origin/update-audio-commit-hash/16924174496-1738-1 2025-09-07T07:55:56.9109161Z * [new branch] update-audio-commit-hash/17002010821-1749-1 -> origin/update-audio-commit-hash/17002010821-1749-1 2025-09-07T07:55:56.9110713Z * [new branch] update-audio-commit-hash/17056004427-1766-1 -> origin/update-audio-commit-hash/17056004427-1766-1 2025-09-07T07:55:56.9112177Z * [new branch] update-audio-commit-hash/17085054029-1767-1 -> origin/update-audio-commit-hash/17085054029-1767-1 2025-09-07T07:55:56.9113818Z * [new branch] update-audio-commit-hash/17142507405-1771-1 -> origin/update-audio-commit-hash/17142507405-1771-1 2025-09-07T07:55:56.9115522Z * [new branch] update-audio-commit-hash/17168762740-1773-1 -> origin/update-audio-commit-hash/17168762740-1773-1 2025-09-07T07:55:56.9116955Z * [new branch] update-audio-commit-hash/17311174639-1780-1 -> origin/update-audio-commit-hash/17311174639-1780-1 2025-09-07T07:55:56.9118648Z * [new branch] update-audio-commit-hash/17336898740-1781-1 -> origin/update-audio-commit-hash/17336898740-1781-1 2025-09-07T07:55:56.9120139Z * [new branch] update-audio-commit-hash/17389727684-1786-1 -> origin/update-audio-commit-hash/17389727684-1786-1 2025-09-07T07:55:56.9121867Z * [new branch] update-audio-commit-hash/17449538142-1790-1 -> 
origin/update-audio-commit-hash/17449538142-1790-1 2025-09-07T07:55:56.9123244Z * [new branch] update-audio-commit-hash/17507351808-1794-1 -> origin/update-audio-commit-hash/17507351808-1794-1 2025-09-07T07:55:56.9125345Z * [new branch] update-dynamic-shapes-doc -> origin/update-dynamic-shapes-doc 2025-09-07T07:55:56.9127833Z * [new branch] update-executorch-commit-hash/15694981040-1626-1 -> origin/update-executorch-commit-hash/15694981040-1626-1 2025-09-07T07:55:56.9130198Z * [new branch] update-triton-commit-hash/13663274526-1487-2 -> origin/update-triton-commit-hash/13663274526-1487-2 2025-09-07T07:55:56.9132705Z * [new branch] update-vision-commit-hash/15336342773-1607-1 -> origin/update-vision-commit-hash/15336342773-1607-1 2025-09-07T07:55:56.9135605Z * [new branch] update-vllm-commit-hash/16737365217-1704-1 -> origin/update-vllm-commit-hash/16737365217-1704-1 2025-09-07T07:55:56.9137206Z * [new branch] update-vllm-commit-hash/16843157111-1713-1 -> origin/update-vllm-commit-hash/16843157111-1713-1 2025-09-07T07:55:56.9138697Z * [new branch] update-vllm-commit-hash/16855312394-1714-1 -> origin/update-vllm-commit-hash/16855312394-1714-1 2025-09-07T07:55:56.9140090Z * [new branch] update-vllm-commit-hash/16924174496-1738-1 -> origin/update-vllm-commit-hash/16924174496-1738-1 2025-09-07T07:55:56.9141598Z * [new branch] update-vllm-commit-hash/16952608705-1745-1 -> origin/update-vllm-commit-hash/16952608705-1745-1 2025-09-07T07:55:56.9143109Z * [new branch] update-vllm-commit-hash/16979836546-1748-1 -> origin/update-vllm-commit-hash/16979836546-1748-1 2025-09-07T07:55:56.9144853Z * [new branch] update-vllm-commit-hash/17014576881-1756-1 -> origin/update-vllm-commit-hash/17014576881-1756-1 2025-09-07T07:55:56.9146322Z * [new branch] update-vllm-commit-hash/17027830869-1761-1 -> origin/update-vllm-commit-hash/17027830869-1761-1 2025-09-07T07:55:56.9147819Z * [new branch] update-vllm-commit-hash/17056004427-1766-1 -> origin/update-vllm-commit-hash/17056004427-1766-1 
2025-09-07T07:55:56.9149323Z * [new branch] update-vllm-commit-hash/17085054029-1767-1 -> origin/update-vllm-commit-hash/17085054029-1767-1 2025-09-07T07:55:56.9150804Z * [new branch] update-vllm-commit-hash/17113610216-1768-1 -> origin/update-vllm-commit-hash/17113610216-1768-1 2025-09-07T07:55:56.9152304Z * [new branch] update-vllm-commit-hash/17142507405-1771-1 -> origin/update-vllm-commit-hash/17142507405-1771-1 2025-09-07T07:55:56.9153930Z * [new branch] update-vllm-commit-hash/17181878974-1774-1 -> origin/update-vllm-commit-hash/17181878974-1774-1 2025-09-07T07:55:56.9155632Z * [new branch] update-vllm-commit-hash/17311174639-1780-1 -> origin/update-vllm-commit-hash/17311174639-1780-1 2025-09-07T07:55:56.9157222Z * [new branch] update-vllm-commit-hash/17336898740-1781-1 -> origin/update-vllm-commit-hash/17336898740-1781-1 2025-09-07T07:55:56.9158869Z * [new branch] update-vllm-commit-hash/17364352302-1785-1 -> origin/update-vllm-commit-hash/17364352302-1785-1 2025-09-07T07:55:56.9160367Z * [new branch] update-vllm-commit-hash/17389727684-1786-1 -> origin/update-vllm-commit-hash/17389727684-1786-1 2025-09-07T07:55:56.9162041Z * [new branch] update-vllm-commit-hash/17449538142-1790-1 -> origin/update-vllm-commit-hash/17449538142-1790-1 2025-09-07T07:55:56.9163647Z * [new branch] update-vllm-commit-hash/17480069797-1791-1 -> origin/update-vllm-commit-hash/17480069797-1791-1 2025-09-07T07:55:56.9165670Z * [new branch] update-vllm-commit-hash/17507351808-1794-1 -> origin/update-vllm-commit-hash/17507351808-1794-1 2025-09-07T07:55:56.9168215Z * [new branch] update-xla-commit-hash/16873912760-198-1 -> origin/update-xla-commit-hash/16873912760-198-1 2025-09-07T07:55:56.9169583Z * [new branch] update-xla-commit-hash/17034266655-199-1 -> origin/update-xla-commit-hash/17034266655-199-1 2025-09-07T07:55:56.9171091Z * [new branch] update-xla-commit-hash/17202464405-200-1 -> origin/update-xla-commit-hash/17202464405-200-1 2025-09-07T07:55:56.9172877Z * [new branch] 
update_docs_torch_multinomial_issue#125388 -> origin/update_docs_torch_multinomial_issue#125388 2025-09-07T07:55:56.9174908Z * [new branch] update_executorch_pin -> origin/update_executorch_pin 2025-09-07T07:55:56.9176818Z * [new branch] update_slow_tests_1722488736 -> origin/update_slow_tests_1722488736 2025-09-07T07:55:56.9178590Z * [new branch] update_slow_tests_1722879173 -> origin/update_slow_tests_1722879173 2025-09-07T07:55:56.9180437Z * [new branch] update_slow_tests_1752478971 -> origin/update_slow_tests_1752478971 2025-09-07T07:55:56.9182152Z * [new branch] update_slow_tests_1755502951 -> origin/update_slow_tests_1755502951 2025-09-07T07:55:56.9184006Z * [new branch] update_slow_tests_1756107664 -> origin/update_slow_tests_1756107664 2025-09-07T07:55:56.9186013Z * [new branch] update_submodule_FBGEMM -> origin/update_submodule_FBGEMM 2025-09-07T07:55:56.9187777Z * [new branch] update_submodule_kineto -> origin/update_submodule_kineto 2025-09-07T07:55:56.9189527Z * [new branch] update_submodule_tensorpipe -> origin/update_submodule_tensorpipe 2025-09-07T07:55:56.9191350Z * [new branch] v0.1.2 -> origin/v0.1.2 2025-09-07T07:55:56.9193213Z * [new branch] v1.0.1 -> origin/v1.0.1 2025-09-07T07:55:56.9195356Z * [new branch] v1.0.3 -> origin/v1.0.3 2025-09-07T07:55:56.9197153Z * [new branch] v1.1.0 -> origin/v1.1.0 2025-09-07T07:55:56.9199077Z * [new branch] v1.2.0 -> origin/v1.2.0 2025-09-07T07:55:56.9200880Z * [new branch] v1.3.0 -> origin/v1.3.0 2025-09-07T07:55:56.9202728Z * [new branch] v1.3.1 -> origin/v1.3.1 2025-09-07T07:55:56.9204823Z * [new branch] validate_fn -> origin/validate_fn 2025-09-07T07:55:56.9206764Z * [new branch] validations_2.6 -> origin/validations_2.6 2025-09-07T07:55:56.9208602Z * [new branch] validations_2.8 -> origin/validations_2.8 2025-09-07T07:55:56.9211084Z * [new branch] viable/strict -> origin/viable/strict 2025-09-07T07:55:56.9212877Z * [new branch] vllmbuildci -> origin/vllmbuildci 2025-09-07T07:55:56.9215085Z * [new branch] 
vllmpin -> origin/vllmpin 2025-09-07T07:55:56.9217641Z * [new branch] wdvr/conda_devcontainer -> origin/wdvr/conda_devcontainer 2025-09-07T07:55:56.9219108Z * [new branch] wdvr/iss_145259 -> origin/wdvr/iss_145259 2025-09-07T07:55:56.9220891Z * [new branch] weight_sharing_cpp -> origin/weight_sharing_cpp 2025-09-07T07:55:56.9223547Z * [new branch] whc/flight4 -> origin/whc/flight4 2025-09-07T07:55:56.9225397Z * [new branch] whc/flight51 -> origin/whc/flight51 2025-09-07T07:55:56.9226938Z * [new branch] whc/flight53 -> origin/whc/flight53 2025-09-07T07:55:56.9228555Z * [new branch] whc/stage2 -> origin/whc/stage2 2025-09-07T07:55:56.9230048Z * [new branch] whc/uneven -> origin/whc/uneven 2025-09-07T07:55:56.9231952Z * [new branch] whc/uneven-merge -> origin/whc/uneven-merge 2025-09-07T07:55:56.9233841Z * [new branch] win_warnings -> origin/win_warnings 2025-09-07T07:55:56.9236028Z * [new branch] windows_libtorch_free -> origin/windows_libtorch_free 2025-09-07T07:55:56.9237547Z * [new branch] workonoldcommit -> origin/workonoldcommit 2025-09-07T07:55:56.9239607Z * [new branch] wychi-autotune-prune-configs-by-shared-mem -> origin/wychi-autotune-prune-configs-by-shared-mem 2025-09-07T07:55:56.9242054Z * [new branch] xmfan/ca_0516 -> origin/xmfan/ca_0516 2025-09-07T07:55:56.9243581Z * [new branch] xmfan/ca_1051b93192 -> origin/xmfan/ca_1051b93192 2025-09-07T07:55:56.9245580Z * [new branch] xmfan/ca_1a722f62c248391fc4a542e8851a5559aa356ae8 -> origin/xmfan/ca_1a722f62c248391fc4a542e8851a5559aa356ae8 2025-09-07T07:55:56.9246940Z * [new branch] xmfan/ca_5a2be192d1 -> origin/xmfan/ca_5a2be192d1 2025-09-07T07:55:56.9248357Z * [new branch] xmfan/ca_9d59b516e9 -> origin/xmfan/ca_9d59b516e9 2025-09-07T07:55:56.9249950Z * [new branch] xmfan/ca_api -> origin/xmfan/ca_api 2025-09-07T07:55:56.9251431Z * [new branch] xmfan/ca_apr8 -> origin/xmfan/ca_apr8 2025-09-07T07:55:56.9252864Z * [new branch] xmfan/ca_base -> origin/xmfan/ca_base 2025-09-07T07:55:56.9254798Z * [new branch] 
xmfan/ca_cudagraphs -> origin/xmfan/ca_cudagraphs 2025-09-07T07:55:56.9256347Z * [new branch] xmfan/ca_dynamic -> origin/xmfan/ca_dynamic 2025-09-07T07:55:56.9257931Z * [new branch] xmfan/ca_fix_dyn -> origin/xmfan/ca_fix_dyn 2025-09-07T07:55:56.9259451Z * [new branch] xmfan/ca_fix_lowering -> origin/xmfan/ca_fix_lowering 2025-09-07T07:55:56.9260961Z * [new branch] xmfan/ca_fix_polyfills -> origin/xmfan/ca_fix_polyfills 2025-09-07T07:55:56.9262336Z * [new branch] xmfan/ca_jan3 -> origin/xmfan/ca_jan3 2025-09-07T07:55:56.9263939Z * [new branch] xmfan/ca_jun18 -> origin/xmfan/ca_jun18 2025-09-07T07:55:56.9265671Z * [new branch] xmfan/ca_jun24 -> origin/xmfan/ca_jun24 2025-09-07T07:55:56.9267154Z * [new branch] xmfan/ca_mem_base -> origin/xmfan/ca_mem_base 2025-09-07T07:55:56.9268677Z * [new branch] xmfan/ca_mem_fix -> origin/xmfan/ca_mem_fix 2025-09-07T07:55:56.9270266Z * [new branch] xmfan/ca_memory_fix -> origin/xmfan/ca_memory_fix 2025-09-07T07:55:56.9271875Z * [new branch] xmfan/ca_memory_fix_rebased -> origin/xmfan/ca_memory_fix_rebased 2025-09-07T07:55:56.9273420Z * [new branch] xmfan/ca_memory_fix_rebased2 -> origin/xmfan/ca_memory_fix_rebased2 2025-09-07T07:55:56.9275282Z * [new branch] xmfan/ca_move_to_cuda -> origin/xmfan/ca_move_to_cuda 2025-09-07T07:55:56.9276711Z * [new branch] xmfan/ca_nested -> origin/xmfan/ca_nested 2025-09-07T07:55:56.9278450Z * [new branch] xmfan/ca_overhead -> origin/xmfan/ca_overhead 2025-09-07T07:55:56.9280019Z * [new branch] xmfan/ca_overhead_0eba7e5451 -> origin/xmfan/ca_overhead_0eba7e5451 2025-09-07T07:55:56.9281540Z * [new branch] xmfan/ca_scalar -> origin/xmfan/ca_scalar 2025-09-07T07:55:56.9283071Z * [new branch] xmfan/ca_subclass_mem_fix -> origin/xmfan/ca_subclass_mem_fix 2025-09-07T07:55:56.9284939Z * [new branch] xmfan/ca_warm_mem -> origin/xmfan/ca_warm_mem 2025-09-07T07:55:56.9286392Z * [new branch] xmfan/ca_warm_mem_base -> origin/xmfan/ca_warm_mem_base 2025-09-07T07:55:56.9287881Z * [new branch] xmfan/cacu_jun18 -> 
origin/xmfan/cacu_jun18 2025-09-07T07:55:56.9289441Z * [new branch] xmfan/cacu_jun19 -> origin/xmfan/cacu_jun19 2025-09-07T07:55:56.9291053Z * [new branch] xmfan/cacu_jun4 -> origin/xmfan/cacu_jun4 2025-09-07T07:55:56.9292962Z * [new branch] xmfan/cacu_may27 -> origin/xmfan/cacu_may27 2025-09-07T07:55:56.9294807Z * [new branch] xmfan/disable_duck_shape -> origin/xmfan/disable_duck_shape 2025-09-07T07:55:56.9296409Z * [new branch] xmfan/fca_cpp_node_passthrough -> origin/xmfan/fca_cpp_node_passthrough 2025-09-07T07:55:56.9297876Z * [new branch] xmfan/issue_123374 -> origin/xmfan/issue_123374 2025-09-07T07:55:56.9299646Z * [new branch] xmfan/post_3945954741e2d37023c5d6954f9483008e0892f9 -> origin/xmfan/post_3945954741e2d37023c5d6954f9483008e0892f9 2025-09-07T07:55:56.9301207Z * [new branch] xmfan/pre_3945954741e2d37023c5d6954f9483008e0892f9 -> origin/xmfan/pre_3945954741e2d37023c5d6954f9483008e0892f9 2025-09-07T07:55:56.9302632Z * [new branch] xmfan/segfault_test -> origin/xmfan/segfault_test 2025-09-07T07:55:56.9304434Z * [new branch] xmfan/single_step -> origin/xmfan/single_step 2025-09-07T07:55:56.9306175Z * [new branch] xmfan/sth_0829 -> origin/xmfan/sth_0829 2025-09-07T07:55:56.9307830Z * [new branch] xmfan/test -> origin/xmfan/test 2025-09-07T07:55:56.9310392Z * [new branch] yguo/debug-0226-constexpr -> origin/yguo/debug-0226-constexpr 2025-09-07T07:55:56.9311904Z * [new branch] yguo/new_latest_changes -> origin/yguo/new_latest_changes 2025-09-07T07:55:56.9313496Z * [new branch] yguo/patch_constexpr_changes -> origin/yguo/patch_constexpr_changes 2025-09-07T07:55:56.9315474Z * [new branch] yihan_quantization -> origin/yihan_quantization 2025-09-07T07:55:56.9318280Z * [new branch] yiming/add_jit_trace_benchmark -> origin/yiming/add_jit_trace_benchmark 2025-09-07T07:55:56.9319620Z * [new branch] yiming/add_nativert_benchmark -> origin/yiming/add_nativert_benchmark 2025-09-07T07:55:56.9321161Z * [new branch] yiming/bootcamp -> origin/yiming/bootcamp 
2025-09-07T07:55:56.9323558Z * [new branch] zainr/canary-test -> origin/zainr/canary-test 2025-09-07T07:55:56.9325564Z * [new branch] zainr/cleanup-gh-runners -> origin/zainr/cleanup-gh-runners 2025-09-07T07:55:56.9326846Z * [new branch] zainr/git-push-v2 -> origin/zainr/git-push-v2 2025-09-07T07:55:56.9328410Z * [new branch] zainr/pull-migration-c -> origin/zainr/pull-migration-c 2025-09-07T07:55:56.9329887Z * [new branch] zainr/test -> origin/zainr/test 2025-09-07T07:55:56.9331357Z * [new branch] zainr/test2 -> origin/zainr/test2 2025-09-07T07:55:56.9332874Z * [new branch] zainr/unstable -> origin/zainr/unstable 2025-09-07T07:55:56.9334815Z * [new branch] zainr/unstable-xla -> origin/zainr/unstable-xla 2025-09-07T07:55:56.9336708Z * [new branch] zasdfgbnm-patch-3 -> origin/zasdfgbnm-patch-3 2025-09-07T07:55:56.9338398Z * [new branch] zb2p -> origin/zb2p 2025-09-07T07:55:56.9340272Z * [new branch] zero_grad_optimization -> origin/zero_grad_optimization 2025-09-07T07:55:56.9342303Z * [new branch] zeros-and-scatter-part2 -> origin/zeros-and-scatter-part2 2025-09-07T07:55:57.1170051Z * [new branch] zhxchen17/scratch/0 -> origin/zhxchen17/scratch/0 2025-09-07T07:55:57.3617914Z * [new branch] zhxhcen17/moodycamel -> origin/zhxhcen17/moodycamel 2025-09-07T07:55:57.3623048Z * [new branch] zxiiro/main -> origin/zxiiro/main 2025-09-07T07:55:57.3625911Z * [new tag] bc2caa7fdf006894eff7af936babde69ab5a40f8-huydhn-debug -> bc2caa7fdf006894eff7af936babde69ab5a40f8-huydhn-debug 2025-09-07T07:55:57.3627574Z * [new tag] ci/binaries/77164 -> ci/binaries/77164 2025-09-07T07:55:57.3629348Z * [new tag] ciflow/binaries/156049 -> ciflow/binaries/156049 2025-09-07T07:55:57.3629768Z * [new tag] ciflow/binaries/156712 -> ciflow/binaries/156712 2025-09-07T07:55:57.3630639Z * [new tag] ciflow/binaries/157432 -> ciflow/binaries/157432 2025-09-07T07:55:57.3631578Z * [new tag] ciflow/binaries/157685 -> ciflow/binaries/157685 2025-09-07T07:55:57.3632424Z * [new tag] ciflow/binaries/157689 -> 
ciflow/binaries/157689 2025-09-07T07:55:57.3633283Z * [new tag] ciflow/binaries/158104 -> ciflow/binaries/158104 2025-09-07T07:55:57.3634963Z * [new tag] ciflow/binaries/160229 -> ciflow/binaries/160229 2025-09-07T07:55:57.3635513Z * [new tag] ciflow/binaries/160720 -> ciflow/binaries/160720 2025-09-07T07:55:57.3636380Z * [new tag] ciflow/binaries/162080 -> ciflow/binaries/162080 2025-09-07T07:55:57.3637325Z * [new tag] ciflow/binaries/162329 -> ciflow/binaries/162329 2025-09-07T07:55:57.3638775Z * [new tag] ciflow/binaries_libtorch/156049 -> ciflow/binaries_libtorch/156049 2025-09-07T07:55:57.3639550Z * [new tag] ciflow/binaries_libtorch/156711 -> ciflow/binaries_libtorch/156711 2025-09-07T07:55:57.3640383Z * [new tag] ciflow/binaries_libtorch/157432 -> ciflow/binaries_libtorch/157432 2025-09-07T07:55:57.3641710Z * [new tag] ciflow/binaries_wheel/156049 -> ciflow/binaries_wheel/156049 2025-09-07T07:55:57.3642485Z * [new tag] ciflow/binaries_wheel/156711 -> ciflow/binaries_wheel/156711 2025-09-07T07:55:57.3643286Z * [new tag] ciflow/binaries_wheel/157432 -> ciflow/binaries_wheel/157432 2025-09-07T07:55:57.3644411Z * [new tag] ciflow/binaries_wheel/162136 -> ciflow/binaries_wheel/162136 2025-09-07T07:55:57.3645424Z * [new tag] ciflow/binaries_wheel/162252 -> ciflow/binaries_wheel/162252 2025-09-07T07:55:57.3646250Z * [new tag] ciflow/binaries_wheel/162325 -> ciflow/binaries_wheel/162325 2025-09-07T07:55:57.3647726Z * [new tag] ciflow/h100-distributed/156703 -> ciflow/h100-distributed/156703 2025-09-07T07:55:57.3649101Z * [new tag] ciflow/h100-symm-mem/157635 -> ciflow/h100-symm-mem/157635 2025-09-07T07:55:57.3649917Z * [new tag] ciflow/h100-symm-mem/161984 -> ciflow/h100-symm-mem/161984 2025-09-07T07:55:57.3650741Z * [new tag] ciflow/h100-symm-mem/162003 -> ciflow/h100-symm-mem/162003 2025-09-07T07:55:57.3651596Z * [new tag] ciflow/h100-symm-mem/162011 -> ciflow/h100-symm-mem/162011 2025-09-07T07:55:57.3652393Z * [new tag] ciflow/h100-symm-mem/162026 -> 
ciflow/h100-symm-mem/162026 2025-09-07T07:55:57.3653290Z * [new tag] ciflow/h100-symm-mem/162033 -> ciflow/h100-symm-mem/162033 2025-09-07T07:55:57.3654519Z * [new tag] ciflow/h100-symm-mem/162040 -> ciflow/h100-symm-mem/162040 2025-09-07T07:55:57.3655310Z * [new tag] ciflow/h100-symm-mem/162041 -> ciflow/h100-symm-mem/162041 2025-09-07T07:55:57.3656179Z * [new tag] ciflow/h100-symm-mem/162142 -> ciflow/h100-symm-mem/162142 2025-09-07T07:55:57.3657083Z * [new tag] ciflow/h100-symm-mem/162150 -> ciflow/h100-symm-mem/162150 2025-09-07T07:55:57.3657929Z * [new tag] ciflow/h100-symm-mem/162243 -> ciflow/h100-symm-mem/162243 2025-09-07T07:55:57.3658785Z * [new tag] ciflow/h100-symm-mem/162320 -> ciflow/h100-symm-mem/162320 2025-09-07T07:55:57.3660144Z * [new tag] ciflow/h100/159158 -> ciflow/h100/159158 2025-09-07T07:55:57.3661506Z * [new tag] ciflow/h100/160480 -> ciflow/h100/160480 2025-09-07T07:55:57.3662326Z * [new tag] ciflow/h100/161749 -> ciflow/h100/161749 2025-09-07T07:55:57.3663402Z * [new tag] ciflow/h100/162022 -> ciflow/h100/162022 2025-09-07T07:55:57.3664367Z * [new tag] ciflow/h100/162278 -> ciflow/h100/162278 2025-09-07T07:55:57.3666059Z * [new tag] ciflow/inductor-perf-test-nightly-rocm/156592 -> ciflow/inductor-perf-test-nightly-rocm/156592 2025-09-07T07:55:57.3667053Z * [new tag] ciflow/inductor-perf-test-nightly/156592 -> ciflow/inductor-perf-test-nightly/156592 2025-09-07T07:55:57.3668272Z * [new tag] ciflow/inductor-periodic/162063 -> ciflow/inductor-periodic/162063 2025-09-07T07:55:57.3669145Z * [new tag] ciflow/inductor-periodic/162227 -> ciflow/inductor-periodic/162227 2025-09-07T07:55:57.3670138Z * [new tag] ciflow/inductor-periodic/162323 -> ciflow/inductor-periodic/162323 2025-09-07T07:55:57.3671581Z * [new tag] ciflow/inductor-rocm/154170 -> ciflow/inductor-rocm/154170 2025-09-07T07:55:57.3672586Z * [new tag] ciflow/inductor-rocm/159146 -> ciflow/inductor-rocm/159146 2025-09-07T07:55:57.3673461Z * [new tag] ciflow/inductor-rocm/159158 -> 
ciflow/inductor-rocm/159158 2025-09-07T07:55:57.3674856Z * [new tag] ciflow/inductor-rocm/161715 -> ciflow/inductor-rocm/161715 2025-09-07T07:55:57.3675726Z * [new tag] ciflow/inductor-rocm/162053 -> ciflow/inductor-rocm/162053 2025-09-07T07:55:57.3676743Z * [new tag] ciflow/inductor-rocm/162056 -> ciflow/inductor-rocm/162056 2025-09-07T07:55:57.3678198Z * [new tag] ciflow/inductor/137400 -> ciflow/inductor/137400 2025-09-07T07:55:57.3678987Z * [new tag] ciflow/inductor/148180 -> ciflow/inductor/148180 2025-09-07T07:55:57.3679820Z * [new tag] ciflow/inductor/148328 -> ciflow/inductor/148328 2025-09-07T07:55:57.3680602Z * [new tag] ciflow/inductor/148484 -> ciflow/inductor/148484 2025-09-07T07:55:57.3681498Z * [new tag] ciflow/inductor/148492 -> ciflow/inductor/148492 2025-09-07T07:55:57.3682358Z * [new tag] ciflow/inductor/152624 -> ciflow/inductor/152624 2025-09-07T07:55:57.3683195Z * [new tag] ciflow/inductor/154694 -> ciflow/inductor/154694 2025-09-07T07:55:57.3684253Z * [new tag] ciflow/inductor/156049 -> ciflow/inductor/156049 2025-09-07T07:55:57.3685230Z * [new tag] ciflow/inductor/156592 -> ciflow/inductor/156592 2025-09-07T07:55:57.3686113Z * [new tag] ciflow/inductor/157635 -> ciflow/inductor/157635 2025-09-07T07:55:57.3686940Z * [new tag] ciflow/inductor/157685 -> ciflow/inductor/157685 2025-09-07T07:55:57.3687797Z * [new tag] ciflow/inductor/157686 -> ciflow/inductor/157686 2025-09-07T07:55:57.3688689Z * [new tag] ciflow/inductor/157689 -> ciflow/inductor/157689 2025-09-07T07:55:57.3689586Z * [new tag] ciflow/inductor/157699 -> ciflow/inductor/157699 2025-09-07T07:55:57.3690578Z * [new tag] ciflow/inductor/157743 -> ciflow/inductor/157743 2025-09-07T07:55:57.3691555Z * [new tag] ciflow/inductor/157994 -> ciflow/inductor/157994 2025-09-07T07:55:57.3692406Z * [new tag] ciflow/inductor/158091 -> ciflow/inductor/158091 2025-09-07T07:55:57.3693307Z * [new tag] ciflow/inductor/158104 -> ciflow/inductor/158104 2025-09-07T07:55:57.3694751Z * [new tag] 
ciflow/inductor/158404 -> ciflow/inductor/158404 2025-09-07T07:55:57.3695456Z * [new tag] ciflow/inductor/158647 -> ciflow/inductor/158647 2025-09-07T07:55:57.3696752Z * [new tag] ciflow/inductor/158932 -> ciflow/inductor/158932 2025-09-07T07:55:57.3697582Z * [new tag] ciflow/inductor/159146 -> ciflow/inductor/159146 2025-09-07T07:55:57.3698676Z * [new tag] ciflow/inductor/159158 -> ciflow/inductor/159158 2025-09-07T07:55:57.3699559Z * [new tag] ciflow/inductor/159274 -> ciflow/inductor/159274 2025-09-07T07:55:57.3700414Z * [new tag] ciflow/inductor/159664 -> ciflow/inductor/159664 2025-09-07T07:55:57.3701439Z * [new tag] ciflow/inductor/159778 -> ciflow/inductor/159778 2025-09-07T07:55:57.3702310Z * [new tag] ciflow/inductor/159835 -> ciflow/inductor/159835 2025-09-07T07:55:57.3703370Z * [new tag] ciflow/inductor/159944 -> ciflow/inductor/159944 2025-09-07T07:55:57.3704828Z * [new tag] ciflow/inductor/160161 -> ciflow/inductor/160161 2025-09-07T07:55:57.3705665Z * [new tag] ciflow/inductor/160174 -> ciflow/inductor/160174 2025-09-07T07:55:57.3706655Z * [new tag] ciflow/inductor/160323 -> ciflow/inductor/160323 2025-09-07T07:55:57.3707872Z * [new tag] ciflow/inductor/160324 -> ciflow/inductor/160324 2025-09-07T07:55:57.3708905Z * [new tag] ciflow/inductor/160325 -> ciflow/inductor/160325 2025-09-07T07:55:57.3710131Z * [new tag] ciflow/inductor/160326 -> ciflow/inductor/160326 2025-09-07T07:55:57.3710941Z * [new tag] ciflow/inductor/160327 -> ciflow/inductor/160327 2025-09-07T07:55:57.3711993Z * [new tag] ciflow/inductor/160328 -> ciflow/inductor/160328 2025-09-07T07:55:57.3713057Z * [new tag] ciflow/inductor/160329 -> ciflow/inductor/160329 2025-09-07T07:55:57.3714187Z * [new tag] ciflow/inductor/160480 -> ciflow/inductor/160480 2025-09-07T07:55:57.3715462Z * [new tag] ciflow/inductor/160532 -> ciflow/inductor/160532 2025-09-07T07:55:57.3716910Z * [new tag] ciflow/inductor/160539 -> ciflow/inductor/160539 2025-09-07T07:55:57.3717897Z * [new tag] 
ciflow/inductor/160580 -> ciflow/inductor/160580 2025-09-07T07:55:57.3718825Z * [new tag] ciflow/inductor/160685 -> ciflow/inductor/160685 2025-09-07T07:55:57.3719710Z * [new tag] ciflow/inductor/160686 -> ciflow/inductor/160686 2025-09-07T07:55:57.3720619Z * [new tag] ciflow/inductor/160687 -> ciflow/inductor/160687 2025-09-07T07:55:57.3721584Z * [new tag] ciflow/inductor/160688 -> ciflow/inductor/160688 2025-09-07T07:55:57.3722475Z * [new tag] ciflow/inductor/160690 -> ciflow/inductor/160690 2025-09-07T07:55:57.3723399Z * [new tag] ciflow/inductor/160706 -> ciflow/inductor/160706 2025-09-07T07:55:57.3724838Z * [new tag] ciflow/inductor/160729 -> ciflow/inductor/160729 2025-09-07T07:55:57.3725780Z * [new tag] ciflow/inductor/160798 -> ciflow/inductor/160798 2025-09-07T07:55:57.3726819Z * [new tag] ciflow/inductor/160836 -> ciflow/inductor/160836 2025-09-07T07:55:57.3727756Z * [new tag] ciflow/inductor/160843 -> ciflow/inductor/160843 2025-09-07T07:55:57.3729060Z * [new tag] ciflow/inductor/160869 -> ciflow/inductor/160869 2025-09-07T07:55:57.3729987Z * [new tag] ciflow/inductor/160920 -> ciflow/inductor/160920 2025-09-07T07:55:57.3730872Z * [new tag] ciflow/inductor/160928 -> ciflow/inductor/160928 2025-09-07T07:55:57.3731834Z * [new tag] ciflow/inductor/160943 -> ciflow/inductor/160943 2025-09-07T07:55:57.3732770Z * [new tag] ciflow/inductor/161092 -> ciflow/inductor/161092 2025-09-07T07:55:57.3733826Z * [new tag] ciflow/inductor/161093 -> ciflow/inductor/161093 2025-09-07T07:55:57.3735235Z * [new tag] ciflow/inductor/161109 -> ciflow/inductor/161109 2025-09-07T07:55:57.3735979Z * [new tag] ciflow/inductor/161118 -> ciflow/inductor/161118 2025-09-07T07:55:57.3737467Z * [new tag] ciflow/inductor/161178 -> ciflow/inductor/161178 2025-09-07T07:55:57.3738182Z * [new tag] ciflow/inductor/161246 -> ciflow/inductor/161246 2025-09-07T07:55:57.3739131Z * [new tag] ciflow/inductor/161349 -> ciflow/inductor/161349 2025-09-07T07:55:57.3740046Z * [new tag] 
ciflow/inductor/161350 -> ciflow/inductor/161350 2025-09-07T07:55:57.3740968Z * [new tag] ciflow/inductor/161351 -> ciflow/inductor/161351 2025-09-07T07:55:57.3742038Z * [new tag] ciflow/inductor/161397 -> ciflow/inductor/161397 2025-09-07T07:55:57.3742998Z * [new tag] ciflow/inductor/161404 -> ciflow/inductor/161404 2025-09-07T07:55:57.3744210Z * [new tag] ciflow/inductor/161405 -> ciflow/inductor/161405 2025-09-07T07:55:57.3745286Z * [new tag] ciflow/inductor/161406 -> ciflow/inductor/161406 2025-09-07T07:55:57.3746514Z * [new tag] ciflow/inductor/161410 -> ciflow/inductor/161410 2025-09-07T07:55:57.3747371Z * [new tag] ciflow/inductor/161414 -> ciflow/inductor/161414 2025-09-07T07:55:57.3748728Z * [new tag] ciflow/inductor/161442 -> ciflow/inductor/161442 2025-09-07T07:55:57.3749567Z * [new tag] ciflow/inductor/161458 -> ciflow/inductor/161458 2025-09-07T07:55:57.3750542Z * [new tag] ciflow/inductor/161468 -> ciflow/inductor/161468 2025-09-07T07:55:57.3751517Z * [new tag] ciflow/inductor/161469 -> ciflow/inductor/161469 2025-09-07T07:55:57.3752750Z * [new tag] ciflow/inductor/161485 -> ciflow/inductor/161485 2025-09-07T07:55:57.3753620Z * [new tag] ciflow/inductor/161499 -> ciflow/inductor/161499 2025-09-07T07:55:57.3755018Z * [new tag] ciflow/inductor/161534 -> ciflow/inductor/161534 2025-09-07T07:55:57.3755832Z * [new tag] ciflow/inductor/161595 -> ciflow/inductor/161595 2025-09-07T07:55:57.3756804Z * [new tag] ciflow/inductor/161596 -> ciflow/inductor/161596 2025-09-07T07:55:57.3758430Z * [new tag] ciflow/inductor/161630 -> ciflow/inductor/161630 2025-09-07T07:55:57.3759334Z * [new tag] ciflow/inductor/161667 -> ciflow/inductor/161667 2025-09-07T07:55:57.3760299Z * [new tag] ciflow/inductor/161670 -> ciflow/inductor/161670 2025-09-07T07:55:57.3761270Z * [new tag] ciflow/inductor/161673 -> ciflow/inductor/161673 2025-09-07T07:55:57.3762219Z * [new tag] ciflow/inductor/161674 -> ciflow/inductor/161674 2025-09-07T07:55:57.3763223Z * [new tag] 
ciflow/inductor/161675 -> ciflow/inductor/161675 2025-09-07T07:55:57.3764620Z * [new tag] ciflow/inductor/161693 -> ciflow/inductor/161693 2025-09-07T07:55:57.3765436Z * [new tag] ciflow/inductor/161695 -> ciflow/inductor/161695 2025-09-07T07:55:57.3766435Z * [new tag] ciflow/inductor/161715 -> ciflow/inductor/161715 2025-09-07T07:55:57.3767413Z * [new tag] ciflow/inductor/161730 -> ciflow/inductor/161730 2025-09-07T07:55:57.3768400Z * [new tag] ciflow/inductor/161732 -> ciflow/inductor/161732 2025-09-07T07:55:57.3769611Z * [new tag] ciflow/inductor/161744 -> ciflow/inductor/161744 2025-09-07T07:55:57.3770572Z * [new tag] ciflow/inductor/161746 -> ciflow/inductor/161746 2025-09-07T07:55:57.3771576Z * [new tag] ciflow/inductor/161747 -> ciflow/inductor/161747 2025-09-07T07:55:57.3772534Z * [new tag] ciflow/inductor/161819 -> ciflow/inductor/161819 2025-09-07T07:55:57.3773489Z * [new tag] ciflow/inductor/161821 -> ciflow/inductor/161821 2025-09-07T07:55:57.3775076Z * [new tag] ciflow/inductor/161828 -> ciflow/inductor/161828 2025-09-07T07:55:57.3775776Z * [new tag] ciflow/inductor/161879 -> ciflow/inductor/161879 2025-09-07T07:55:57.3776758Z * [new tag] ciflow/inductor/161880 -> ciflow/inductor/161880 2025-09-07T07:55:57.3777753Z * [new tag] ciflow/inductor/161881 -> ciflow/inductor/161881 2025-09-07T07:55:57.3779019Z * [new tag] ciflow/inductor/161907 -> ciflow/inductor/161907 2025-09-07T07:55:57.3779935Z * [new tag] ciflow/inductor/161914 -> ciflow/inductor/161914 2025-09-07T07:55:57.3781261Z * [new tag] ciflow/inductor/161924 -> ciflow/inductor/161924 2025-09-07T07:55:57.3782273Z * [new tag] ciflow/inductor/161936 -> ciflow/inductor/161936 2025-09-07T07:55:57.3783531Z * [new tag] ciflow/inductor/161938 -> ciflow/inductor/161938 2025-09-07T07:55:57.3784671Z * [new tag] ciflow/inductor/161939 -> ciflow/inductor/161939 2025-09-07T07:55:57.3785682Z * [new tag] ciflow/inductor/161940 -> ciflow/inductor/161940 2025-09-07T07:55:57.3786650Z * [new tag] 
ciflow/inductor/161955 -> ciflow/inductor/161955 2025-09-07T07:55:57.3787702Z * [new tag] ciflow/inductor/161957 -> ciflow/inductor/161957 2025-09-07T07:55:57.3788706Z * [new tag] ciflow/inductor/161975 -> ciflow/inductor/161975 2025-09-07T07:55:57.3794227Z * [new tag] ciflow/inductor/161977 -> ciflow/inductor/161977 2025-09-07T07:55:57.3794607Z * [new tag] ciflow/inductor/161978 -> ciflow/inductor/161978 2025-09-07T07:55:57.3794958Z * [new tag] ciflow/inductor/161979 -> ciflow/inductor/161979 2025-09-07T07:55:57.3795310Z * [new tag] ciflow/inductor/161980 -> ciflow/inductor/161980 2025-09-07T07:55:57.3795641Z * [new tag] ciflow/inductor/161988 -> ciflow/inductor/161988 2025-09-07T07:55:57.3795989Z * [new tag] ciflow/inductor/161994 -> ciflow/inductor/161994 2025-09-07T07:55:57.3796337Z * [new tag] ciflow/inductor/162013 -> ciflow/inductor/162013 2025-09-07T07:55:57.3797457Z * [new tag] ciflow/inductor/162014 -> ciflow/inductor/162014 2025-09-07T07:55:57.3798490Z * [new tag] ciflow/inductor/162017 -> ciflow/inductor/162017 2025-09-07T07:55:57.3799507Z * [new tag] ciflow/inductor/162021 -> ciflow/inductor/162021 2025-09-07T07:55:57.3800532Z * [new tag] ciflow/inductor/162023 -> ciflow/inductor/162023 2025-09-07T07:55:57.3801563Z * [new tag] ciflow/inductor/162027 -> ciflow/inductor/162027 2025-09-07T07:55:57.3802588Z * [new tag] ciflow/inductor/162029 -> ciflow/inductor/162029 2025-09-07T07:55:57.3803632Z * [new tag] ciflow/inductor/162030 -> ciflow/inductor/162030 2025-09-07T07:55:57.3805113Z * [new tag] ciflow/inductor/162031 -> ciflow/inductor/162031 2025-09-07T07:55:57.3805991Z * [new tag] ciflow/inductor/162033 -> ciflow/inductor/162033 2025-09-07T07:55:57.3807320Z * [new tag] ciflow/inductor/162052 -> ciflow/inductor/162052 2025-09-07T07:55:57.3808291Z * [new tag] ciflow/inductor/162053 -> ciflow/inductor/162053 2025-09-07T07:55:57.3809330Z * [new tag] ciflow/inductor/162056 -> ciflow/inductor/162056 2025-09-07T07:55:57.3810583Z * [new tag] 
ciflow/inductor/162063 -> ciflow/inductor/162063 2025-09-07T07:55:57.3811500Z * [new tag] ciflow/inductor/162066 -> ciflow/inductor/162066 2025-09-07T07:55:57.3812551Z * [new tag] ciflow/inductor/162068 -> ciflow/inductor/162068 2025-09-07T07:55:57.3814028Z * [new tag] ciflow/inductor/162081 -> ciflow/inductor/162081 2025-09-07T07:55:57.3815457Z * [new tag] ciflow/inductor/162088 -> ciflow/inductor/162088 2025-09-07T07:55:57.3816202Z * [new tag] ciflow/inductor/162089 -> ciflow/inductor/162089 2025-09-07T07:55:57.3817277Z * [new tag] ciflow/inductor/162094 -> ciflow/inductor/162094 2025-09-07T07:55:57.3818323Z * [new tag] ciflow/inductor/162098 -> ciflow/inductor/162098 2025-09-07T07:55:57.3819360Z * [new tag] ciflow/inductor/162101 -> ciflow/inductor/162101 2025-09-07T07:55:57.3820405Z * [new tag] ciflow/inductor/162102 -> ciflow/inductor/162102 2025-09-07T07:55:57.3821462Z * [new tag] ciflow/inductor/162104 -> ciflow/inductor/162104 2025-09-07T07:55:57.3822726Z * [new tag] ciflow/inductor/162106 -> ciflow/inductor/162106 2025-09-07T07:55:57.3823565Z * [new tag] ciflow/inductor/162108 -> ciflow/inductor/162108 2025-09-07T07:55:57.3825095Z * [new tag] ciflow/inductor/162126 -> ciflow/inductor/162126 2025-09-07T07:55:57.3826047Z * [new tag] ciflow/inductor/162149 -> ciflow/inductor/162149 2025-09-07T07:55:57.3827115Z * [new tag] ciflow/inductor/162164 -> ciflow/inductor/162164 2025-09-07T07:55:57.3828306Z * [new tag] ciflow/inductor/162166 -> ciflow/inductor/162166 2025-09-07T07:55:57.3829309Z * [new tag] ciflow/inductor/162169 -> ciflow/inductor/162169 2025-09-07T07:55:57.3830356Z * [new tag] ciflow/inductor/162170 -> ciflow/inductor/162170 2025-09-07T07:55:57.3831605Z * [new tag] ciflow/inductor/162171 -> ciflow/inductor/162171 2025-09-07T07:55:57.3832536Z * [new tag] ciflow/inductor/162183 -> ciflow/inductor/162183 2025-09-07T07:55:57.3833554Z * [new tag] ciflow/inductor/162189 -> ciflow/inductor/162189 2025-09-07T07:55:57.3834997Z * [new tag] 
ciflow/inductor/162190 -> ciflow/inductor/162190 2025-09-07T07:55:57.3835962Z * [new tag] ciflow/inductor/162191 -> ciflow/inductor/162191 2025-09-07T07:55:57.3837024Z * [new tag] ciflow/inductor/162194 -> ciflow/inductor/162194 2025-09-07T07:55:57.3838511Z * [new tag] ciflow/inductor/162200 -> ciflow/inductor/162200 2025-09-07T07:55:57.3839560Z * [new tag] ciflow/inductor/162201 -> ciflow/inductor/162201 2025-09-07T07:55:57.3840754Z * [new tag] ciflow/inductor/162208 -> ciflow/inductor/162208 2025-09-07T07:55:57.3841977Z * [new tag] ciflow/inductor/162211 -> ciflow/inductor/162211 2025-09-07T07:55:57.3842989Z * [new tag] ciflow/inductor/162216 -> ciflow/inductor/162216 2025-09-07T07:55:57.3844382Z * [new tag] ciflow/inductor/162220 -> ciflow/inductor/162220 2025-09-07T07:55:57.3845685Z * [new tag] ciflow/inductor/162222 -> ciflow/inductor/162222 2025-09-07T07:55:57.3846687Z * [new tag] ciflow/inductor/162227 -> ciflow/inductor/162227 2025-09-07T07:55:57.3847698Z * [new tag] ciflow/inductor/162238 -> ciflow/inductor/162238 2025-09-07T07:55:57.3848876Z * [new tag] ciflow/inductor/162239 -> ciflow/inductor/162239 2025-09-07T07:55:57.3849896Z * [new tag] ciflow/inductor/162240 -> ciflow/inductor/162240 2025-09-07T07:55:57.3851124Z * [new tag] ciflow/inductor/162244 -> ciflow/inductor/162244 2025-09-07T07:55:57.3852153Z * [new tag] ciflow/inductor/162245 -> ciflow/inductor/162245 2025-09-07T07:55:57.3853348Z * [new tag] ciflow/inductor/162262 -> ciflow/inductor/162262 2025-09-07T07:55:57.3854739Z * [new tag] ciflow/inductor/162275 -> ciflow/inductor/162275 2025-09-07T07:55:57.3855684Z * [new tag] ciflow/inductor/162278 -> ciflow/inductor/162278 2025-09-07T07:55:57.3857061Z * [new tag] ciflow/inductor/162284 -> ciflow/inductor/162284 2025-09-07T07:55:57.3857917Z * [new tag] ciflow/inductor/162286 -> ciflow/inductor/162286 2025-09-07T07:55:57.3859159Z * [new tag] ciflow/inductor/162288 -> ciflow/inductor/162288 2025-09-07T07:55:57.3860096Z * [new tag] 
ciflow/inductor/162293 -> ciflow/inductor/162293 2025-09-07T07:55:57.3861300Z * [new tag] ciflow/inductor/162294 -> ciflow/inductor/162294 2025-09-07T07:55:57.3862332Z * [new tag] ciflow/inductor/162295 -> ciflow/inductor/162295 2025-09-07T07:55:57.3863528Z * [new tag] ciflow/inductor/162296 -> ciflow/inductor/162296 2025-09-07T07:55:57.3864985Z * [new tag] ciflow/inductor/162298 -> ciflow/inductor/162298 2025-09-07T07:55:57.3865997Z * [new tag] ciflow/inductor/162307 -> ciflow/inductor/162307 2025-09-07T07:55:57.3867227Z * [new tag] ciflow/inductor/162309 -> ciflow/inductor/162309 2025-09-07T07:55:57.3868264Z * [new tag] ciflow/inductor/162311 -> ciflow/inductor/162311 2025-09-07T07:55:57.3869494Z * [new tag] ciflow/inductor/162312 -> ciflow/inductor/162312 2025-09-07T07:55:57.3870508Z * [new tag] ciflow/inductor/162315 -> ciflow/inductor/162315 2025-09-07T07:55:57.3871707Z * [new tag] ciflow/inductor/162316 -> ciflow/inductor/162316 2025-09-07T07:55:57.3872905Z * [new tag] ciflow/inductor/162318 -> ciflow/inductor/162318 2025-09-07T07:55:57.3874085Z * [new tag] ciflow/inductor/162323 -> ciflow/inductor/162323 2025-09-07T07:55:57.3875439Z * [new tag] ciflow/inductor/162341 -> ciflow/inductor/162341 2025-09-07T07:55:57.3876401Z * [new tag] ciflow/inductor/162345 -> ciflow/inductor/162345 2025-09-07T07:55:57.3877929Z * [new tag] ciflow/inductor/3b9a386 -> ciflow/inductor/3b9a386 2025-09-07T07:55:57.3879217Z * [new tag] ciflow/inductor/3d4b92b -> ciflow/inductor/3d4b92b 2025-09-07T07:55:57.3880311Z * [new tag] ciflow/inductor/d224ac7 -> ciflow/inductor/d224ac7 2025-09-07T07:55:57.3881618Z * [new tag] ciflow/linux-aarch64/157994 -> ciflow/linux-aarch64/157994 2025-09-07T07:55:57.3882408Z * [new tag] ciflow/linux-aarch64/159737 -> ciflow/linux-aarch64/159737 2025-09-07T07:55:57.3883239Z * [new tag] ciflow/linux-aarch64/160078 -> ciflow/linux-aarch64/160078 2025-09-07T07:55:57.3884832Z * [new tag] ciflow/mps/157553 -> ciflow/mps/157553 2025-09-07T07:55:57.3885595Z * 
[new tag] ciflow/mps/157635 -> ciflow/mps/157635 2025-09-07T07:55:57.3886403Z * [new tag] ciflow/mps/161988 -> ciflow/mps/161988 2025-09-07T07:55:57.3887226Z * [new tag] ciflow/mps/162108 -> ciflow/mps/162108 2025-09-07T07:55:57.3888087Z * [new tag] ciflow/mps/162153 -> ciflow/mps/162153 2025-09-07T07:55:57.3888921Z * [new tag] ciflow/mps/162281 -> ciflow/mps/162281 2025-09-07T07:55:57.3890217Z * [new tag] ciflow/nightly/156049 -> ciflow/nightly/156049 2025-09-07T07:55:57.3890935Z * [new tag] ciflow/nightly/158104 -> ciflow/nightly/158104 2025-09-07T07:55:57.3892278Z * [new tag] ciflow/op-benchmark/157994 -> ciflow/op-benchmark/157994 2025-09-07T07:55:57.3893679Z * [new tag] ciflow/periodic-rocm-mi300/161529 -> ciflow/periodic-rocm-mi300/161529 2025-09-07T07:55:57.3894784Z * [new tag] ciflow/periodic-rocm-mi300/161715 -> ciflow/periodic-rocm-mi300/161715 2025-09-07T07:55:57.3896270Z * [new tag] ciflow/periodic/054a2fd -> ciflow/periodic/054a2fd 2025-09-07T07:55:57.3897236Z * [new tag] ciflow/periodic/156703 -> ciflow/periodic/156703 2025-09-07T07:55:57.3897893Z * [new tag] ciflow/periodic/161715 -> ciflow/periodic/161715 2025-09-07T07:55:57.3898745Z * [new tag] ciflow/periodic/162021 -> ciflow/periodic/162021 2025-09-07T07:55:57.3899566Z * [new tag] ciflow/periodic/162323 -> ciflow/periodic/162323 2025-09-07T07:55:57.3900843Z * [new tag] ciflow/periodic/2a6d37d -> ciflow/periodic/2a6d37d 2025-09-07T07:55:57.3901623Z * [new tag] ciflow/periodic/317eeb8 -> ciflow/periodic/317eeb8 2025-09-07T07:55:57.3902545Z * [new tag] ciflow/periodic/3c32 -> ciflow/periodic/3c32 2025-09-07T07:55:57.3903595Z * [new tag] ciflow/periodic/3e98831 -> ciflow/periodic/3e98831 2025-09-07T07:55:57.3905247Z * [new tag] ciflow/periodic/94512-point -> ciflow/periodic/94512-point 2025-09-07T07:55:57.3906604Z * [new tag] ciflow/periodic/csl/test87519 -> ciflow/periodic/csl/test87519 2025-09-07T07:55:57.3907606Z * [new tag] ciflow/periodic/csltest88275 -> ciflow/periodic/csltest88275 
2025-09-07T07:55:57.3908684Z * [new tag] ciflow/periodic/csltest88761 -> ciflow/periodic/csltest88761 2025-09-07T07:55:57.3909804Z * [new tag] ciflow/periodic/release_1.12 -> ciflow/periodic/release_1.12 2025-09-07T07:55:57.3911014Z * [new tag] ciflow/periodic/release_1.12.0 -> ciflow/periodic/release_1.12.0 2025-09-07T07:55:57.3912128Z * [new tag] ciflow/periodic/sha-ec5b83 -> ciflow/periodic/sha-ec5b83 2025-09-07T07:55:57.3913461Z * [new tag] ciflow/rocm-mi300/154170 -> ciflow/rocm-mi300/154170 2025-09-07T07:55:57.3914690Z * [new tag] ciflow/rocm-mi300/158747 -> ciflow/rocm-mi300/158747 2025-09-07T07:55:57.3915477Z * [new tag] ciflow/rocm-mi300/159146 -> ciflow/rocm-mi300/159146 2025-09-07T07:55:57.3916339Z * [new tag] ciflow/rocm-mi300/159158 -> ciflow/rocm-mi300/159158 2025-09-07T07:55:57.3917276Z * [new tag] ciflow/rocm-mi300/161715 -> ciflow/rocm-mi300/161715 2025-09-07T07:55:57.3918168Z * [new tag] ciflow/rocm-mi300/161957 -> ciflow/rocm-mi300/161957 2025-09-07T07:55:57.3919001Z * [new tag] ciflow/rocm-mi300/162053 -> ciflow/rocm-mi300/162053 2025-09-07T07:55:57.3919834Z * [new tag] ciflow/rocm-mi300/162056 -> ciflow/rocm-mi300/162056 2025-09-07T07:55:57.3921138Z * [new tag] ciflow/rocm-mi300/162112 -> ciflow/rocm-mi300/162112 2025-09-07T07:55:57.3921826Z * [new tag] ciflow/rocm-mi300/162245 -> ciflow/rocm-mi300/162245 2025-09-07T07:55:57.3922692Z * [new tag] ciflow/rocm-mi300/162278 -> ciflow/rocm-mi300/162278 2025-09-07T07:55:57.3923534Z * [new tag] ciflow/rocm-mi300/162288 -> ciflow/rocm-mi300/162288 2025-09-07T07:55:57.3925177Z * [new tag] ciflow/rocm-mi355/162053 -> ciflow/rocm-mi355/162053 2025-09-07T07:55:57.3925956Z * [new tag] ciflow/rocm-mi355/162056 -> ciflow/rocm-mi355/162056 2025-09-07T07:55:57.3927124Z * [new tag] ciflow/rocm/148492 -> ciflow/rocm/148492 2025-09-07T07:55:57.3927936Z * [new tag] ciflow/rocm/154170 -> ciflow/rocm/154170 2025-09-07T07:55:57.3928948Z * [new tag] ciflow/rocm/156491 -> ciflow/rocm/156491 2025-09-07T07:55:57.3929818Z 
* [new tag] ciflow/rocm/156592 -> ciflow/rocm/156592 2025-09-07T07:55:57.3930633Z * [new tag] ciflow/rocm/158747 -> ciflow/rocm/158747 2025-09-07T07:55:57.3931504Z * [new tag] ciflow/rocm/159146 -> ciflow/rocm/159146 2025-09-07T07:55:57.3932568Z * [new tag] ciflow/rocm/159158 -> ciflow/rocm/159158 2025-09-07T07:55:57.3933650Z * [new tag] ciflow/rocm/161715 -> ciflow/rocm/161715 2025-09-07T07:55:57.3934670Z * [new tag] ciflow/rocm/161972 -> ciflow/rocm/161972 2025-09-07T07:55:57.3935500Z * [new tag] ciflow/rocm/162052 -> ciflow/rocm/162052 2025-09-07T07:55:57.3936348Z * [new tag] ciflow/rocm/162053 -> ciflow/rocm/162053 2025-09-07T07:55:57.3937240Z * [new tag] ciflow/rocm/162056 -> ciflow/rocm/162056 2025-09-07T07:55:57.3938107Z * [new tag] ciflow/rocm/162112 -> ciflow/rocm/162112 2025-09-07T07:55:57.3938948Z * [new tag] ciflow/rocm/162278 -> ciflow/rocm/162278 2025-09-07T07:55:57.3939801Z * [new tag] ciflow/rocm/162288 -> ciflow/rocm/162288 2025-09-07T07:55:57.3940673Z * [new tag] ciflow/rocm/162305 -> ciflow/rocm/162305 2025-09-07T07:55:57.3942155Z * [new tag] ciflow/slow/01c7106 -> ciflow/slow/01c7106 2025-09-07T07:55:57.3943075Z * [new tag] ciflow/slow/0577043 -> ciflow/slow/0577043 2025-09-07T07:55:57.3944811Z * [new tag] ciflow/slow/0d5b74da0cab798fbfdb9caa53fad816999c8386-sdym -> ciflow/slow/0d5b74da0cab798fbfdb9caa53fad816999c8386-sdym 2025-09-07T07:55:57.3945552Z * [new tag] ciflow/slow/0e81104 -> ciflow/slow/0e81104 2025-09-07T07:55:57.3946434Z * [new tag] ciflow/slow/161395 -> ciflow/slow/161395 2025-09-07T07:55:57.3947455Z * [new tag] ciflow/slow/1732077 -> ciflow/slow/1732077 2025-09-07T07:55:57.3948464Z * [new tag] ciflow/slow/187eb7c -> ciflow/slow/187eb7c 2025-09-07T07:55:57.3949495Z * [new tag] ciflow/slow/1faef89 -> ciflow/slow/1faef89 2025-09-07T07:55:57.3950516Z * [new tag] ciflow/slow/3920ec1 -> ciflow/slow/3920ec1 2025-09-07T07:55:57.3951438Z * [new tag] ciflow/slow/3b7c6b2 -> ciflow/slow/3b7c6b2 2025-09-07T07:55:57.3952427Z * [new tag] 
ciflow/slow/59a3759 -> ciflow/slow/59a3759 2025-09-07T07:55:57.3953396Z * [new tag] ciflow/slow/70ef0bb -> ciflow/slow/70ef0bb 2025-09-07T07:55:57.3954907Z * [new tag] ciflow/slow/788ff06 -> ciflow/slow/788ff06 2025-09-07T07:55:57.3956076Z * [new tag] ciflow/slow/8751002215790a3a88750faa8f4366933e296693-sdym -> ciflow/slow/8751002215790a3a88750faa8f4366933e296693-sdym 2025-09-07T07:55:57.3956889Z * [new tag] ciflow/slow/9d85864 -> ciflow/slow/9d85864 2025-09-07T07:55:57.3957991Z * [new tag] ciflow/slow/9ffad5b -> ciflow/slow/9ffad5b 2025-09-07T07:55:57.3959004Z * [new tag] ciflow/slow/a206e8b -> ciflow/slow/a206e8b 2025-09-07T07:55:57.3959994Z * [new tag] ciflow/slow/a837609 -> ciflow/slow/a837609 2025-09-07T07:55:57.3961257Z * [new tag] ciflow/slow/af841f3 -> ciflow/slow/af841f3 2025-09-07T07:55:57.3962464Z * [new tag] ciflow/slow/da3aba1e46157c4df504b067477cdf2b3c96b194-sdym -> ciflow/slow/da3aba1e46157c4df504b067477cdf2b3c96b194-sdym 2025-09-07T07:55:57.3963427Z * [new tag] ciflow/triton_binaries/162329 -> ciflow/triton_binaries/162329 2025-09-07T07:55:57.3965009Z * [new tag] ciflow/trunk/113258 -> ciflow/trunk/113258 2025-09-07T07:55:57.3965733Z * [new tag] ciflow/trunk/137400 -> ciflow/trunk/137400 2025-09-07T07:55:57.3966522Z * [new tag] ciflow/trunk/148180 -> ciflow/trunk/148180 2025-09-07T07:55:57.3967375Z * [new tag] ciflow/trunk/148328 -> ciflow/trunk/148328 2025-09-07T07:55:57.3968182Z * [new tag] ciflow/trunk/148492 -> ciflow/trunk/148492 2025-09-07T07:55:57.3969405Z * [new tag] ciflow/trunk/148919 -> ciflow/trunk/148919 2025-09-07T07:55:57.3970321Z * [new tag] ciflow/trunk/152624 -> ciflow/trunk/152624 2025-09-07T07:55:57.3971039Z * [new tag] ciflow/trunk/154170 -> ciflow/trunk/154170 2025-09-07T07:55:57.3971892Z * [new tag] ciflow/trunk/154694 -> ciflow/trunk/154694 2025-09-07T07:55:57.3972794Z * [new tag] ciflow/trunk/156049 -> ciflow/trunk/156049 2025-09-07T07:55:57.3973612Z * [new tag] ciflow/trunk/156703 -> ciflow/trunk/156703 
2025-09-07T07:55:57.3974738Z * [new tag] ciflow/trunk/156711 -> ciflow/trunk/156711 2025-09-07T07:55:57.3975583Z * [new tag] ciflow/trunk/157432 -> ciflow/trunk/157432 2025-09-07T07:55:57.3976479Z * [new tag] ciflow/trunk/157685 -> ciflow/trunk/157685 2025-09-07T07:55:57.3977335Z * [new tag] ciflow/trunk/157689 -> ciflow/trunk/157689 2025-09-07T07:55:57.3978168Z * [new tag] ciflow/trunk/157699 -> ciflow/trunk/157699 2025-09-07T07:55:57.3979023Z * [new tag] ciflow/trunk/157813 -> ciflow/trunk/157813 2025-09-07T07:55:57.3979917Z * [new tag] ciflow/trunk/157994 -> ciflow/trunk/157994 2025-09-07T07:55:57.3980744Z * [new tag] ciflow/trunk/158091 -> ciflow/trunk/158091 2025-09-07T07:55:57.3981607Z * [new tag] ciflow/trunk/158104 -> ciflow/trunk/158104 2025-09-07T07:55:57.3982574Z * [new tag] ciflow/trunk/158404 -> ciflow/trunk/158404 2025-09-07T07:55:57.3983495Z * [new tag] ciflow/trunk/158647 -> ciflow/trunk/158647 2025-09-07T07:55:57.3985035Z * [new tag] ciflow/trunk/158846 -> ciflow/trunk/158846 2025-09-07T07:55:57.3985701Z * [new tag] ciflow/trunk/159158 -> ciflow/trunk/159158 2025-09-07T07:55:57.3986716Z * [new tag] ciflow/trunk/159682 -> ciflow/trunk/159682 2025-09-07T07:55:57.3987655Z * [new tag] ciflow/trunk/159835 -> ciflow/trunk/159835 2025-09-07T07:55:57.3988517Z * [new tag] ciflow/trunk/160161 -> ciflow/trunk/160161 2025-09-07T07:55:57.3989362Z * [new tag] ciflow/trunk/160236 -> ciflow/trunk/160236 2025-09-07T07:55:57.3990222Z * [new tag] ciflow/trunk/160329 -> ciflow/trunk/160329 2025-09-07T07:55:57.3991102Z * [new tag] ciflow/trunk/160480 -> ciflow/trunk/160480 2025-09-07T07:55:57.3991974Z * [new tag] ciflow/trunk/160532 -> ciflow/trunk/160532 2025-09-07T07:55:57.3992860Z * [new tag] ciflow/trunk/160836 -> ciflow/trunk/160836 2025-09-07T07:55:57.3993904Z * [new tag] ciflow/trunk/160843 -> ciflow/trunk/160843 2025-09-07T07:55:57.3995138Z * [new tag] ciflow/trunk/160869 -> ciflow/trunk/160869 2025-09-07T07:55:57.3995808Z * [new tag] ciflow/trunk/160928 -> 
ciflow/trunk/160928 2025-09-07T07:55:57.3996841Z * [new tag] ciflow/trunk/160940 -> ciflow/trunk/160940 2025-09-07T07:55:57.3997848Z * [new tag] ciflow/trunk/160943 -> ciflow/trunk/160943 2025-09-07T07:55:57.3999020Z * [new tag] ciflow/trunk/160953 -> ciflow/trunk/160953 2025-09-07T07:55:57.4000023Z * [new tag] ciflow/trunk/161035 -> ciflow/trunk/161035 2025-09-07T07:55:57.4000931Z * [new tag] ciflow/trunk/161178 -> ciflow/trunk/161178 2025-09-07T07:55:57.4001787Z * [new tag] ciflow/trunk/161349 -> ciflow/trunk/161349 2025-09-07T07:55:57.4002702Z * [new tag] ciflow/trunk/161350 -> ciflow/trunk/161350 2025-09-07T07:55:57.4003594Z * [new tag] ciflow/trunk/161351 -> ciflow/trunk/161351 2025-09-07T07:55:57.4005136Z * [new tag] ciflow/trunk/161395 -> ciflow/trunk/161395 2025-09-07T07:55:57.4005695Z * [new tag] ciflow/trunk/161405 -> ciflow/trunk/161405 2025-09-07T07:55:57.4006566Z * [new tag] ciflow/trunk/161406 -> ciflow/trunk/161406 2025-09-07T07:55:57.4007489Z * [new tag] ciflow/trunk/161410 -> ciflow/trunk/161410 2025-09-07T07:55:57.4008391Z * [new tag] ciflow/trunk/161468 -> ciflow/trunk/161468 2025-09-07T07:55:57.4009297Z * [new tag] ciflow/trunk/161499 -> ciflow/trunk/161499 2025-09-07T07:55:57.4010525Z * [new tag] ciflow/trunk/161527 -> ciflow/trunk/161527 2025-09-07T07:55:57.4011366Z * [new tag] ciflow/trunk/161534 -> ciflow/trunk/161534 2025-09-07T07:55:57.4012235Z * [new tag] ciflow/trunk/161591 -> ciflow/trunk/161591 2025-09-07T07:55:57.4013179Z * [new tag] ciflow/trunk/161595 -> ciflow/trunk/161595 2025-09-07T07:55:57.4014594Z * [new tag] ciflow/trunk/161596 -> ciflow/trunk/161596 2025-09-07T07:55:57.4015353Z * [new tag] ciflow/trunk/161633 -> ciflow/trunk/161633 2025-09-07T07:55:57.4016228Z * [new tag] ciflow/trunk/161634 -> ciflow/trunk/161634 2025-09-07T07:55:57.4017144Z * [new tag] ciflow/trunk/161635 -> ciflow/trunk/161635 2025-09-07T07:55:57.4018086Z * [new tag] ciflow/trunk/161667 -> ciflow/trunk/161667 2025-09-07T07:55:57.4018992Z * [new tag] 
ciflow/trunk/161670 -> ciflow/trunk/161670 2025-09-07T07:55:57.4019985Z * [new tag] ciflow/trunk/161692 -> ciflow/trunk/161692 2025-09-07T07:55:57.4020965Z * [new tag] ciflow/trunk/161693 -> ciflow/trunk/161693 2025-09-07T07:55:57.4021902Z * [new tag] ciflow/trunk/161695 -> ciflow/trunk/161695 2025-09-07T07:55:57.4022754Z * [new tag] ciflow/trunk/161730 -> ciflow/trunk/161730 2025-09-07T07:55:57.4023653Z * [new tag] ciflow/trunk/161744 -> ciflow/trunk/161744 2025-09-07T07:55:57.4025004Z * [new tag] ciflow/trunk/161749 -> ciflow/trunk/161749 2025-09-07T07:55:57.4025799Z * [new tag] ciflow/trunk/161881 -> ciflow/trunk/161881 2025-09-07T07:55:57.4026709Z * [new tag] ciflow/trunk/161924 -> ciflow/trunk/161924 2025-09-07T07:55:57.4027924Z * [new tag] ciflow/trunk/161926 -> ciflow/trunk/161926 2025-09-07T07:55:57.4028770Z * [new tag] ciflow/trunk/161936 -> ciflow/trunk/161936 2025-09-07T07:55:57.4029711Z * [new tag] ciflow/trunk/161952 -> ciflow/trunk/161952 2025-09-07T07:55:57.4030670Z * [new tag] ciflow/trunk/161955 -> ciflow/trunk/161955 2025-09-07T07:55:57.4031578Z * [new tag] ciflow/trunk/161957 -> ciflow/trunk/161957 2025-09-07T07:55:57.4032568Z * [new tag] ciflow/trunk/161959 -> ciflow/trunk/161959 2025-09-07T07:55:57.4033521Z * [new tag] ciflow/trunk/161977 -> ciflow/trunk/161977 2025-09-07T07:55:57.4034967Z * [new tag] ciflow/trunk/161988 -> ciflow/trunk/161988 2025-09-07T07:55:57.4035798Z * [new tag] ciflow/trunk/161994 -> ciflow/trunk/161994 2025-09-07T07:55:57.4036855Z * [new tag] ciflow/trunk/162007 -> ciflow/trunk/162007 2025-09-07T07:55:57.4037939Z * [new tag] ciflow/trunk/162013 -> ciflow/trunk/162013 2025-09-07T07:55:57.4038877Z * [new tag] ciflow/trunk/162017 -> ciflow/trunk/162017 2025-09-07T07:55:57.4039819Z * [new tag] ciflow/trunk/162021 -> ciflow/trunk/162021 2025-09-07T07:55:57.4040753Z * [new tag] ciflow/trunk/162022 -> ciflow/trunk/162022 2025-09-07T07:55:57.4041872Z * [new tag] ciflow/trunk/162040 -> ciflow/trunk/162040 
2025-09-07T07:55:57.4042673Z * [new tag] ciflow/trunk/162041 -> ciflow/trunk/162041 2025-09-07T07:55:57.4044034Z * [new tag] ciflow/trunk/162062 -> ciflow/trunk/162062 2025-09-07T07:55:57.4045065Z * [new tag] ciflow/trunk/162066 -> ciflow/trunk/162066 2025-09-07T07:55:57.4046030Z * [new tag] ciflow/trunk/162089 -> ciflow/trunk/162089 2025-09-07T07:55:57.4047015Z * [new tag] ciflow/trunk/162099 -> ciflow/trunk/162099 2025-09-07T07:55:57.4048018Z * [new tag] ciflow/trunk/162104 -> ciflow/trunk/162104 2025-09-07T07:55:57.4048995Z * [new tag] ciflow/trunk/162106 -> ciflow/trunk/162106 2025-09-07T07:55:57.4049957Z * [new tag] ciflow/trunk/162112 -> ciflow/trunk/162112 2025-09-07T07:55:57.4050940Z * [new tag] ciflow/trunk/162119 -> ciflow/trunk/162119 2025-09-07T07:55:57.4051895Z * [new tag] ciflow/trunk/162142 -> ciflow/trunk/162142 2025-09-07T07:55:57.4052876Z * [new tag] ciflow/trunk/162169 -> ciflow/trunk/162169 2025-09-07T07:55:57.4053986Z * [new tag] ciflow/trunk/162183 -> ciflow/trunk/162183 2025-09-07T07:55:57.4055073Z * [new tag] ciflow/trunk/162190 -> ciflow/trunk/162190 2025-09-07T07:55:57.4056039Z * [new tag] ciflow/trunk/162194 -> ciflow/trunk/162194 2025-09-07T07:55:57.4056990Z * [new tag] ciflow/trunk/162200 -> ciflow/trunk/162200 2025-09-07T07:55:57.4058187Z * [new tag] ciflow/trunk/162206 -> ciflow/trunk/162206 2025-09-07T07:55:57.4059061Z * [new tag] ciflow/trunk/162208 -> ciflow/trunk/162208 2025-09-07T07:55:57.4060021Z * [new tag] ciflow/trunk/162222 -> ciflow/trunk/162222 2025-09-07T07:55:57.4060994Z * [new tag] ciflow/trunk/162238 -> ciflow/trunk/162238 2025-09-07T07:55:57.4061977Z * [new tag] ciflow/trunk/162244 -> ciflow/trunk/162244 2025-09-07T07:55:57.4063344Z * [new tag] ciflow/trunk/162267 -> ciflow/trunk/162267 2025-09-07T07:55:57.4064752Z * [new tag] ciflow/trunk/162269 -> ciflow/trunk/162269 2025-09-07T07:55:57.4065666Z * [new tag] ciflow/trunk/162278 -> ciflow/trunk/162278 2025-09-07T07:55:57.4066679Z * [new tag] ciflow/trunk/162286 -> 
ciflow/trunk/162286 2025-09-07T07:55:57.4067643Z * [new tag] ciflow/trunk/162288 -> ciflow/trunk/162288 2025-09-07T07:55:57.4068628Z * [new tag] ciflow/trunk/162293 -> ciflow/trunk/162293 2025-09-07T07:55:57.4069847Z * [new tag] ciflow/trunk/162310 -> ciflow/trunk/162310 2025-09-07T07:55:57.4070673Z * [new tag] ciflow/trunk/162311 -> ciflow/trunk/162311 2025-09-07T07:55:57.4071652Z * [new tag] ciflow/trunk/162315 -> ciflow/trunk/162315 2025-09-07T07:55:57.4072653Z * [new tag] ciflow/trunk/162325 -> ciflow/trunk/162325 2025-09-07T07:55:57.4074034Z * [new tag] ciflow/trunk/162328 -> ciflow/trunk/162328 2025-09-07T07:55:57.4075277Z * [new tag] ciflow/trunk/162329 -> ciflow/trunk/162329 2025-09-07T07:55:57.4076660Z * [new tag] ciflow/unstable/123 -> ciflow/unstable/123 2025-09-07T07:55:57.4077991Z * [new tag] ciflow/vllm/162292 -> ciflow/vllm/162292 2025-09-07T07:55:57.4079240Z * [new tag] ciflow/win-arm64/156049 -> ciflow/win-arm64/156049 2025-09-07T07:55:57.4079970Z * [new tag] ciflow/win-arm64/158104 -> ciflow/win-arm64/158104 2025-09-07T07:55:57.4081432Z * [new tag] ciflow/xpu/157699 -> ciflow/xpu/157699 2025-09-07T07:55:57.4082050Z * [new tag] ciflow/xpu/157994 -> ciflow/xpu/157994 2025-09-07T07:55:57.4082963Z * [new tag] ciflow/xpu/159459 -> ciflow/xpu/159459 2025-09-07T07:55:57.4083923Z * [new tag] ciflow/xpu/159718 -> ciflow/xpu/159718 2025-09-07T07:55:57.4084919Z * [new tag] ciflow/xpu/159944 -> ciflow/xpu/159944 2025-09-07T07:55:57.4085849Z * [new tag] ciflow/xpu/160867 -> ciflow/xpu/160867 2025-09-07T07:55:57.4086824Z * [new tag] ciflow/xpu/160938 -> ciflow/xpu/160938 2025-09-07T07:55:57.4087648Z * [new tag] ciflow/xpu/160940 -> ciflow/xpu/160940 2025-09-07T07:55:57.4088482Z * [new tag] ciflow/xpu/160953 -> ciflow/xpu/160953 2025-09-07T07:55:57.4089462Z * [new tag] ciflow/xpu/161045 -> ciflow/xpu/161045 2025-09-07T07:55:57.4090708Z * [new tag] ciflow/xpu/161058 -> ciflow/xpu/161058 2025-09-07T07:55:57.4091462Z * [new tag] ciflow/xpu/161246 -> 
ciflow/xpu/161246 2025-09-07T07:55:57.4092301Z * [new tag] ciflow/xpu/161397 -> ciflow/xpu/161397 2025-09-07T07:55:57.4093133Z * [new tag] ciflow/xpu/161485 -> ciflow/xpu/161485 2025-09-07T07:55:57.4094271Z * [new tag] ciflow/xpu/161988 -> ciflow/xpu/161988 2025-09-07T07:55:57.4095203Z * [new tag] ciflow/xpu/162062 -> ciflow/xpu/162062 2025-09-07T07:55:57.4096120Z * [new tag] cslpull75 -> cslpull75 2025-09-07T07:55:57.4097000Z * [new tag] cslpull76 -> cslpull76 2025-09-07T07:55:57.4097910Z * [new tag] cslpull77 -> cslpull77 2025-09-07T07:55:57.4099125Z * [new tag] cslpull78 -> cslpull78 2025-09-07T07:55:57.4099935Z * [new tag] cslpull79 -> cslpull79 2025-09-07T07:55:57.4100901Z * [new tag] cslpull80 -> cslpull80 2025-09-07T07:55:57.4101811Z * [new tag] cslpull81 -> cslpull81 2025-09-07T07:55:57.4102712Z * [new tag] cslpull82 -> cslpull82 2025-09-07T07:55:57.4103689Z * [new tag] cslpull83 -> cslpull83 2025-09-07T07:55:57.4105018Z * [new tag] cslpull84 -> cslpull84 2025-09-07T07:55:57.4105817Z * [new tag] cslpull85 -> cslpull85 2025-09-07T07:55:57.4106753Z * [new tag] cslpull86 -> cslpull86 2025-09-07T07:55:57.4107677Z * [new tag] cslpull87 -> cslpull87 2025-09-07T07:55:57.4108672Z * [new tag] cslpull88 -> cslpull88 2025-09-07T07:55:57.4109680Z * [new tag] cslpull89 -> cslpull89 2025-09-07T07:55:57.4110474Z * [new tag] cslpull90 -> cslpull90 2025-09-07T07:55:57.4111987Z * [new tag] cslpull91 -> cslpull91 2025-09-07T07:55:57.4112798Z * [new tag] cslpull92 -> cslpull92 2025-09-07T07:55:57.4113897Z * [new tag] flight_5 -> flight_5 2025-09-07T07:55:57.4115306Z * [new tag] flight_5.1 -> flight_5.1 2025-09-07T07:55:57.4116127Z * [new tag] flight_5.2 -> flight_5.2 2025-09-07T07:55:57.4116976Z * [new tag] flight_5.3 -> flight_5.3 2025-09-07T07:55:57.4118043Z * [new tag] forpull1 -> forpull1 2025-09-07T07:55:57.4119460Z * [new tag] malfet/tag-2ef5611 -> malfet/tag-2ef5611 2025-09-07T07:55:57.4120539Z * [new tag] malfet/tag-317b1a0 -> malfet/tag-317b1a0 
2025-09-07T07:55:57.4121412Z * [new tag] malfet/tag-ec6f767 -> malfet/tag-ec6f767 2025-09-07T07:55:57.4122380Z * [new tag] nightly-binary -> nightly-binary 2025-09-07T07:55:57.4123236Z * [new tag] sqzhang_flight4_plus -> sqzhang_flight4_plus 2025-09-07T07:55:57.4125288Z * [new tag] sqzhang_flight_3 -> sqzhang_flight_3 2025-09-07T07:55:57.4126804Z * [new tag] trunk/00636e0171e7e733628c408084805442270cf608 -> trunk/00636e0171e7e733628c408084805442270cf608 2025-09-07T07:55:57.4127819Z * [new tag] trunk/019fed39aa6b2dd8c69347378d53423e5efae8d4 -> trunk/019fed39aa6b2dd8c69347378d53423e5efae8d4 2025-09-07T07:55:57.4128796Z * [new tag] trunk/01ab325cc2e0dc221af4d710974e1b9175066544 -> trunk/01ab325cc2e0dc221af4d710974e1b9175066544 2025-09-07T07:55:57.4129782Z * [new tag] trunk/01edcd4df8bf0c7b4cc2d3ec868bd2059eeea83b -> trunk/01edcd4df8bf0c7b4cc2d3ec868bd2059eeea83b 2025-09-07T07:55:57.4130763Z * [new tag] trunk/040d00af048967dde7938d358d7f5988cbd18388 -> trunk/040d00af048967dde7938d358d7f5988cbd18388 2025-09-07T07:55:57.4134768Z * [new tag] trunk/0447f2d99b4351b2ff129dce6eebb371024f73e5 -> trunk/0447f2d99b4351b2ff129dce6eebb371024f73e5 2025-09-07T07:55:57.4135731Z * [new tag] trunk/047603d35bdc70046216384838d6340feab79bf4 -> trunk/047603d35bdc70046216384838d6340feab79bf4 2025-09-07T07:55:57.4136778Z * [new tag] trunk/06da7c0730b3764f178ec3a90dedf4ffa4202d81 -> trunk/06da7c0730b3764f178ec3a90dedf4ffa4202d81 2025-09-07T07:55:57.4137903Z * [new tag] trunk/081cab045472ce045634548cc6c14a4870641e23 -> trunk/081cab045472ce045634548cc6c14a4870641e23 2025-09-07T07:55:57.4138831Z * [new tag] trunk/09587daf8c9f21f5340f73921ce5f23d1a4a4572 -> trunk/09587daf8c9f21f5340f73921ce5f23d1a4a4572 2025-09-07T07:55:57.4139797Z * [new tag] trunk/09be1890d72cc34fc946965dc4a27736bf0ca8c6 -> trunk/09be1890d72cc34fc946965dc4a27736bf0ca8c6 2025-09-07T07:55:57.4140827Z * [new tag] trunk/09d2f1b6315d6d416fbf452793d65795863ebc66 -> trunk/09d2f1b6315d6d416fbf452793d65795863ebc66 
2025-09-07T07:55:57.4141844Z * [new tag] trunk/0af70e2353e1dcda83175fd4834ecb7b63e009e0 -> trunk/0af70e2353e1dcda83175fd4834ecb7b63e009e0 2025-09-07T07:55:57.4143400Z * [new tag] trunk/0c0e056a9e20c17271a6144dd32c0c7e3ba26736 -> trunk/0c0e056a9e20c17271a6144dd32c0c7e3ba26736 2025-09-07T07:55:57.4144512Z * [new tag] trunk/0cd6c56bdfa9178ff61be82ce3b178926ddb64a9 -> trunk/0cd6c56bdfa9178ff61be82ce3b178926ddb64a9 2025-09-07T07:55:57.4145706Z * [new tag] trunk/0d421ace32c1605ee8e452ee1eeb03bd243dd96c -> trunk/0d421ace32c1605ee8e452ee1eeb03bd243dd96c 2025-09-07T07:55:57.4146857Z * [new tag] trunk/0d71a9dd5b4b6d1dde58d91c9b71d96bc6a6a171 -> trunk/0d71a9dd5b4b6d1dde58d91c9b71d96bc6a6a171 2025-09-07T07:55:57.4147761Z * [new tag] trunk/0d84ff3b78f55492d3d4708458c92d776274939e -> trunk/0d84ff3b78f55492d3d4708458c92d776274939e 2025-09-07T07:55:57.4148716Z * [new tag] trunk/0f45aaf4414048b17d720d0915ce221a8de8ec63 -> trunk/0f45aaf4414048b17d720d0915ce221a8de8ec63 2025-09-07T07:55:57.4149917Z * [new tag] trunk/0ff8eabf1387de5acd6712a03bda61f1a3dfa27f -> trunk/0ff8eabf1387de5acd6712a03bda61f1a3dfa27f 2025-09-07T07:55:57.4150764Z * [new tag] trunk/104f2680e03d13a4765ca69f905d8f16fc0c822f -> trunk/104f2680e03d13a4765ca69f905d8f16fc0c822f 2025-09-07T07:55:57.4151890Z * [new tag] trunk/12814701555d3e41dfcdf8f9273af5821e322df0 -> trunk/12814701555d3e41dfcdf8f9273af5821e322df0 2025-09-07T07:55:57.4152822Z * [new tag] trunk/13b65196db422bdb394cb482e208c61ed448898c -> trunk/13b65196db422bdb394cb482e208c61ed448898c 2025-09-07T07:55:57.4154060Z * [new tag] trunk/13d66e2a66eceed14b8a8f5a971087df4f688a46 -> trunk/13d66e2a66eceed14b8a8f5a971087df4f688a46 2025-09-07T07:55:57.4155286Z * [new tag] trunk/145a3a7bda15e3963a33eb1b54bba5d4a270b225 -> trunk/145a3a7bda15e3963a33eb1b54bba5d4a270b225 2025-09-07T07:55:57.4156049Z * [new tag] trunk/146371483318e17929daefd37c8e459d9d6d47bb -> trunk/146371483318e17929daefd37c8e459d9d6d47bb 2025-09-07T07:55:57.4157328Z * [new tag] 
trunk/15c77a8cfd341e74fd124b077492ef2bfa51b339 -> trunk/15c77a8cfd341e74fd124b077492ef2bfa51b339 2025-09-07T07:55:57.4158289Z * [new tag] trunk/17fa8eec4a1e32939ab4d364ee6e75487a79b654 -> trunk/17fa8eec4a1e32939ab4d364ee6e75487a79b654 2025-09-07T07:55:57.4160395Z * [new tag] trunk/190c391a28845a14df26abb228d26aa813efb20c -> trunk/190c391a28845a14df26abb228d26aa813efb20c 2025-09-07T07:55:57.4161582Z * [new tag] trunk/1a588ace4667bde1331fbd8ed957157dca5cee68 -> trunk/1a588ace4667bde1331fbd8ed957157dca5cee68 2025-09-07T07:55:57.4162021Z * [new tag] trunk/1aa7476885e8f6e7b0ec3a5b6383aad9d3f343e7 -> trunk/1aa7476885e8f6e7b0ec3a5b6383aad9d3f343e7 2025-09-07T07:55:57.4162882Z * [new tag] trunk/1aeb421c342c9e9607842f4c87cb46e8e816ee53 -> trunk/1aeb421c342c9e9607842f4c87cb46e8e816ee53 2025-09-07T07:55:57.4163972Z * [new tag] trunk/1c1b28d5b6a942fafe23b2f09302d93c25226d4a -> trunk/1c1b28d5b6a942fafe23b2f09302d93c25226d4a 2025-09-07T07:55:57.4165231Z * [new tag] trunk/1ebd70d0c0d562d3be9abdee2a21906584af7d99 -> trunk/1ebd70d0c0d562d3be9abdee2a21906584af7d99 2025-09-07T07:55:57.4166223Z * [new tag] trunk/1ec2c15914da4ef7bd926ed9aebc8671c75fe965 -> trunk/1ec2c15914da4ef7bd926ed9aebc8671c75fe965 2025-09-07T07:55:57.4167242Z * [new tag] trunk/1f51056bd64e73d1aa81321bc3c098575b1bc78a -> trunk/1f51056bd64e73d1aa81321bc3c098575b1bc78a 2025-09-07T07:55:57.4168269Z * [new tag] trunk/1f820de639c75a1562d3fb03f160439f853ae07b -> trunk/1f820de639c75a1562d3fb03f160439f853ae07b 2025-09-07T07:55:57.4169299Z * [new tag] trunk/204697f0e695d82894c5010fbec664c4391f90cc -> trunk/204697f0e695d82894c5010fbec664c4391f90cc 2025-09-07T07:55:57.4170915Z * [new tag] trunk/20629b1619fe636227d01fc85ba221daa7185a05 -> trunk/20629b1619fe636227d01fc85ba221daa7185a05 2025-09-07T07:55:57.4171857Z * [new tag] trunk/20b47acef845e9c4f71da9429a396d293f50ebe7 -> trunk/20b47acef845e9c4f71da9429a396d293f50ebe7 2025-09-07T07:55:57.4172886Z * [new tag] trunk/20bfb2539d7c5250379648eda35f80b8a7d642dd -> 
trunk/20bfb2539d7c5250379648eda35f80b8a7d642dd 2025-09-07T07:55:57.4174237Z * [new tag] trunk/21fae99c180d17def562797ea0fb154d8fdf88e3 -> trunk/21fae99c180d17def562797ea0fb154d8fdf88e3 2025-09-07T07:55:57.4175401Z * [new tag] trunk/248355faf53f9f7ba2fd0a367d59600c6d991e7f -> trunk/248355faf53f9f7ba2fd0a367d59600c6d991e7f 2025-09-07T07:55:57.4176356Z * [new tag] trunk/25f4aaed9ec26f39c13862323ff8582006473d23 -> trunk/25f4aaed9ec26f39c13862323ff8582006473d23 2025-09-07T07:55:57.4177370Z * [new tag] trunk/261a84a1764412f8e659c956e3f81997ec3de9d5 -> trunk/261a84a1764412f8e659c956e3f81997ec3de9d5 2025-09-07T07:55:57.4178560Z * [new tag] trunk/28f4ab0737937858730f29f5c4e601e109cf9d5f -> trunk/28f4ab0737937858730f29f5c4e601e109cf9d5f 2025-09-07T07:55:57.4179582Z * [new tag] trunk/291cd11f2d5df6f48d348cce0e4e762f274f4dc4 -> trunk/291cd11f2d5df6f48d348cce0e4e762f274f4dc4 2025-09-07T07:55:57.4180738Z * [new tag] trunk/29280864d941e6108ab57f7298f520c0cf9696e9 -> trunk/29280864d941e6108ab57f7298f520c0cf9696e9 2025-09-07T07:55:57.4181715Z * [new tag] trunk/2a45837e98c63cae9d1a2e2133a727b829e549d5 -> trunk/2a45837e98c63cae9d1a2e2133a727b829e549d5 2025-09-07T07:55:57.4182859Z * [new tag] trunk/2a5c0785e2f975697fd7bdf1411de6e03dcaa1ef -> trunk/2a5c0785e2f975697fd7bdf1411de6e03dcaa1ef 2025-09-07T07:55:57.4184345Z * [new tag] trunk/2b8a83901c58a0858ea9e4ce00055f48e6ed164c -> trunk/2b8a83901c58a0858ea9e4ce00055f48e6ed164c 2025-09-07T07:55:57.4185296Z * [new tag] trunk/2ba65472dd54488a86a50326ea990195fc6732d6 -> trunk/2ba65472dd54488a86a50326ea990195fc6732d6 2025-09-07T07:55:57.4186261Z * [new tag] trunk/2c03f0acc53ed13fe8ebfe809129f25996e009a0 -> trunk/2c03f0acc53ed13fe8ebfe809129f25996e009a0 2025-09-07T07:55:57.4187430Z * [new tag] trunk/2dd529df0092799f68ee7afcf52338276906706a -> trunk/2dd529df0092799f68ee7afcf52338276906706a 2025-09-07T07:55:57.4188409Z * [new tag] trunk/2f6b4b1ad3f82bb3bd984f6e65744ea339ffb8b5 -> trunk/2f6b4b1ad3f82bb3bd984f6e65744ea339ffb8b5 
2025-09-07T07:55:57.4189582Z * [new tag] trunk/2fa0520a64ed8aa734a56c4d124958f0b5711ca8 -> trunk/2fa0520a64ed8aa734a56c4d124958f0b5711ca8 2025-09-07T07:55:57.4190426Z * [new tag] trunk/302df2ac5dc4222294c09d48804a2dddb8f4bad8 -> trunk/302df2ac5dc4222294c09d48804a2dddb8f4bad8 2025-09-07T07:55:57.4191334Z * [new tag] trunk/33028597bfa2e0178e28c8cce33cb9b3800cac43 -> trunk/33028597bfa2e0178e28c8cce33cb9b3800cac43 2025-09-07T07:55:57.4192607Z * [new tag] trunk/34aa78274d6770086025a967fa63a86830e08176 -> trunk/34aa78274d6770086025a967fa63a86830e08176 2025-09-07T07:55:57.4193581Z * [new tag] trunk/3559c354ce6a14d11fe29fb12fa2747a2f2af449 -> trunk/3559c354ce6a14d11fe29fb12fa2747a2f2af449 2025-09-07T07:55:57.4194964Z * [new tag] trunk/36d207fcaaede0d1e58a5168084c307b32b6fd8b -> trunk/36d207fcaaede0d1e58a5168084c307b32b6fd8b 2025-09-07T07:55:57.4195686Z * [new tag] trunk/377033757ae5ca524ea842f1b0a5f446ed3d8fe0 -> trunk/377033757ae5ca524ea842f1b0a5f446ed3d8fe0 2025-09-07T07:55:57.4196869Z * [new tag] trunk/3771380f83fcac154a7c89ad679311d8c4818287 -> trunk/3771380f83fcac154a7c89ad679311d8c4818287 2025-09-07T07:55:57.4198071Z * [new tag] trunk/3a207816cc569f78863d86c01f2a3d265350e39f -> trunk/3a207816cc569f78863d86c01f2a3d265350e39f 2025-09-07T07:55:57.4199016Z * [new tag] trunk/3a20a20e7065ec927fdd216d4da3b04f879b3c67 -> trunk/3a20a20e7065ec927fdd216d4da3b04f879b3c67 2025-09-07T07:55:57.4200282Z * [new tag] trunk/3bbc2e3e4f025523eaa5dbff220b3e96bca608d0 -> trunk/3bbc2e3e4f025523eaa5dbff220b3e96bca608d0 2025-09-07T07:55:57.4201161Z * [new tag] trunk/3c0ff1b569c45cfa6935ad8031a9d4cf1551aa3f -> trunk/3c0ff1b569c45cfa6935ad8031a9d4cf1551aa3f 2025-09-07T07:55:57.4202350Z * [new tag] trunk/3c45af079afc92a03b03ddf4f9198902ffcf30cf -> trunk/3c45af079afc92a03b03ddf4f9198902ffcf30cf 2025-09-07T07:55:57.4203278Z * [new tag] trunk/3dde5d7f9bf80dd6623a712bc429e9e4302464b5 -> trunk/3dde5d7f9bf80dd6623a712bc429e9e4302464b5 2025-09-07T07:55:57.4204792Z * [new tag] 
trunk/403a3a393cda7e60f503f3b04b8805a845dcf45d -> trunk/403a3a393cda7e60f503f3b04b8805a845dcf45d 2025-09-07T07:55:57.4205801Z * [new tag] trunk/420c52ecf36f86d32da0853bfbe074b682b070aa -> trunk/420c52ecf36f86d32da0853bfbe074b682b070aa 2025-09-07T07:55:57.4206772Z * [new tag] trunk/43b7c86a2c0f91320f5c5f4827b111edff06fdb6 -> trunk/43b7c86a2c0f91320f5c5f4827b111edff06fdb6 2025-09-07T07:55:57.4207905Z * [new tag] trunk/451ed931562ec8b46d1f7e6c266a68132a119336 -> trunk/451ed931562ec8b46d1f7e6c266a68132a119336 2025-09-07T07:55:57.4209033Z * [new tag] trunk/480c7391126656154318fabf1d57ebc01e196e63 -> trunk/480c7391126656154318fabf1d57ebc01e196e63 2025-09-07T07:55:57.4210212Z * [new tag] trunk/48bedd753da22634aa94fbafeb731e82025404f3 -> trunk/48bedd753da22634aa94fbafeb731e82025404f3 2025-09-07T07:55:57.4211174Z * [new tag] trunk/494878a11b79071ada0b98f34042d47155be6d1c -> trunk/494878a11b79071ada0b98f34042d47155be6d1c 2025-09-07T07:55:57.4212545Z * [new tag] trunk/4ae57d448c0a7d37e4cfd5c27d977fad2cef4051 -> trunk/4ae57d448c0a7d37e4cfd5c27d977fad2cef4051 2025-09-07T07:55:57.4213349Z * [new tag] trunk/4cdaf8265d86f984254b62052da8c26ef61ef1cf -> trunk/4cdaf8265d86f984254b62052da8c26ef61ef1cf 2025-09-07T07:55:57.4214678Z * [new tag] trunk/4d4abec80f03cd8fdefe1d9cb3a60d3690cd777e -> trunk/4d4abec80f03cd8fdefe1d9cb3a60d3690cd777e 2025-09-07T07:55:57.4215822Z * [new tag] trunk/4e42aa8ffc44b8340eb0eeaf80a2cafc4763a186 -> trunk/4e42aa8ffc44b8340eb0eeaf80a2cafc4763a186 2025-09-07T07:55:57.4216990Z * [new tag] trunk/4f72d932feee0749397fec876dcd43994f50b215 -> trunk/4f72d932feee0749397fec876dcd43994f50b215 2025-09-07T07:55:57.4218130Z * [new tag] trunk/50fc22dedf3c4a27be61fa05551c4f320281b42d -> trunk/50fc22dedf3c4a27be61fa05551c4f320281b42d 2025-09-07T07:55:57.4219127Z * [new tag] trunk/5211f1f908907ffc064b56e43cf8659f7fc22aa9 -> trunk/5211f1f908907ffc064b56e43cf8659f7fc22aa9 2025-09-07T07:55:57.4220854Z * [new tag] trunk/524b78d4f67045b83bb69edc56ab16efe282971c -> 
trunk/524b78d4f67045b83bb69edc56ab16efe282971c 2025-09-07T07:55:57.4222015Z * [new tag] trunk/54e275e0d81fe1e1ccfa4fb5f2a5a9aaca00ca15 -> trunk/54e275e0d81fe1e1ccfa4fb5f2a5a9aaca00ca15 2025-09-07T07:55:57.4222909Z * [new tag] trunk/5561e45758d59c94605873d5db48ed459c004c3b -> trunk/5561e45758d59c94605873d5db48ed459c004c3b 2025-09-07T07:55:57.4224410Z * [new tag] trunk/57278d45f046d4f89f45d373b1af4dd56934ff24 -> trunk/57278d45f046d4f89f45d373b1af4dd56934ff24 2025-09-07T07:55:57.4225559Z * [new tag] trunk/5927a70934ccf7b70182d364c23245a7dd685503 -> trunk/5927a70934ccf7b70182d364c23245a7dd685503 2025-09-07T07:55:57.4226726Z * [new tag] trunk/5985e28912aeb40b103ebfcf2fd0665eb4a50599 -> trunk/5985e28912aeb40b103ebfcf2fd0665eb4a50599 2025-09-07T07:55:57.4227871Z * [new tag] trunk/5a2da090ed6db88bb657c4e51ec0b310cd08bff6 -> trunk/5a2da090ed6db88bb657c4e51ec0b310cd08bff6 2025-09-07T07:55:57.4228841Z * [new tag] trunk/5c473e9f5ee0ef0fc38e6cf34a95b547f8cdc8d5 -> trunk/5c473e9f5ee0ef0fc38e6cf34a95b547f8cdc8d5 2025-09-07T07:55:57.4230022Z * [new tag] trunk/5c67426d6847667a7c55a2dd01f470fa37238c18 -> trunk/5c67426d6847667a7c55a2dd01f470fa37238c18 2025-09-07T07:55:57.4231027Z * [new tag] trunk/5da573c42c332bc68d4b7946c69f690a876d951a -> trunk/5da573c42c332bc68d4b7946c69f690a876d951a 2025-09-07T07:55:57.4232242Z * [new tag] trunk/5e5870e858f60ff4bf87d03f3592097e934a9580 -> trunk/5e5870e858f60ff4bf87d03f3592097e934a9580 2025-09-07T07:55:57.4233218Z * [new tag] trunk/5f3cbc9442aa55b5afb29f4ac8ca9be569003e84 -> trunk/5f3cbc9442aa55b5afb29f4ac8ca9be569003e84 2025-09-07T07:55:57.4234703Z * [new tag] trunk/600c25e9a17fe56e3dee872be8854db08916ba0c -> trunk/600c25e9a17fe56e3dee872be8854db08916ba0c 2025-09-07T07:55:57.4235848Z * [new tag] trunk/601ae8e4831fc8123fffcfb8fd2e6b6381b42e14 -> trunk/601ae8e4831fc8123fffcfb8fd2e6b6381b42e14 2025-09-07T07:55:57.4236845Z * [new tag] trunk/6087ef41e54c2494b117ffd923faf20f515a6806 -> trunk/6087ef41e54c2494b117ffd923faf20f515a6806 
2025-09-07T07:55:57.4238169Z * [new tag] trunk/626cb7df8161dd4ecb4fe43b60f37ce9076f56b1 -> trunk/626cb7df8161dd4ecb4fe43b60f37ce9076f56b1 2025-09-07T07:55:57.4239169Z * [new tag] trunk/62c3f9a97fd3dea7132a93066d32d893ffe101e6 -> trunk/62c3f9a97fd3dea7132a93066d32d893ffe101e6 2025-09-07T07:55:57.4240384Z * [new tag] trunk/63a9c23fe99eacfd09610c36dfe8f01b053c1a35 -> trunk/63a9c23fe99eacfd09610c36dfe8f01b053c1a35 2025-09-07T07:55:57.4241517Z * [new tag] trunk/65985937d97505f648b6ed852c3129f2dd08b251 -> trunk/65985937d97505f648b6ed852c3129f2dd08b251 2025-09-07T07:55:57.4243232Z * [new tag] trunk/66f3b4a682a6153517dd23369fdc3289b6494b07 -> trunk/66f3b4a682a6153517dd23369fdc3289b6494b07 2025-09-07T07:55:57.4244155Z * [new tag] trunk/6737e2c996990024187ba620d2764f3b6f6add2c -> trunk/6737e2c996990024187ba620d2764f3b6f6add2c 2025-09-07T07:55:57.4245564Z * [new tag] trunk/67c31dcd364f10072a55f4a30ffd1151c686283a -> trunk/67c31dcd364f10072a55f4a30ffd1151c686283a 2025-09-07T07:55:57.4246562Z * [new tag] trunk/68738beff73e9c3512e18b4edea811a897ce42db -> trunk/68738beff73e9c3512e18b4edea811a897ce42db 2025-09-07T07:55:57.4247793Z * [new tag] trunk/69a25f68884a168550695fdb1a7c310c54d29536 -> trunk/69a25f68884a168550695fdb1a7c310c54d29536 2025-09-07T07:55:57.4249040Z * [new tag] trunk/6b1900c22f1a07b9519346898d4c71d8a2b0f12f -> trunk/6b1900c22f1a07b9519346898d4c71d8a2b0f12f 2025-09-07T07:55:57.4250166Z * [new tag] trunk/6b8b3ac4403f771bd4a8f9a45d93347304148774 -> trunk/6b8b3ac4403f771bd4a8f9a45d93347304148774 2025-09-07T07:55:57.4251288Z * [new tag] trunk/6f7608d603834d6068b2e7a5d59bec3973b6bb1b -> trunk/6f7608d603834d6068b2e7a5d59bec3973b6bb1b 2025-09-07T07:55:57.4252491Z * [new tag] trunk/70d36e047dfb3488fd6335016711a784d810ebda -> trunk/70d36e047dfb3488fd6335016711a784d810ebda 2025-09-07T07:55:57.4253816Z * [new tag] trunk/71992dd805ff9d6763f77214dfe8b0465e88c87b -> trunk/71992dd805ff9d6763f77214dfe8b0465e88c87b 2025-09-07T07:55:57.4255142Z * [new tag] 
trunk/734ce8eba9c69381f187359bf0fef1d71d84cd20 -> trunk/734ce8eba9c69381f187359bf0fef1d71d84cd20 2025-09-07T07:55:57.4256313Z * [new tag] trunk/73eb4511fb863a37944342b7e92aae706de603c8 -> trunk/73eb4511fb863a37944342b7e92aae706de603c8 2025-09-07T07:55:57.4257510Z * [new tag] trunk/75bc23cfc345bd4c05e7f97c416c4b3d2d1fa64b -> trunk/75bc23cfc345bd4c05e7f97c416c4b3d2d1fa64b 2025-09-07T07:55:57.4258623Z * [new tag] trunk/771f369448321a387f2018535bc8b8b6e5f12fab -> trunk/771f369448321a387f2018535bc8b8b6e5f12fab 2025-09-07T07:55:57.4259809Z * [new tag] trunk/789d4942127143f2adcb53612c058ce4c9a2cf20 -> trunk/789d4942127143f2adcb53612c058ce4c9a2cf20 2025-09-07T07:55:57.4260729Z * [new tag] trunk/791eff96c85678c950888f9da24650083ee673fe -> trunk/791eff96c85678c950888f9da24650083ee673fe 2025-09-07T07:55:57.4261910Z * [new tag] trunk/793fc12aff1f69fbbf9f4278182fb52bbe350fc9 -> trunk/793fc12aff1f69fbbf9f4278182fb52bbe350fc9 2025-09-07T07:55:57.4262861Z * [new tag] trunk/79fcd5247a9a129eee526a14df30bfc6a22b3f01 -> trunk/79fcd5247a9a129eee526a14df30bfc6a22b3f01 2025-09-07T07:55:57.4264289Z * [new tag] trunk/7f4ff79210eb06924f223ae3a1941ee0e2635348 -> trunk/7f4ff79210eb06924f223ae3a1941ee0e2635348 2025-09-07T07:55:57.4265520Z * [new tag] trunk/8076a185c85112be62be292eb47409c88a585b1c -> trunk/8076a185c85112be62be292eb47409c88a585b1c 2025-09-07T07:55:57.4266705Z * [new tag] trunk/80dd397f1979371a5583fa3d5c7352029522a78d -> trunk/80dd397f1979371a5583fa3d5c7352029522a78d 2025-09-07T07:55:57.4267557Z * [new tag] trunk/8171d6052ec12628eb67e0040839314056014429 -> trunk/8171d6052ec12628eb67e0040839314056014429 2025-09-07T07:55:57.4268786Z * [new tag] trunk/81aeefa657b7ccc26b275c50a9f33b2f056e8071 -> trunk/81aeefa657b7ccc26b275c50a9f33b2f056e8071 2025-09-07T07:55:57.4269948Z * [new tag] trunk/81b7b16618bda250ce55982894a83dc0805eb64c -> trunk/81b7b16618bda250ce55982894a83dc0805eb64c 2025-09-07T07:55:57.4271112Z * [new tag] trunk/827f0d405448de31f79d1089f7d7fceab2f87895 -> 
trunk/827f0d405448de31f79d1089f7d7fceab2f87895 2025-09-07T07:55:57.4272118Z * [new tag] trunk/82f63c8f6de63c30132a8ac299b6e8c2fd0d3fe8 -> trunk/82f63c8f6de63c30132a8ac299b6e8c2fd0d3fe8 2025-09-07T07:55:57.4273414Z * [new tag] trunk/850e1382a9c56bfde18af09d3e72352d775e9435 -> trunk/850e1382a9c56bfde18af09d3e72352d775e9435 2025-09-07T07:55:57.4274988Z * [new tag] trunk/8678d831c48e616b717bff50f2d03141d2e9f965 -> trunk/8678d831c48e616b717bff50f2d03141d2e9f965 2025-09-07T07:55:57.4275942Z * [new tag] trunk/869cbcc16e489a4f5a14a93d5779b0ea86061c60 -> trunk/869cbcc16e489a4f5a14a93d5779b0ea86061c60 2025-09-07T07:55:57.4277318Z * [new tag] trunk/8703debf669bc2238211bfd039f4ecdd8228b7f7 -> trunk/8703debf669bc2238211bfd039f4ecdd8228b7f7 2025-09-07T07:55:57.4278559Z * [new tag] trunk/874069fbe46e82da5cfa405e6c0deb12e89ff608 -> trunk/874069fbe46e82da5cfa405e6c0deb12e89ff608 2025-09-07T07:55:57.4279755Z * [new tag] trunk/8875d6e394da2fffd04f31b28bf258c94d4776a3 -> trunk/8875d6e394da2fffd04f31b28bf258c94d4776a3 2025-09-07T07:55:57.4281048Z * [new tag] trunk/88d94d17e8c5155451393afa6eb3bab48ab61c16 -> trunk/88d94d17e8c5155451393afa6eb3bab48ab61c16 2025-09-07T07:55:57.4282244Z * [new tag] trunk/890626632def7e0ef95a2d01e87a0e4627824a9f -> trunk/890626632def7e0ef95a2d01e87a0e4627824a9f 2025-09-07T07:55:57.4283507Z * [new tag] trunk/8975cda2520b7b1b5bc3b4d8213edf261fa82570 -> trunk/8975cda2520b7b1b5bc3b4d8213edf261fa82570 2025-09-07T07:55:57.4285017Z * [new tag] trunk/89d41d3f61d04f14730ec26f008a59bef6624610 -> trunk/89d41d3f61d04f14730ec26f008a59bef6624610 2025-09-07T07:55:57.4286190Z * [new tag] trunk/8bb213b6d599ef1273fe52f9b1f6d476056c3a41 -> trunk/8bb213b6d599ef1273fe52f9b1f6d476056c3a41 2025-09-07T07:55:57.4287197Z * [new tag] trunk/8e23a1227b5fb2e39afaa7d57c075a75b640a5af -> trunk/8e23a1227b5fb2e39afaa7d57c075a75b640a5af 2025-09-07T07:55:57.4288727Z * [new tag] trunk/8ec551bb354ab2b85fbbba9d461740a20366d248 -> trunk/8ec551bb354ab2b85fbbba9d461740a20366d248 
2025-09-07T07:55:57.4289933Z * [new tag] trunk/8fd3c9ce919c8d5c645fd348bba517e948cbc29d -> trunk/8fd3c9ce919c8d5c645fd348bba517e948cbc29d 2025-09-07T07:55:57.4291082Z * [new tag] trunk/90f50f7e68e120d9574e6e3189e37b4280010ad9 -> trunk/90f50f7e68e120d9574e6e3189e37b4280010ad9 2025-09-07T07:55:57.4292356Z * [new tag] trunk/91f0bcf43fc0bc743350d491ac63b77e92054ac9 -> trunk/91f0bcf43fc0bc743350d491ac63b77e92054ac9 2025-09-07T07:55:57.4293575Z * [new tag] trunk/92576a594b8121f6b0b1b5a3ea16d08792fc68ab -> trunk/92576a594b8121f6b0b1b5a3ea16d08792fc68ab 2025-09-07T07:55:57.4295101Z * [new tag] trunk/92a43025e0baa1f2ce345f28d22913b518a1ab9d -> trunk/92a43025e0baa1f2ce345f28d22913b518a1ab9d 2025-09-07T07:55:57.4296380Z * [new tag] trunk/93fb23d6fae7c4e82c4239a1033e522088742634 -> trunk/93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T07:55:57.4297535Z * [new tag] trunk/9458d1ac3bd70c2af316a8ba95d2c6c9c1199c9c -> trunk/9458d1ac3bd70c2af316a8ba95d2c6c9c1199c9c 2025-09-07T07:55:57.4298710Z * [new tag] trunk/9480cdc0b61488c89a23c2f64f43b2dcedc8728e -> trunk/9480cdc0b61488c89a23c2f64f43b2dcedc8728e 2025-09-07T07:55:57.4299972Z * [new tag] trunk/9491d289b329e4ba4a9f5f5b1be7960671bb7840 -> trunk/9491d289b329e4ba4a9f5f5b1be7960671bb7840 2025-09-07T07:55:57.4301105Z * [new tag] trunk/9499c8761cd2067feb9877414e818f6fd00290f1 -> trunk/9499c8761cd2067feb9877414e818f6fd00290f1 2025-09-07T07:55:57.4302262Z * [new tag] trunk/95ee0bfea99d3d346d6502b91b497d2b35795504 -> trunk/95ee0bfea99d3d346d6502b91b497d2b35795504 2025-09-07T07:55:57.4303418Z * [new tag] trunk/98374612fc2febd686be20761e56bdc2424bc36a -> trunk/98374612fc2febd686be20761e56bdc2424bc36a 2025-09-07T07:55:57.4305047Z * [new tag] trunk/98efc9e93d8fc61eb53cb91378443617cb550500 -> trunk/98efc9e93d8fc61eb53cb91378443617cb550500 2025-09-07T07:55:57.4306011Z * [new tag] trunk/994f2a5dbcbdc915da39bf6f6ce4d1f5e74835c9 -> trunk/994f2a5dbcbdc915da39bf6f6ce4d1f5e74835c9 2025-09-07T07:55:57.4307453Z * [new tag] 
trunk/99f356fa58c8d726cef022d8710f5491291158f6 -> trunk/99f356fa58c8d726cef022d8710f5491291158f6 2025-09-07T07:55:57.4308851Z * [new tag] trunk/9a1c5c0a078b94d13ac5c1ae0d754d19fb73bf99 -> trunk/9a1c5c0a078b94d13ac5c1ae0d754d19fb73bf99 2025-09-07T07:55:57.4310002Z * [new tag] trunk/9a665ca3c472384e9d722bddba79e5a7680f1abd -> trunk/9a665ca3c472384e9d722bddba79e5a7680f1abd 2025-09-07T07:55:57.4311156Z * [new tag] trunk/9aedb3cd87b52160872173c177f61053d97bed57 -> trunk/9aedb3cd87b52160872173c177f61053d97bed57 2025-09-07T07:55:57.4312321Z * [new tag] trunk/9b81fe281da41f2421506339d26b027a468902f4 -> trunk/9b81fe281da41f2421506339d26b027a468902f4 2025-09-07T07:55:57.4313462Z * [new tag] trunk/9bdcee01f86e2969cff1140cdecfca13cb51816e -> trunk/9bdcee01f86e2969cff1140cdecfca13cb51816e 2025-09-07T07:55:57.4314891Z * [new tag] trunk/9c03d6be87eedc06e524e202e07a7e776551a839 -> trunk/9c03d6be87eedc06e524e202e07a7e776551a839 2025-09-07T07:55:57.4315857Z * [new tag] trunk/9c957723a0fedd9c637e63e023a613019e2cab60 -> trunk/9c957723a0fedd9c637e63e023a613019e2cab60 2025-09-07T07:55:57.4317143Z * [new tag] trunk/9e5247f51d81735e5f1e65e80588985fa93bccc5 -> trunk/9e5247f51d81735e5f1e65e80588985fa93bccc5 2025-09-07T07:55:57.4318372Z * [new tag] trunk/9eadb37cdd699f7e8e8177a5227bfeb16184ef26 -> trunk/9eadb37cdd699f7e8e8177a5227bfeb16184ef26 2025-09-07T07:55:57.4319532Z * [new tag] trunk/a00cdc1e4159db73c9ffb3f25e93e55877709a29 -> trunk/a00cdc1e4159db73c9ffb3f25e93e55877709a29 2025-09-07T07:55:57.4320775Z * [new tag] trunk/a02ee4a816d11380c6f564c1aba64d56af5ba705 -> trunk/a02ee4a816d11380c6f564c1aba64d56af5ba705 2025-09-07T07:55:57.4321756Z * [new tag] trunk/a3c7f77e50f900721817934120d60c2361b3c40d -> trunk/a3c7f77e50f900721817934120d60c2361b3c40d 2025-09-07T07:55:57.4323051Z * [new tag] trunk/a3d72b09ae12126a2b7d4a63a45ac100a882a802 -> trunk/a3d72b09ae12126a2b7d4a63a45ac100a882a802 2025-09-07T07:55:57.4324114Z * [new tag] trunk/a3e5466002791da609fcb069155d8ee347baee92 -> 
trunk/a3e5466002791da609fcb069155d8ee347baee92 2025-09-07T07:55:57.4325526Z * [new tag] trunk/a714437093ed196eee28f7de454cf4c41badc098 -> trunk/a714437093ed196eee28f7de454cf4c41badc098 2025-09-07T07:55:57.4326691Z * [new tag] trunk/a75e8cd27098f290de0b7439685d05ce02e91356 -> trunk/a75e8cd27098f290de0b7439685d05ce02e91356 2025-09-07T07:55:57.4327576Z * [new tag] trunk/a8d6943d36c1c2a5f90d3573460695bad4b623ae -> trunk/a8d6943d36c1c2a5f90d3573460695bad4b623ae 2025-09-07T07:55:57.4328751Z * [new tag] trunk/a918bbad6ab20649ff82eefb48417ecbe96bcb34 -> trunk/a918bbad6ab20649ff82eefb48417ecbe96bcb34 2025-09-07T07:55:57.4329936Z * [new tag] trunk/a99d8d39bc842d6ebc3e368b178e4884d24b056e -> trunk/a99d8d39bc842d6ebc3e368b178e4884d24b056e 2025-09-07T07:55:57.4331098Z * [new tag] trunk/aac1a50a191b4102d566c9c1ea22f06d6c2e3f02 -> trunk/aac1a50a191b4102d566c9c1ea22f06d6c2e3f02 2025-09-07T07:55:57.4332516Z * [new tag] trunk/aad96a202244c7d0d120c04ba8db593edd8c0f92 -> trunk/aad96a202244c7d0d120c04ba8db593edd8c0f92 2025-09-07T07:55:57.4333686Z * [new tag] trunk/ab643e4dbbaf7b663d4237514cbf01af9b11565c -> trunk/ab643e4dbbaf7b663d4237514cbf01af9b11565c 2025-09-07T07:55:57.4335029Z * [new tag] trunk/abc447174cd2cf8591edbc70a9f836f9a5779f47 -> trunk/abc447174cd2cf8591edbc70a9f836f9a5779f47 2025-09-07T07:55:57.4336051Z * [new tag] trunk/acece97c3a9dceb63194e314da93fdf37cf15a0d -> trunk/acece97c3a9dceb63194e314da93fdf37cf15a0d 2025-09-07T07:55:57.4336981Z * [new tag] trunk/ada43ed39c80b746b4822c92640a1882619e2795 -> trunk/ada43ed39c80b746b4822c92640a1882619e2795 2025-09-07T07:55:57.4338288Z * [new tag] trunk/adae7f66aacf3f248c3101b858cf98d5809119fa -> trunk/adae7f66aacf3f248c3101b858cf98d5809119fa 2025-09-07T07:55:57.4339653Z * [new tag] trunk/ae0edc133e61e3b16caf0b2ee0ff3f33ab72af4c -> trunk/ae0edc133e61e3b16caf0b2ee0ff3f33ab72af4c 2025-09-07T07:55:57.4340460Z * [new tag] trunk/aed33a8fcbd60b052d4559d261390c5797129c6d -> trunk/aed33a8fcbd60b052d4559d261390c5797129c6d 
2025-09-07T07:55:57.4341647Z * [new tag] trunk/b04e922712080a3652e438d05e8bb74e0cd2d238 -> trunk/b04e922712080a3652e438d05e8bb74e0cd2d238 2025-09-07T07:55:57.4342826Z * [new tag] trunk/b0a3e58dd71c1a039ac0ef51e5bd8f704f632f6f -> trunk/b0a3e58dd71c1a039ac0ef51e5bd8f704f632f6f 2025-09-07T07:55:57.4343938Z * [new tag] trunk/b16d3f4c8c01d461c2f01064e9ca5fa2b33f5cf1 -> trunk/b16d3f4c8c01d461c2f01064e9ca5fa2b33f5cf1 2025-09-07T07:55:57.4345292Z * [new tag] trunk/b18bb6796f210a183e687d9d64984a5a9d13cf09 -> trunk/b18bb6796f210a183e687d9d64984a5a9d13cf09 2025-09-07T07:55:57.4346428Z * [new tag] trunk/b1bb98ddebdd3e41bf7987372409bdce96ae55de -> trunk/b1bb98ddebdd3e41bf7987372409bdce96ae55de 2025-09-07T07:55:57.4347423Z * [new tag] trunk/b2b4add0e754411372060e1d7b4057a66439172b -> trunk/b2b4add0e754411372060e1d7b4057a66439172b 2025-09-07T07:55:57.4348601Z * [new tag] trunk/b2c7b9ad2dc5a7c0b61febd307761bd5bc2f0f05 -> trunk/b2c7b9ad2dc5a7c0b61febd307761bd5bc2f0f05 2025-09-07T07:55:57.4349693Z * [new tag] trunk/b40d9432be44a6b5974ee62e7d19c3c61c5ece37 -> trunk/b40d9432be44a6b5974ee62e7d19c3c61c5ece37 2025-09-07T07:55:57.4350896Z * [new tag] trunk/b4ad38279b178b7bd14355123c1101e2e853e77b -> trunk/b4ad38279b178b7bd14355123c1101e2e853e77b 2025-09-07T07:55:57.4352023Z * [new tag] trunk/b67c41039835bd9b20b83cd6233e86baaa5f5dde -> trunk/b67c41039835bd9b20b83cd6233e86baaa5f5dde 2025-09-07T07:55:57.4353281Z * [new tag] trunk/b6d0a9ea9056ede4f7024dbf3bd6c43be3aff49c -> trunk/b6d0a9ea9056ede4f7024dbf3bd6c43be3aff49c 2025-09-07T07:55:57.4354597Z * [new tag] trunk/b7dad7dd49448c88d0751fa2e29c70afe985f734 -> trunk/b7dad7dd49448c88d0751fa2e29c70afe985f734 2025-09-07T07:55:57.4355830Z * [new tag] trunk/b7e207ca9f046ddd716076965a0cce403ba99052 -> trunk/b7e207ca9f046ddd716076965a0cce403ba99052 2025-09-07T07:55:57.4356963Z * [new tag] trunk/b919560c4a7010e2d89facee25586269a994746e -> trunk/b919560c4a7010e2d89facee25586269a994746e 2025-09-07T07:55:57.4358383Z * [new tag] 
trunk/b9ba612f7a968f7b27e121ca8f4d0a4d954f5354 -> trunk/b9ba612f7a968f7b27e121ca8f4d0a4d954f5354 2025-09-07T07:55:57.4359535Z * [new tag] trunk/ba7f546ccccb5e0b36d9070dc25f26a9647f89f8 -> trunk/ba7f546ccccb5e0b36d9070dc25f26a9647f89f8 2025-09-07T07:55:57.4360519Z * [new tag] trunk/bb950284c7e72905994bc25dd436c10e48088d85 -> trunk/bb950284c7e72905994bc25dd436c10e48088d85 2025-09-07T07:55:57.4361735Z * [new tag] trunk/bbedc71fd3267c639c38b4ec25eaa22f973d9c4d -> trunk/bbedc71fd3267c639c38b4ec25eaa22f973d9c4d 2025-09-07T07:55:57.4362652Z * [new tag] trunk/bc4db2c27fce6ff1648bdc5af31ec225d2a31f37 -> trunk/bc4db2c27fce6ff1648bdc5af31ec225d2a31f37 2025-09-07T07:55:57.4363917Z * [new tag] trunk/bc505977fb66677a09c31155c987330fbb18a865 -> trunk/bc505977fb66677a09c31155c987330fbb18a865 2025-09-07T07:55:57.4365381Z * [new tag] trunk/bd39e47feea7326afb5bbb67fcb1e69279239527 -> trunk/bd39e47feea7326afb5bbb67fcb1e69279239527 2025-09-07T07:55:57.4366527Z * [new tag] trunk/be5b03dde96638f25ffd732a4fed7e41b4cf40e1 -> trunk/be5b03dde96638f25ffd732a4fed7e41b4cf40e1 2025-09-07T07:55:57.4367728Z * [new tag] trunk/bffc7dd1f374d8408911cd22c6b3d6df39ded9b3 -> trunk/bffc7dd1f374d8408911cd22c6b3d6df39ded9b3 2025-09-07T07:55:57.4368884Z * [new tag] trunk/c024b1f5a18d5c5aee5cc2acdd4c52b24b93ffcf -> trunk/c024b1f5a18d5c5aee5cc2acdd4c52b24b93ffcf 2025-09-07T07:55:57.4370216Z * [new tag] trunk/c0983e6cc0acf71689e1851d12609e00b3f59371 -> trunk/c0983e6cc0acf71689e1851d12609e00b3f59371 2025-09-07T07:55:57.4371027Z * [new tag] trunk/c10195e723eeeedd099ed8b73eda7184ca618fad -> trunk/c10195e723eeeedd099ed8b73eda7184ca618fad 2025-09-07T07:55:57.4372262Z * [new tag] trunk/c157cf6488ade6a7ee2ce2d25b059e1335630a99 -> trunk/c157cf6488ade6a7ee2ce2d25b059e1335630a99 2025-09-07T07:55:57.4373428Z * [new tag] trunk/c2a30246172fd71d56529907ffd3c27b76b1f3a7 -> trunk/c2a30246172fd71d56529907ffd3c27b76b1f3a7 2025-09-07T07:55:57.4374856Z * [new tag] trunk/c32111149921b48bfef909293f1049e21619ed76 -> 
trunk/c32111149921b48bfef909293f1049e21619ed76 2025-09-07T07:55:57.4375807Z * [new tag] trunk/c37103234afc832dcad307e9016230810957c9d5 -> trunk/c37103234afc832dcad307e9016230810957c9d5 2025-09-07T07:55:57.4376990Z * [new tag] trunk/c3ceca2995cd35e1376c4b0704669bff1a81e836 -> trunk/c3ceca2995cd35e1376c4b0704669bff1a81e836 2025-09-07T07:55:57.4378248Z * [new tag] trunk/c3d54dea9febb1236d48d19e5d4876a63f2e20fd -> trunk/c3d54dea9febb1236d48d19e5d4876a63f2e20fd 2025-09-07T07:55:57.4379416Z * [new tag] trunk/c465b3d52c5687fe910d35a5c75341b77f821741 -> trunk/c465b3d52c5687fe910d35a5c75341b77f821741 2025-09-07T07:55:57.4380533Z * [new tag] trunk/c5b8a10be5e89396da916d1069ffcb7135f0372b -> trunk/c5b8a10be5e89396da916d1069ffcb7135f0372b 2025-09-07T07:55:57.4381470Z * [new tag] trunk/c7e41071a08f4045bc11ab60ec366d7357d56e30 -> trunk/c7e41071a08f4045bc11ab60ec366d7357d56e30 2025-09-07T07:55:57.4382677Z * [new tag] trunk/c98ddaca6d2e19ca37aff00c4ff0cda1e9a6ff65 -> trunk/c98ddaca6d2e19ca37aff00c4ff0cda1e9a6ff65 2025-09-07T07:55:57.4383650Z * [new tag] trunk/cb1e31362c7b53acf4ac95b9f8878064c184f03b -> trunk/cb1e31362c7b53acf4ac95b9f8878064c184f03b 2025-09-07T07:55:57.4385253Z * [new tag] trunk/cbfb005f7cce79974795b148e265f594f59477c8 -> trunk/cbfb005f7cce79974795b148e265f594f59477c8 2025-09-07T07:55:57.4386465Z * [new tag] trunk/cc5bdd12401bda835291d2f3cb297132ebdbf358 -> trunk/cc5bdd12401bda835291d2f3cb297132ebdbf358 2025-09-07T07:55:57.4387725Z * [new tag] trunk/cd529b686d54bbaa443f5b310140de48422d96c7 -> trunk/cd529b686d54bbaa443f5b310140de48422d96c7 2025-09-07T07:55:57.4388722Z * [new tag] trunk/cec0ff122815582af5302360aff03676558c5c87 -> trunk/cec0ff122815582af5302360aff03676558c5c87 2025-09-07T07:55:57.4389951Z * [new tag] trunk/d11720efdb563d02cf4f7d324311fb15a755268e -> trunk/d11720efdb563d02cf4f7d324311fb15a755268e 2025-09-07T07:55:57.4391096Z * [new tag] trunk/d1706d9128ae24d9048167e80d3fe5196d19035e -> trunk/d1706d9128ae24d9048167e80d3fe5196d19035e 
2025-09-07T07:55:57.4392300Z * [new tag] trunk/d1a15abfdcaef138f2d9e93a9f46be44f30b766d -> trunk/d1a15abfdcaef138f2d9e93a9f46be44f30b766d 2025-09-07T07:55:57.4393500Z * [new tag] trunk/d232a95d4a79404ca05c1f52d37fde7339dcdf49 -> trunk/d232a95d4a79404ca05c1f52d37fde7339dcdf49 2025-09-07T07:55:57.4394931Z * [new tag] trunk/d2d4c8e9b2371c9aacfb771d9402ac7427b9778e -> trunk/d2d4c8e9b2371c9aacfb771d9402ac7427b9778e 2025-09-07T07:55:57.4395895Z * [new tag] trunk/d33840c542b387ab08ba49aa6c45aa9567fd9be7 -> trunk/d33840c542b387ab08ba49aa6c45aa9567fd9be7 2025-09-07T07:55:57.4397159Z * [new tag] trunk/d5643e8f3a648a99636bfa1f2a41d54bd3c0d0f1 -> trunk/d5643e8f3a648a99636bfa1f2a41d54bd3c0d0f1 2025-09-07T07:55:57.4398476Z * [new tag] trunk/d5b38410b5b6cf75c7a7389972777a6497926ee7 -> trunk/d5b38410b5b6cf75c7a7389972777a6497926ee7 2025-09-07T07:55:57.4399380Z * [new tag] trunk/d5e0f4202ba14632e4d14862ace096609e763462 -> trunk/d5e0f4202ba14632e4d14862ace096609e763462 2025-09-07T07:55:57.4400631Z * [new tag] trunk/d636c181f9140a7b59be10b36eae23039fc2bb72 -> trunk/d636c181f9140a7b59be10b36eae23039fc2bb72 2025-09-07T07:55:57.4402339Z * [new tag] trunk/d64718503728001a1e78168fd7f2d4ff23e57285 -> trunk/d64718503728001a1e78168fd7f2d4ff23e57285 2025-09-07T07:55:57.4403343Z * [new tag] trunk/d67c29ad22670320d676b02e394274af34e8e643 -> trunk/d67c29ad22670320d676b02e394274af34e8e643 2025-09-07T07:55:57.4404850Z * [new tag] trunk/d6b74568e2c98ce58ecc145b72ac66d4caf7ce95 -> trunk/d6b74568e2c98ce58ecc145b72ac66d4caf7ce95 2025-09-07T07:55:57.4405841Z * [new tag] trunk/d711f27845abd45007ccab6076649ebd896c2661 -> trunk/d711f27845abd45007ccab6076649ebd896c2661 2025-09-07T07:55:57.4407080Z * [new tag] trunk/d9d6dde0f42d4bcc8c97671ac50d5096c7e500ab -> trunk/d9d6dde0f42d4bcc8c97671ac50d5096c7e500ab 2025-09-07T07:55:57.4408228Z * [new tag] trunk/da4db4b33d1fdd046650cf19fdbac581a19bf2f9 -> trunk/da4db4b33d1fdd046650cf19fdbac581a19bf2f9 2025-09-07T07:55:57.4409129Z * [new tag] 
trunk/dac8a4b91c01c3bbc96f54e621b1ea4ffdbd29d1 -> trunk/dac8a4b91c01c3bbc96f54e621b1ea4ffdbd29d1 2025-09-07T07:55:57.4410598Z * [new tag] trunk/dbec08729fb9848bebed6048c63831b87170d061 -> trunk/dbec08729fb9848bebed6048c63831b87170d061 2025-09-07T07:55:57.4411516Z * [new tag] trunk/dcf385395d838f38c8dca25913578230dd43099a -> trunk/dcf385395d838f38c8dca25913578230dd43099a 2025-09-07T07:55:57.4412679Z * [new tag] trunk/dd2519abe83ec3c40d4797492434e41fe3b47e17 -> trunk/dd2519abe83ec3c40d4797492434e41fe3b47e17 2025-09-07T07:55:57.4414016Z * [new tag] trunk/dec72ea4b006dd0fbcaaaa106ad273d73807ab9d -> trunk/dec72ea4b006dd0fbcaaaa106ad273d73807ab9d 2025-09-07T07:55:57.4415244Z * [new tag] trunk/e0a62b266c021b910ce6dc02a6c9429210487717 -> trunk/e0a62b266c021b910ce6dc02a6c9429210487717 2025-09-07T07:55:57.4416432Z * [new tag] trunk/e19e02c84c9dcc408375e5cae3b0709c18b99228 -> trunk/e19e02c84c9dcc408375e5cae3b0709c18b99228 2025-09-07T07:55:57.4417599Z * [new tag] trunk/e304ea4e69d3a7deeb7e48c7450c214a4c953937 -> trunk/e304ea4e69d3a7deeb7e48c7450c214a4c953937 2025-09-07T07:55:57.4418767Z * [new tag] trunk/e3068cdb446adefb5a875616ba37a60235391439 -> trunk/e3068cdb446adefb5a875616ba37a60235391439 2025-09-07T07:55:57.4419893Z * [new tag] trunk/e381d4b0205d5f126c1de534f867ba776f7c3ee6 -> trunk/e381d4b0205d5f126c1de534f867ba776f7c3ee6 2025-09-07T07:55:57.4421100Z * [new tag] trunk/e4bd0ff4f8981b805df32ea5b3550621965ea4f2 -> trunk/e4bd0ff4f8981b805df32ea5b3550621965ea4f2 2025-09-07T07:55:57.4421990Z * [new tag] trunk/e532c9d4f1cdcbc1ea9628f55b9813e77847bdc7 -> trunk/e532c9d4f1cdcbc1ea9628f55b9813e77847bdc7 2025-09-07T07:55:57.4423165Z * [new tag] trunk/e92cd9415377403b6e90585e764639e2e0b5973b -> trunk/e92cd9415377403b6e90585e764639e2e0b5973b 2025-09-07T07:55:57.4424485Z * [new tag] trunk/e9481b6617b5576b099d8ca5798111592e9ad090 -> trunk/e9481b6617b5576b099d8ca5798111592e9ad090 2025-09-07T07:55:57.4425667Z * [new tag] trunk/ea1883dfd3e42defe37b11202b878bb76defa087 -> 
trunk/ea1883dfd3e42defe37b11202b878bb76defa087 2025-09-07T07:55:57.4426825Z * [new tag] trunk/eac3d6f04cfbbebe3d470dacd216da7d4b1f95a8 -> trunk/eac3d6f04cfbbebe3d470dacd216da7d4b1f95a8 2025-09-07T07:55:57.4427831Z * [new tag] trunk/eb18d32bda75189494d955aa001ade15f10333de -> trunk/eb18d32bda75189494d955aa001ade15f10333de 2025-09-07T07:55:57.4428997Z * [new tag] trunk/ef3be6726f7ff4b77c22db10cec5b686f9107ea9 -> trunk/ef3be6726f7ff4b77c22db10cec5b686f9107ea9 2025-09-07T07:55:57.4430140Z * [new tag] trunk/ef8aabd42422725026cb4dbf48aafa9efa226a04 -> trunk/ef8aabd42422725026cb4dbf48aafa9efa226a04 2025-09-07T07:55:57.4431366Z * [new tag] trunk/f00445b43eee57e20bb9316fa796ca23bf73373b -> trunk/f00445b43eee57e20bb9316fa796ca23bf73373b 2025-09-07T07:55:57.4432675Z * [new tag] trunk/f0c391102b754e3b145e8c59231d2df563487e37 -> trunk/f0c391102b754e3b145e8c59231d2df563487e37 2025-09-07T07:55:57.4433693Z * [new tag] trunk/f27985b7e796fb66a1b476284ba42d8cb360a751 -> trunk/f27985b7e796fb66a1b476284ba42d8cb360a751 2025-09-07T07:55:57.4435276Z * [new tag] trunk/f36f285953700f971552083a5da9d0ceacb63bbd -> trunk/f36f285953700f971552083a5da9d0ceacb63bbd 2025-09-07T07:55:57.4436463Z * [new tag] trunk/f3cebec39ebc110e1c8b06e741896585f7892dbb -> trunk/f3cebec39ebc110e1c8b06e741896585f7892dbb 2025-09-07T07:55:57.4437422Z * [new tag] trunk/f4c33cd44acac92c0b451a04da20ebe9370e5b0c -> trunk/f4c33cd44acac92c0b451a04da20ebe9370e5b0c 2025-09-07T07:55:57.4438830Z * [new tag] trunk/f612045ce105f008b2b675e2fc870163babeb2e8 -> trunk/f612045ce105f008b2b675e2fc870163babeb2e8 2025-09-07T07:55:57.4439961Z * [new tag] trunk/f8746b878dfc1e9639d42cbde832e9b9e792c86c -> trunk/f8746b878dfc1e9639d42cbde832e9b9e792c86c 2025-09-07T07:55:57.4441139Z * [new tag] trunk/f8ffa9194e26523e5f976d4a824d5cc58922727c -> trunk/f8ffa9194e26523e5f976d4a824d5cc58922727c 2025-09-07T07:55:57.4442307Z * [new tag] trunk/f981a7fa5230b98974291fdde32fe8488bc5d469 -> trunk/f981a7fa5230b98974291fdde32fe8488bc5d469 
2025-09-07T07:55:57.4443444Z * [new tag] trunk/fbf3d2027daabbcb44d0af274b139be2a248a4f7 -> trunk/fbf3d2027daabbcb44d0af274b139be2a248a4f7 2025-09-07T07:55:57.4445148Z * [new tag] trunk/fca2601c9d628e1bd2d75c7318cd22c4e8c832aa -> trunk/fca2601c9d628e1bd2d75c7318cd22c4e8c832aa 2025-09-07T07:55:57.4446317Z * [new tag] trunk/fea20775ad96bdca972a1811d7d3372f368614ab -> trunk/fea20775ad96bdca972a1811d7d3372f368614ab 2025-09-07T07:55:57.4447186Z * [new tag] trunk/fefee081642f87419a21dc852f7167d4640443cd -> trunk/fefee081642f87419a21dc852f7167d4640443cd 2025-09-07T07:55:57.4448147Z * [new tag] v0.1.1 -> v0.1.1 2025-09-07T07:55:57.4449184Z * [new tag] v0.1.10 -> v0.1.10 2025-09-07T07:55:57.4450195Z * [new tag] v0.1.11 -> v0.1.11 2025-09-07T07:55:57.4451199Z * [new tag] v0.1.12 -> v0.1.12 2025-09-07T07:55:57.4452177Z * [new tag] v0.1.2 -> v0.1.2 2025-09-07T07:55:57.4453155Z * [new tag] v0.1.3 -> v0.1.3 2025-09-07T07:55:57.4454265Z * [new tag] v0.1.4 -> v0.1.4 2025-09-07T07:55:57.4455306Z * [new tag] v0.1.5 -> v0.1.5 2025-09-07T07:55:57.4456324Z * [new tag] v0.1.6 -> v0.1.6 2025-09-07T07:55:57.4457143Z * [new tag] v0.1.7 -> v0.1.7 2025-09-07T07:55:57.4458183Z * [new tag] v0.1.8 -> v0.1.8 2025-09-07T07:55:57.4459193Z * [new tag] v0.1.9 -> v0.1.9 2025-09-07T07:55:57.4460135Z * [new tag] v0.2.0 -> v0.2.0 2025-09-07T07:55:57.4461214Z * [new tag] v0.3.0 -> v0.3.0 2025-09-07T07:55:57.4462305Z * [new tag] v0.3.1 -> v0.3.1 2025-09-07T07:55:57.4463279Z * [new tag] v0.4.0 -> v0.4.0 2025-09-07T07:55:57.4464563Z * [new tag] v0.4.1 -> v0.4.1 2025-09-07T07:55:57.4465627Z * [new tag] v1.0.0 -> v1.0.0 2025-09-07T07:55:57.4466652Z * [new tag] v1.0.0a0 -> v1.0.0a0 2025-09-07T07:55:57.4467674Z * [new tag] v1.0.1 -> v1.0.1 2025-09-07T07:55:57.4468761Z * [new tag] v1.0rc0 -> v1.0rc0 2025-09-07T07:55:57.4469596Z * [new tag] v1.0rc1 -> v1.0rc1 2025-09-07T07:55:57.4470956Z * [new tag] v1.1.0 -> v1.1.0 2025-09-07T07:55:57.4471785Z * [new tag] v1.1.0a0 -> v1.1.0a0 2025-09-07T07:55:57.4473082Z * [new 
tag] v1.10.0 -> v1.10.0 2025-09-07T07:55:57.4474582Z * [new tag] v1.10.0-rc1 -> v1.10.0-rc1 2025-09-07T07:55:57.4475590Z * [new tag] v1.10.0-rc2 -> v1.10.0-rc2 2025-09-07T07:55:57.4476368Z * [new tag] v1.10.0-rc3 -> v1.10.0-rc3 2025-09-07T07:55:57.4477678Z * [new tag] v1.10.1 -> v1.10.1 2025-09-07T07:55:57.4478521Z * [new tag] v1.10.1-rc1 -> v1.10.1-rc1 2025-09-07T07:55:57.4479523Z * [new tag] v1.10.2 -> v1.10.2 2025-09-07T07:55:57.4480243Z * [new tag] v1.10.2-rc1 -> v1.10.2-rc1 2025-09-07T07:55:57.4481434Z * [new tag] v1.11.0 -> v1.11.0 2025-09-07T07:55:57.4482585Z * [new tag] v1.11.0-rc1 -> v1.11.0-rc1 2025-09-07T07:55:57.4483848Z * [new tag] v1.11.0-rc2 -> v1.11.0-rc2 2025-09-07T07:55:57.4485109Z * [new tag] v1.11.0-rc3 -> v1.11.0-rc3 2025-09-07T07:55:57.4486255Z * [new tag] v1.11.0-rc4 -> v1.11.0-rc4 2025-09-07T07:55:57.4487268Z * [new tag] v1.11.0-rc5 -> v1.11.0-rc5 2025-09-07T07:55:57.4488046Z * [new tag] v1.11.0-rc6 -> v1.11.0-rc6 2025-09-07T07:55:57.4489061Z * [new tag] v1.11.0-rc7 -> v1.11.0-rc7 2025-09-07T07:55:57.4490132Z * [new tag] v1.12.0 -> v1.12.0 2025-09-07T07:55:57.4491216Z * [new tag] v1.12.0-rc1 -> v1.12.0-rc1 2025-09-07T07:55:57.4492381Z * [new tag] v1.12.0-rc2 -> v1.12.0-rc2 2025-09-07T07:55:57.4493471Z * [new tag] v1.12.0-rc3 -> v1.12.0-rc3 2025-09-07T07:55:57.4494805Z * [new tag] v1.12.0-rc4 -> v1.12.0-rc4 2025-09-07T07:55:57.4495916Z * [new tag] v1.12.0-rc5 -> v1.12.0-rc5 2025-09-07T07:55:57.4497052Z * [new tag] v1.12.0-rc6 -> v1.12.0-rc6 2025-09-07T07:55:57.4497876Z * [new tag] v1.12.0-rc7 -> v1.12.0-rc7 2025-09-07T07:55:57.4498851Z * [new tag] v1.12.0-rc8 -> v1.12.0-rc8 2025-09-07T07:55:57.4499805Z * [new tag] v1.12.1 -> v1.12.1 2025-09-07T07:55:57.4501152Z * [new tag] v1.12.1-rc1 -> v1.12.1-rc1 2025-09-07T07:55:57.4502240Z * [new tag] v1.12.1-rc2 -> v1.12.1-rc2 2025-09-07T07:55:57.4503438Z * [new tag] v1.12.1-rc3 -> v1.12.1-rc3 2025-09-07T07:55:57.4504902Z * [new tag] v1.12.1-rc4 -> v1.12.1-rc4 2025-09-07T07:55:57.4505691Z * [new tag] 
v1.12.1-rc5 -> v1.12.1-rc5 2025-09-07T07:55:57.4506899Z * [new tag] v1.13.0 -> v1.13.0 2025-09-07T07:55:57.4507999Z * [new tag] v1.13.0-rc1 -> v1.13.0-rc1 2025-09-07T07:55:57.4509026Z * [new tag] v1.13.0-rc2 -> v1.13.0-rc2 2025-09-07T07:55:57.4510147Z * [new tag] v1.13.0-rc3 -> v1.13.0-rc3 2025-09-07T07:55:57.4511364Z * [new tag] v1.13.0-rc4 -> v1.13.0-rc4 2025-09-07T07:55:57.4512201Z * [new tag] v1.13.0-rc5 -> v1.13.0-rc5 2025-09-07T07:55:57.4513275Z * [new tag] v1.13.0-rc6 -> v1.13.0-rc6 2025-09-07T07:55:57.4514664Z * [new tag] v1.13.1 -> v1.13.1 2025-09-07T07:55:57.4515788Z * [new tag] v1.13.1-rc1 -> v1.13.1-rc1 2025-09-07T07:55:57.4516685Z * [new tag] v1.2.0 -> v1.2.0 2025-09-07T07:55:57.4517959Z * [new tag] v1.2.0a0 -> v1.2.0a0 2025-09-07T07:55:57.4519044Z * [new tag] v1.3.0 -> v1.3.0 2025-09-07T07:55:57.4520156Z * [new tag] v1.3.0a0 -> v1.3.0a0 2025-09-07T07:55:57.4520980Z * [new tag] v1.3.1 -> v1.3.1 2025-09-07T07:55:57.4522150Z * [new tag] v1.4.0 -> v1.4.0 2025-09-07T07:55:57.4523170Z * [new tag] v1.4.0a0 -> v1.4.0a0 2025-09-07T07:55:57.4524132Z * [new tag] v1.4.1 -> v1.4.1 2025-09-07T07:55:57.4525541Z * [new tag] v1.5.0 -> v1.5.0 2025-09-07T07:55:57.4526718Z * [new tag] v1.5.0-rc1 -> v1.5.0-rc1 2025-09-07T07:55:57.4527836Z * [new tag] v1.5.0-rc2 -> v1.5.0-rc2 2025-09-07T07:55:57.4529009Z * [new tag] v1.5.0-rc3 -> v1.5.0-rc3 2025-09-07T07:55:57.4530187Z * [new tag] v1.5.0-rc4 -> v1.5.0-rc4 2025-09-07T07:55:57.4531155Z * [new tag] v1.5.0-rc5 -> v1.5.0-rc5 2025-09-07T07:55:57.4532360Z * [new tag] v1.5.1 -> v1.5.1 2025-09-07T07:55:57.4533189Z * [new tag] v1.5.1-rc1 -> v1.5.1-rc1 2025-09-07T07:55:57.4534538Z * [new tag] v1.6.0 -> v1.6.0 2025-09-07T07:55:57.4535736Z * [new tag] v1.6.0-rc1 -> v1.6.0-rc1 2025-09-07T07:55:57.4536889Z * [new tag] v1.6.0-rc2 -> v1.6.0-rc2 2025-09-07T07:55:57.4538023Z * [new tag] v1.6.0-rc3 -> v1.6.0-rc3 2025-09-07T07:55:57.4539171Z * [new tag] v1.6.0-rc4 -> v1.6.0-rc4 2025-09-07T07:55:57.4540267Z * [new tag] v1.6.0-rc5 -> v1.6.0-rc5 
2025-09-07T07:55:57.4541377Z * [new tag] v1.6.0-rc6 -> v1.6.0-rc6 2025-09-07T07:55:57.4542334Z * [new tag] v1.6.0-rc7 -> v1.6.0-rc7 2025-09-07T07:55:57.4543495Z * [new tag] v1.7.0 -> v1.7.0 2025-09-07T07:55:57.4544969Z * [new tag] v1.7.0-rc1 -> v1.7.0-rc1 2025-09-07T07:55:57.4546125Z * [new tag] v1.7.0-rc2 -> v1.7.0-rc2 2025-09-07T07:55:57.4547276Z * [new tag] v1.7.0-rc3 -> v1.7.0-rc3 2025-09-07T07:55:57.4548118Z * [new tag] v1.7.0-rc4 -> v1.7.0-rc4 2025-09-07T07:55:57.4549395Z * [new tag] v1.7.1 -> v1.7.1 2025-09-07T07:55:57.4550659Z * [new tag] v1.7.1-rc1 -> v1.7.1-rc1 2025-09-07T07:55:57.4551851Z * [new tag] v1.7.1-rc2 -> v1.7.1-rc2 2025-09-07T07:55:57.4552862Z * [new tag] v1.7.1-rc3 -> v1.7.1-rc3 2025-09-07T07:55:57.4554120Z * [new tag] v1.8.0 -> v1.8.0 2025-09-07T07:55:57.4555191Z * [new tag] v1.8.0-rc1 -> v1.8.0-rc1 2025-09-07T07:55:57.4556358Z * [new tag] v1.8.0-rc2 -> v1.8.0-rc2 2025-09-07T07:55:57.4557582Z * [new tag] v1.8.0-rc3 -> v1.8.0-rc3 2025-09-07T07:55:57.4558751Z * [new tag] v1.8.0-rc4 -> v1.8.0-rc4 2025-09-07T07:55:57.4559761Z * [new tag] v1.8.0-rc5 -> v1.8.0-rc5 2025-09-07T07:55:57.4560731Z * [new tag] v1.8.1 -> v1.8.1 2025-09-07T07:55:57.4562159Z * [new tag] v1.8.1-rc1 -> v1.8.1-rc1 2025-09-07T07:55:57.4562806Z * [new tag] v1.8.1-rc2 -> v1.8.1-rc2 2025-09-07T07:55:57.4564060Z * [new tag] v1.8.1-rc3 -> v1.8.1-rc3 2025-09-07T07:55:57.4565801Z * [new tag] v1.8.2 -> v1.8.2 2025-09-07T07:55:57.4566613Z * [new tag] v1.8.2-rc1 -> v1.8.2-rc1 2025-09-07T07:55:57.4567925Z * [new tag] v1.9.0 -> v1.9.0 2025-09-07T07:55:57.4569069Z * [new tag] v1.9.0-rc1 -> v1.9.0-rc1 2025-09-07T07:55:57.4570288Z * [new tag] v1.9.0-rc2 -> v1.9.0-rc2 2025-09-07T07:55:57.4571561Z * [new tag] v1.9.0-rc3 -> v1.9.0-rc3 2025-09-07T07:55:57.4572582Z * [new tag] v1.9.0-rc4 -> v1.9.0-rc4 2025-09-07T07:55:57.4573827Z * [new tag] v1.9.1 -> v1.9.1 2025-09-07T07:55:57.4575276Z * [new tag] v1.9.1-rc1 -> v1.9.1-rc1 2025-09-07T07:55:57.4576113Z * [new tag] v1.9.1-rc2 -> v1.9.1-rc2 
2025-09-07T07:55:57.4577410Z * [new tag] v2.0.0 -> v2.0.0 2025-09-07T07:55:57.4578534Z * [new tag] v2.0.0-rc1 -> v2.0.0-rc1 2025-09-07T07:55:57.4579757Z * [new tag] v2.0.0-rc2 -> v2.0.0-rc2 2025-09-07T07:55:57.4580987Z * [new tag] v2.0.0-rc3 -> v2.0.0-rc3 2025-09-07T07:55:57.4582103Z * [new tag] v2.0.0-rc4 -> v2.0.0-rc4 2025-09-07T07:55:57.4583318Z * [new tag] v2.0.0-rc5 -> v2.0.0-rc5 2025-09-07T07:55:57.4584503Z * [new tag] v2.0.0-rc6 -> v2.0.0-rc6 2025-09-07T07:55:57.4585896Z * [new tag] v2.0.1 -> v2.0.1 2025-09-07T07:55:57.4587117Z * [new tag] v2.0.1-rc1 -> v2.0.1-rc1 2025-09-07T07:55:57.4588113Z * [new tag] v2.0.1-rc2 -> v2.0.1-rc2 2025-09-07T07:55:57.4589151Z * [new tag] v2.0.1-rc3 -> v2.0.1-rc3 2025-09-07T07:55:57.4590109Z * [new tag] v2.0.1-rc4 -> v2.0.1-rc4 2025-09-07T07:55:57.4591695Z * [new tag] v2.1.0 -> v2.1.0 2025-09-07T07:55:57.4592895Z * [new tag] v2.1.0-rc1 -> v2.1.0-rc1 2025-09-07T07:55:57.4594224Z * [new tag] v2.1.0-rc2 -> v2.1.0-rc2 2025-09-07T07:55:57.4595531Z * [new tag] v2.1.0-rc3 -> v2.1.0-rc3 2025-09-07T07:55:57.4596781Z * [new tag] v2.1.0-rc4 -> v2.1.0-rc4 2025-09-07T07:55:57.4598103Z * [new tag] v2.1.0-rc5 -> v2.1.0-rc5 2025-09-07T07:55:57.4599144Z * [new tag] v2.1.0-rc6 -> v2.1.0-rc6 2025-09-07T07:55:57.4600305Z * [new tag] v2.1.1 -> v2.1.1 2025-09-07T07:55:57.4601498Z * [new tag] v2.1.1-rc1 -> v2.1.1-rc1 2025-09-07T07:55:57.4602729Z * [new tag] v2.1.1-rc2 -> v2.1.1-rc2 2025-09-07T07:55:57.4604106Z * [new tag] v2.1.1-rc3 -> v2.1.1-rc3 2025-09-07T07:55:57.4605466Z * [new tag] v2.1.1-rc4 -> v2.1.1-rc4 2025-09-07T07:55:57.4606638Z * [new tag] v2.1.1-rc5 -> v2.1.1-rc5 2025-09-07T07:55:57.4607668Z * [new tag] v2.1.1-rc6 -> v2.1.1-rc6 2025-09-07T07:55:57.4608803Z * [new tag] v2.1.2 -> v2.1.2 2025-09-07T07:55:57.4610003Z * [new tag] v2.1.2-rc1 -> v2.1.2-rc1 2025-09-07T07:55:57.4611266Z * [new tag] v2.1.2-rc2 -> v2.1.2-rc2 2025-09-07T07:55:57.4612488Z * [new tag] v2.1.2-rc3 -> v2.1.2-rc3 2025-09-07T07:55:57.4613532Z * [new tag] v2.2.0 -> v2.2.0 
2025-09-07T07:55:57.4615575Z * [new tag] v2.2.0-rc1 -> v2.2.0-rc1 2025-09-07T07:55:57.4616684Z * [new tag] v2.2.0-rc2 -> v2.2.0-rc2 2025-09-07T07:55:57.4617851Z * [new tag] v2.2.0-rc3 -> v2.2.0-rc3 2025-09-07T07:55:57.4619059Z * [new tag] v2.2.0-rc4 -> v2.2.0-rc4 2025-09-07T07:55:57.4620199Z * [new tag] v2.2.0-rc5 -> v2.2.0-rc5 2025-09-07T07:55:57.4621360Z * [new tag] v2.2.0-rc6 -> v2.2.0-rc6 2025-09-07T07:55:57.4622363Z * [new tag] v2.2.0-rc7 -> v2.2.0-rc7 2025-09-07T07:55:57.4623386Z * [new tag] v2.2.0-rc8 -> v2.2.0-rc8 2025-09-07T07:55:57.4625008Z * [new tag] v2.2.1 -> v2.2.1 2025-09-07T07:55:57.4626281Z * [new tag] v2.2.1-rc1 -> v2.2.1-rc1 2025-09-07T07:55:57.4627337Z * [new tag] v2.2.1-rc2 -> v2.2.1-rc2 2025-09-07T07:55:57.4628316Z * [new tag] v2.2.1-rc3 -> v2.2.1-rc3 2025-09-07T07:55:57.4629255Z * [new tag] v2.2.2 -> v2.2.2 2025-09-07T07:55:57.4630591Z * [new tag] v2.2.2-rc1 -> v2.2.2-rc1 2025-09-07T07:55:57.4631604Z * [new tag] v2.2.2-rc2 -> v2.2.2-rc2 2025-09-07T07:55:57.4632613Z * [new tag] v2.2.2-rc3 -> v2.2.2-rc3 2025-09-07T07:55:57.4633974Z * [new tag] v2.3.0 -> v2.3.0 2025-09-07T07:55:57.4635264Z * [new tag] v2.3.0-rc1 -> v2.3.0-rc1 2025-09-07T07:55:57.4636460Z * [new tag] v2.3.0-rc10 -> v2.3.0-rc10 2025-09-07T07:55:57.4637843Z * [new tag] v2.3.0-rc11 -> v2.3.0-rc11 2025-09-07T07:55:57.4638898Z * [new tag] v2.3.0-rc12 -> v2.3.0-rc12 2025-09-07T07:55:57.4640098Z * [new tag] v2.3.0-rc2 -> v2.3.0-rc2 2025-09-07T07:55:57.4641378Z * [new tag] v2.3.0-rc3 -> v2.3.0-rc3 2025-09-07T07:55:57.4642572Z * [new tag] v2.3.0-rc4 -> v2.3.0-rc4 2025-09-07T07:55:57.4643953Z * [new tag] v2.3.0-rc5 -> v2.3.0-rc5 2025-09-07T07:55:57.4645136Z * [new tag] v2.3.0-rc6 -> v2.3.0-rc6 2025-09-07T07:55:57.4646378Z * [new tag] v2.3.0-rc7 -> v2.3.0-rc7 2025-09-07T07:55:57.4647582Z * [new tag] v2.3.0-rc8 -> v2.3.0-rc8 2025-09-07T07:55:57.4648566Z * [new tag] v2.3.0-rc9 -> v2.3.0-rc9 2025-09-07T07:55:57.4649596Z * [new tag] v2.3.1 -> v2.3.1 2025-09-07T07:55:57.4650839Z * [new tag] 
v2.3.1-rc1 -> v2.3.1-rc1 2025-09-07T07:55:57.4652023Z * [new tag] v2.3.1-rc2 -> v2.3.1-rc2 2025-09-07T07:55:57.4654142Z * [new tag] v2.3.1-rc3 -> v2.3.1-rc3 2025-09-07T07:55:57.4655485Z * [new tag] v2.4.0 -> v2.4.0 2025-09-07T07:55:57.4656662Z * [new tag] v2.4.0-rc1 -> v2.4.0-rc1 2025-09-07T07:55:57.4657949Z * [new tag] v2.4.0-rc2 -> v2.4.0-rc2 2025-09-07T07:55:57.4659170Z * [new tag] v2.4.0-rc3 -> v2.4.0-rc3 2025-09-07T07:55:57.4660346Z * [new tag] v2.4.0-rc4 -> v2.4.0-rc4 2025-09-07T07:55:57.4661637Z * [new tag] v2.4.0-rc5 -> v2.4.0-rc5 2025-09-07T07:55:57.4663006Z * [new tag] v2.4.0-rc6 -> v2.4.0-rc6 2025-09-07T07:55:57.4664324Z * [new tag] v2.4.0-rc7 -> v2.4.0-rc7 2025-09-07T07:55:57.4665558Z * [new tag] v2.4.0-rc8 -> v2.4.0-rc8 2025-09-07T07:55:57.4666748Z * [new tag] v2.4.0-rc9 -> v2.4.0-rc9 2025-09-07T07:55:57.4667882Z * [new tag] v2.4.1 -> v2.4.1 2025-09-07T07:55:57.4669170Z * [new tag] v2.4.1-rc1 -> v2.4.1-rc1 2025-09-07T07:55:57.4670365Z * [new tag] v2.4.1-rc2 -> v2.4.1-rc2 2025-09-07T07:55:57.4671665Z * [new tag] v2.4.1-rc3 -> v2.4.1-rc3 2025-09-07T07:55:57.4672823Z * [new tag] v2.5.0 -> v2.5.0 2025-09-07T07:55:57.4674234Z * [new tag] v2.5.0-rc1 -> v2.5.0-rc1 2025-09-07T07:55:57.4675316Z * [new tag] v2.5.0-rc10 -> v2.5.0-rc10 2025-09-07T07:55:57.4676500Z * [new tag] v2.5.0-rc2 -> v2.5.0-rc2 2025-09-07T07:55:57.4677768Z * [new tag] v2.5.0-rc3 -> v2.5.0-rc3 2025-09-07T07:55:57.4679049Z * [new tag] v2.5.0-rc4 -> v2.5.0-rc4 2025-09-07T07:55:57.4680276Z * [new tag] v2.5.0-rc5 -> v2.5.0-rc5 2025-09-07T07:55:57.4681603Z * [new tag] v2.5.0-rc6 -> v2.5.0-rc6 2025-09-07T07:55:57.4682748Z * [new tag] v2.5.0-rc7 -> v2.5.0-rc7 2025-09-07T07:55:57.4684100Z * [new tag] v2.5.0-rc8 -> v2.5.0-rc8 2025-09-07T07:55:57.4685508Z * [new tag] v2.5.0-rc9 -> v2.5.0-rc9 2025-09-07T07:55:57.4686534Z * [new tag] v2.5.1 -> v2.5.1 2025-09-07T07:55:57.4687564Z * [new tag] v2.5.1-rc1 -> v2.5.1-rc1 2025-09-07T07:55:57.4688558Z * [new tag] v2.6.0 -> v2.6.0 2025-09-07T07:55:57.4689856Z * 
[new tag] v2.6.0-rc1 -> v2.6.0-rc1 2025-09-07T07:55:57.4691092Z * [new tag] v2.6.0-rc2 -> v2.6.0-rc2 2025-09-07T07:55:57.4692415Z * [new tag] v2.6.0-rc3 -> v2.6.0-rc3 2025-09-07T07:55:57.4693585Z * [new tag] v2.6.0-rc4 -> v2.6.0-rc4 2025-09-07T07:55:57.4695473Z * [new tag] v2.6.0-rc5 -> v2.6.0-rc5 2025-09-07T07:55:57.4696789Z * [new tag] v2.6.0-rc6 -> v2.6.0-rc6 2025-09-07T07:55:57.4698090Z * [new tag] v2.6.0-rc7 -> v2.6.0-rc7 2025-09-07T07:55:57.4699377Z * [new tag] v2.6.0-rc8 -> v2.6.0-rc8 2025-09-07T07:55:57.4700643Z * [new tag] v2.6.0-rc9 -> v2.6.0-rc9 2025-09-07T07:55:57.4702049Z * [new tag] v2.7.0 -> v2.7.0 2025-09-07T07:55:57.4703282Z * [new tag] v2.7.0-rc1 -> v2.7.0-rc1 2025-09-07T07:55:57.4704618Z * [new tag] v2.7.0-rc10 -> v2.7.0-rc10 2025-09-07T07:55:57.4705985Z * [new tag] v2.7.0-rc2 -> v2.7.0-rc2 2025-09-07T07:55:57.4707197Z * [new tag] v2.7.0-rc3 -> v2.7.0-rc3 2025-09-07T07:55:57.4708491Z * [new tag] v2.7.0-rc4 -> v2.7.0-rc4 2025-09-07T07:55:57.4709672Z * [new tag] v2.7.0-rc5 -> v2.7.0-rc5 2025-09-07T07:55:57.4710852Z * [new tag] v2.7.0-rc6 -> v2.7.0-rc6 2025-09-07T07:55:57.4712103Z * [new tag] v2.7.0-rc7 -> v2.7.0-rc7 2025-09-07T07:55:57.4713393Z * [new tag] v2.7.0-rc8 -> v2.7.0-rc8 2025-09-07T07:55:57.4715129Z * [new tag] v2.7.0-rc9 -> v2.7.0-rc9 2025-09-07T07:55:57.4715824Z * [new tag] v2.7.1 -> v2.7.1 2025-09-07T07:55:57.4717301Z * [new tag] v2.7.1-rc1 -> v2.7.1-rc1 2025-09-07T07:55:57.4718572Z * [new tag] v2.7.1-rc2 -> v2.7.1-rc2 2025-09-07T07:55:57.4719864Z * [new tag] v2.7.1-rc3 -> v2.7.1-rc3 2025-09-07T07:55:57.4721174Z * [new tag] v2.7.1-rc4 -> v2.7.1-rc4 2025-09-07T07:55:57.4722233Z * [new tag] v2.7.1-rc5 -> v2.7.1-rc5 2025-09-07T07:55:57.4723238Z * [new tag] v2.8.0 -> v2.8.0 2025-09-07T07:55:57.4724769Z * [new tag] v2.8.0-rc1 -> v2.8.0-rc1 2025-09-07T07:55:57.4726004Z * [new tag] v2.8.0-rc2 -> v2.8.0-rc2 2025-09-07T07:55:57.4727346Z * [new tag] v2.8.0-rc3 -> v2.8.0-rc3 2025-09-07T07:55:57.4728649Z * [new tag] v2.8.0-rc4 -> v2.8.0-rc4 
2025-09-07T07:55:57.4729958Z * [new tag] v2.8.0-rc5 -> v2.8.0-rc5 2025-09-07T07:55:57.4731228Z * [new tag] v2.8.0-rc6 -> v2.8.0-rc6 2025-09-07T07:55:57.4732478Z * [new tag] v2.8.0-rc7 -> v2.8.0-rc7 2025-09-07T07:55:57.4733652Z * [new tag] v2.8.0-rc8 -> v2.8.0-rc8 2025-09-07T07:55:57.4735194Z * [new tag] whc_flight_1 -> whc_flight_1 2025-09-07T07:55:57.4736396Z * [new tag] whc_flight_2 -> whc_flight_2 2025-09-07T07:55:57.4737520Z * [new tag] whc_flight_4 -> whc_flight_4 2025-09-07T07:55:57.5578468Z [command]/usr/bin/git rev-parse --verify --quiet 93fb23d6fae7c4e82c4239a1033e522088742634^{object} 2025-09-07T07:55:57.5609056Z 93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T07:55:57.5613631Z ##[endgroup] 2025-09-07T07:55:57.5614000Z ##[group]Determining the checkout info 2025-09-07T07:55:57.5615549Z ##[endgroup] 2025-09-07T07:55:57.5619917Z [command]/usr/bin/git sparse-checkout disable 2025-09-07T07:55:57.6887728Z [command]/usr/bin/git config --local --unset-all extensions.worktreeConfig 2025-09-07T07:55:57.7207723Z ##[group]Checking out the ref 2025-09-07T07:55:57.7210900Z [command]/usr/bin/git checkout --progress --force 93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T07:55:58.8737357Z Updating files: 61% (11911/19405) 2025-09-07T07:55:58.8779630Z Updating files: 62% (12032/19405) 2025-09-07T07:55:58.8819872Z Updating files: 63% (12226/19405) 2025-09-07T07:55:58.8861085Z Updating files: 64% (12420/19405) 2025-09-07T07:55:58.8902242Z Updating files: 65% (12614/19405) 2025-09-07T07:55:58.8942731Z Updating files: 66% (12808/19405) 2025-09-07T07:55:58.8982574Z Updating files: 67% (13002/19405) 2025-09-07T07:55:58.9021715Z Updating files: 68% (13196/19405) 2025-09-07T07:55:58.9185052Z Updating files: 69% (13390/19405) 2025-09-07T07:55:58.9410991Z Updating files: 70% (13584/19405) 2025-09-07T07:55:58.9452183Z Updating files: 71% (13778/19405) 2025-09-07T07:55:58.9520654Z Updating files: 72% (13972/19405) 2025-09-07T07:55:58.9714349Z Updating files: 73% 
(14166/19405) 2025-09-07T07:55:58.9906912Z Updating files: 74% (14360/19405) 2025-09-07T07:55:59.0363182Z Updating files: 75% (14554/19405) 2025-09-07T07:55:59.0503050Z Updating files: 76% (14748/19405) 2025-09-07T07:55:59.0610698Z Updating files: 77% (14942/19405) 2025-09-07T07:55:59.0832322Z Updating files: 78% (15136/19405) 2025-09-07T07:55:59.1047734Z Updating files: 79% (15330/19405) 2025-09-07T07:55:59.1302017Z Updating files: 80% (15524/19405) 2025-09-07T07:55:59.1534882Z Updating files: 81% (15719/19405) 2025-09-07T07:55:59.1748010Z Updating files: 82% (15913/19405) 2025-09-07T07:55:59.1865831Z Updating files: 83% (16107/19405) 2025-09-07T07:55:59.1996753Z Updating files: 84% (16301/19405) 2025-09-07T07:55:59.2147899Z Updating files: 85% (16495/19405) 2025-09-07T07:55:59.2279492Z Updating files: 86% (16689/19405) 2025-09-07T07:55:59.2409360Z Updating files: 87% (16883/19405) 2025-09-07T07:55:59.2511778Z Updating files: 88% (17077/19405) 2025-09-07T07:55:59.2647045Z Updating files: 89% (17271/19405) 2025-09-07T07:55:59.2810766Z Updating files: 90% (17465/19405) 2025-09-07T07:55:59.2922819Z Updating files: 91% (17659/19405) 2025-09-07T07:55:59.3061166Z Updating files: 92% (17853/19405) 2025-09-07T07:55:59.3238653Z Updating files: 93% (18047/19405) 2025-09-07T07:55:59.3431232Z Updating files: 94% (18241/19405) 2025-09-07T07:55:59.3582213Z Updating files: 95% (18435/19405) 2025-09-07T07:55:59.3736603Z Updating files: 96% (18629/19405) 2025-09-07T07:55:59.3907577Z Updating files: 97% (18823/19405) 2025-09-07T07:55:59.4154092Z Updating files: 98% (19017/19405) 2025-09-07T07:55:59.4302632Z Updating files: 99% (19211/19405) 2025-09-07T07:55:59.4303132Z Updating files: 100% (19405/19405) 2025-09-07T07:55:59.4303609Z Updating files: 100% (19405/19405), done. 2025-09-07T07:55:59.4709512Z Note: switching to '93fb23d6fae7c4e82c4239a1033e522088742634'. 2025-09-07T07:55:59.4709972Z 2025-09-07T07:55:59.4710315Z You are in 'detached HEAD' state. 
You can look around, make experimental 2025-09-07T07:55:59.4711102Z changes and commit them, and you can discard any commits you make in this 2025-09-07T07:55:59.4712416Z state without impacting any branches by switching back to a branch. 2025-09-07T07:55:59.4712883Z 2025-09-07T07:55:59.4713181Z If you want to create a new branch to retain commits you create, you may 2025-09-07T07:55:59.4714432Z do so (now or later) by using -c with the switch command. Example: 2025-09-07T07:55:59.4714892Z 2025-09-07T07:55:59.4715070Z git switch -c <new-branch-name> 2025-09-07T07:55:59.4715364Z 2025-09-07T07:55:59.4715522Z Or undo this operation with: 2025-09-07T07:55:59.4715790Z 2025-09-07T07:55:59.4715919Z git switch - 2025-09-07T07:55:59.4716119Z 2025-09-07T07:55:59.4716470Z Turn off this advice by setting config variable advice.detachedHead to false 2025-09-07T07:55:59.4716979Z 2025-09-07T07:55:59.4717385Z HEAD is now at 93fb23d6fae Build vLLM nightly wheels (#162000) 2025-09-07T07:55:59.4838630Z ##[endgroup] 2025-09-07T07:55:59.4839283Z ##[group]Setting up auth for fetching submodules 2025-09-07T07:55:59.4845031Z [command]/usr/bin/git config --global http.https://github.com/.extraheader AUTHORIZATION: basic *** 2025-09-07T07:55:59.6570251Z [command]/usr/bin/git config --global --unset-all url.https://github.com/.insteadOf 2025-09-07T07:55:59.6606124Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf git@github.com: 2025-09-07T07:55:59.7516287Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf org-21003710@github.com: 2025-09-07T07:55:59.7944837Z ##[endgroup] 2025-09-07T07:55:59.7945146Z ##[group]Fetching submodules 2025-09-07T07:55:59.7951017Z [command]/usr/bin/git submodule sync --recursive 2025-09-07T07:55:59.8244874Z [command]/usr/bin/git -c protocol.version=2 submodule update --init --force --recursive 2025-09-07T07:56:00.0018350Z Submodule 'android/libs/fbjni' (https://github.com/facebookincubator/fbjni.git) registered for path
'android/libs/fbjni' 2025-09-07T07:56:00.1215733Z Submodule 'third_party/NNPACK_deps/FP16' (https://github.com/Maratyszcza/FP16.git) registered for path 'third_party/FP16' 2025-09-07T07:56:00.2421827Z Submodule 'third_party/NNPACK_deps/FXdiv' (https://github.com/Maratyszcza/FXdiv.git) registered for path 'third_party/FXdiv' 2025-09-07T07:56:00.3718843Z Submodule 'third_party/NNPACK' (https://github.com/Maratyszcza/NNPACK.git) registered for path 'third_party/NNPACK' 2025-09-07T07:56:00.4993383Z Submodule 'third_party/NVTX' (https://github.com/NVIDIA/NVTX.git) registered for path 'third_party/NVTX' 2025-09-07T07:56:00.6176298Z Submodule 'third_party/VulkanMemoryAllocator' (https://github.com/GPUOpen-LibrariesAndSDKs/VulkanMemoryAllocator.git) registered for path 'third_party/VulkanMemoryAllocator' 2025-09-07T07:56:00.7396788Z Submodule 'third_party/XNNPACK' (https://github.com/google/XNNPACK.git) registered for path 'third_party/XNNPACK' 2025-09-07T07:56:00.8702137Z Submodule 'third_party/aiter' (https://github.com/ROCm/aiter.git) registered for path 'third_party/aiter' 2025-09-07T07:56:00.9944145Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/benchmark' 2025-09-07T07:56:01.1176273Z Submodule 'third_party/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/composable_kernel' 2025-09-07T07:56:01.2476035Z Submodule 'third_party/cpp-httplib' (https://github.com/yhirose/cpp-httplib.git) registered for path 'third_party/cpp-httplib' 2025-09-07T07:56:01.3681175Z Submodule 'third_party/cpuinfo' (https://github.com/pytorch/cpuinfo.git) registered for path 'third_party/cpuinfo' 2025-09-07T07:56:01.4967262Z Submodule 'third_party/cudnn_frontend' (https://github.com/NVIDIA/cudnn-frontend.git) registered for path 'third_party/cudnn_frontend' 2025-09-07T07:56:01.6206269Z Submodule 'third_party/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 
'third_party/cutlass' 2025-09-07T07:56:01.7496336Z Submodule 'third_party/fbgemm' (https://github.com/pytorch/fbgemm) registered for path 'third_party/fbgemm' 2025-09-07T07:56:01.8800762Z Submodule 'third_party/flash-attention' (https://github.com/Dao-AILab/flash-attention.git) registered for path 'third_party/flash-attention' 2025-09-07T07:56:02.0099163Z Submodule 'third_party/flatbuffers' (https://github.com/google/flatbuffers.git) registered for path 'third_party/flatbuffers' 2025-09-07T07:56:02.1371855Z Submodule 'third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/fmt' 2025-09-07T07:56:02.2410108Z Submodule 'third_party/gemmlowp/gemmlowp' (https://github.com/google/gemmlowp.git) registered for path 'third_party/gemmlowp/gemmlowp' 2025-09-07T07:56:02.3433084Z Submodule 'third_party/gloo' (https://github.com/pytorch/gloo) registered for path 'third_party/gloo' 2025-09-07T07:56:02.4745836Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/googletest' 2025-09-07T07:56:02.5981098Z Submodule 'third_party/ideep' (https://github.com/intel/ideep) registered for path 'third_party/ideep' 2025-09-07T07:56:02.7322130Z Submodule 'third_party/ittapi' (https://github.com/intel/ittapi.git) registered for path 'third_party/ittapi' 2025-09-07T07:56:02.8630759Z Submodule 'third_party/kineto' (https://github.com/pytorch/kineto) registered for path 'third_party/kineto' 2025-09-07T07:56:03.0662323Z Submodule 'third_party/kleidiai' (https://github.com/ARM-software/kleidiai.git) registered for path 'third_party/kleidiai' 2025-09-07T07:56:03.1890897Z Submodule 'third_party/mimalloc' (https://github.com/microsoft/mimalloc.git) registered for path 'third_party/mimalloc' 2025-09-07T07:56:03.3321864Z Submodule 'third_party/nlohmann' (https://github.com/nlohmann/json.git) registered for path 'third_party/nlohmann' 2025-09-07T07:56:03.5364875Z Submodule 'third_party/onnx' 
(https://github.com/onnx/onnx.git) registered for path 'third_party/onnx' 2025-09-07T07:56:03.7370139Z Submodule 'third_party/opentelemetry-cpp' (https://github.com/open-telemetry/opentelemetry-cpp.git) registered for path 'third_party/opentelemetry-cpp' 2025-09-07T07:56:03.8673423Z Submodule 'third_party/pocketfft' (https://github.com/mreineck/pocketfft) registered for path 'third_party/pocketfft' 2025-09-07T07:56:03.9889303Z Submodule 'third_party/protobuf' (https://github.com/protocolbuffers/protobuf.git) registered for path 'third_party/protobuf' 2025-09-07T07:56:04.1107037Z Submodule 'third_party/NNPACK_deps/psimd' (https://github.com/Maratyszcza/psimd.git) registered for path 'third_party/psimd' 2025-09-07T07:56:04.2338380Z Submodule 'third_party/NNPACK_deps/pthreadpool' (https://github.com/Maratyszcza/pthreadpool.git) registered for path 'third_party/pthreadpool' 2025-09-07T07:56:04.3608072Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/pybind11' 2025-09-07T07:56:04.4841065Z Submodule 'third_party/python-peachpy' (https://github.com/malfet/PeachPy.git) registered for path 'third_party/python-peachpy' 2025-09-07T07:56:04.6047348Z Submodule 'third_party/sleef' (https://github.com/shibatch/sleef) registered for path 'third_party/sleef' 2025-09-07T07:56:04.7259190Z Submodule 'third_party/tensorpipe' (https://github.com/pytorch/tensorpipe.git) registered for path 'third_party/tensorpipe' 2025-09-07T07:56:04.7306920Z Cloning into '/home/charlie/_work/pytorch/pytorch/android/libs/fbjni'... 2025-09-07T07:56:07.1252636Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/FP16'... 2025-09-07T07:56:09.4088993Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/FXdiv'... 2025-09-07T07:56:11.6483681Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/NNPACK'... 2025-09-07T07:56:14.1888184Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/NVTX'... 
2025-09-07T07:56:17.3919024Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/VulkanMemoryAllocator'... 2025-09-07T07:56:23.3405833Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/XNNPACK'... 2025-09-07T07:56:48.7841979Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/aiter'... 2025-09-07T07:57:01.3086195Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/benchmark'... 2025-09-07T07:57:05.0528869Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/composable_kernel'... 2025-09-07T07:57:15.1178125Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/cpp-httplib'... 2025-09-07T07:57:18.4170110Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/cpuinfo'... 2025-09-07T07:57:21.9898490Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/cudnn_frontend'... 2025-09-07T07:57:27.7846396Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/cutlass'... 2025-09-07T07:57:36.3253085Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/fbgemm'... 2025-09-07T07:57:42.6861326Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/flash-attention'... 2025-09-07T07:57:46.1132269Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/flatbuffers'... 2025-09-07T07:57:51.3364916Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/fmt'... 2025-09-07T07:57:56.1602595Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/gemmlowp/gemmlowp'... 2025-09-07T07:57:58.8601952Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/gloo'... 2025-09-07T07:58:01.4887113Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/googletest'... 2025-09-07T07:58:05.6169610Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/ideep'... 2025-09-07T07:58:08.2309293Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/ittapi'... 
2025-09-07T07:58:11.2666348Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/kineto'... 2025-09-07T07:58:18.0676036Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/kleidiai'... 2025-09-07T07:58:20.7293553Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/mimalloc'... 2025-09-07T07:58:24.4862944Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/nlohmann'... 2025-09-07T07:58:47.7717551Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/onnx'... 2025-09-07T07:58:57.4467628Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/opentelemetry-cpp'... 2025-09-07T07:59:08.1602737Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/pocketfft'... 2025-09-07T07:59:10.7676702Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/protobuf'... 2025-09-07T07:59:31.3455827Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/psimd'... 2025-09-07T07:59:33.6415200Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/pthreadpool'... 2025-09-07T07:59:35.9330291Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/pybind11'... 2025-09-07T07:59:40.2317749Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/python-peachpy'... 2025-09-07T07:59:42.5416235Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/sleef'... 2025-09-07T07:59:45.9217733Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/tensorpipe'... 
2025-09-07T07:59:48.8614921Z Submodule path 'android/libs/fbjni': checked out '7e1e1fe3858c63c251c637ae41a20de425dde96f' 2025-09-07T07:59:48.9412521Z Submodule path 'third_party/FP16': checked out '4dfe081cf6bcd15db339cf2680b9281b8451eeb3' 2025-09-07T07:59:49.0301762Z Submodule path 'third_party/FXdiv': checked out 'b408327ac2a15ec3e43352421954f5b1967701d1' 2025-09-07T07:59:49.0833362Z Submodule path 'third_party/NNPACK': checked out 'c07e3a0400713d546e0dea2d5466dd22ea389c73' 2025-09-07T07:59:49.2140830Z Submodule path 'third_party/NVTX': checked out '2942f167cc30c5e3a44a2aecd5b0d9c07ff61a07' 2025-09-07T07:59:49.3458554Z Submodule path 'third_party/VulkanMemoryAllocator': checked out '1d8f600fd424278486eade7ed3e877c99f0846b1' 2025-09-07T07:59:50.1688360Z Submodule path 'third_party/XNNPACK': checked out '51a0103656eff6fc9bfd39a4597923c4b542c883' 2025-09-07T07:59:50.3805275Z Submodule path 'third_party/aiter': checked out '01aae101b9e5e94d6c16a9514c9fb8df99c93150' 2025-09-07T07:59:50.4695135Z Submodule '3rdparty/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T07:59:50.4734152Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/aiter/3rdparty/composable_kernel'... 
2025-09-07T08:00:00.0151280Z Submodule path 'third_party/aiter/3rdparty/composable_kernel': checked out 'cffe8fa2a442ac8e80dd236a1a5d24fe3d7e0cbf' 2025-09-07T08:00:00.1050493Z Submodule path 'third_party/benchmark': checked out '299e5928955cc62af9968370293b916f5130916f' 2025-09-07T08:00:00.5144739Z Submodule path 'third_party/composable_kernel': checked out '7fe50dc3da2069d6645d9deb8c017a876472a977' 2025-09-07T08:00:00.6452496Z Submodule path 'third_party/cpp-httplib': checked out '89c932f313c6437c38f2982869beacc89c2f2246' 2025-09-07T08:00:00.8251079Z Submodule path 'third_party/cpuinfo': checked out '5e3d2445e6a84d9599bee2bf78edbb4d80865e1d' 2025-09-07T08:00:00.9148621Z Submodule path 'third_party/cudnn_frontend': checked out 'f937055efc6d414d11f4c6577e3977fe74f35fb6' 2025-09-07T08:00:01.6197521Z Submodule path 'third_party/cutlass': checked out 'e51efbfe18fe4f4cbb66ab814c55bf4aa0185491' 2025-09-07T08:00:02.2310600Z Submodule path 'third_party/fbgemm': checked out '4b39c551efe15e6bbade20565b0ceb2d8ce3352d' 2025-09-07T08:00:02.3229728Z Submodule 'external/asmjit' (https://github.com/asmjit/asmjit.git) registered for path 'third_party/fbgemm/external/asmjit' 2025-09-07T08:00:02.4532211Z Submodule 'external/composable_kernel' (https://github.com/jwfromm/composable_kernel.git) registered for path 'third_party/fbgemm/external/composable_kernel' 2025-09-07T08:00:02.5388597Z Submodule 'external/cpuinfo' (https://github.com/pytorch/cpuinfo) registered for path 'third_party/fbgemm/external/cpuinfo' 2025-09-07T08:00:02.6749600Z Submodule 'external/cutlass' (https://github.com/jwfromm/cutlass) registered for path 'third_party/fbgemm/external/cutlass' 2025-09-07T08:00:02.8053882Z Submodule 'external/googletest' (https://github.com/google/googletest) registered for path 'third_party/fbgemm/external/googletest' 2025-09-07T08:00:02.8928011Z Submodule 'external/hipify_torch' (https://github.com/ROCmSoftwarePlatform/hipify_torch.git) registered for path 
'third_party/fbgemm/external/hipify_torch' 2025-09-07T08:00:03.0267584Z Submodule 'external/json' (https://github.com/nlohmann/json.git) registered for path 'third_party/fbgemm/external/json' 2025-09-07T08:00:03.0309569Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/fbgemm/external/asmjit'... 2025-09-07T08:00:07.0890840Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/fbgemm/external/composable_kernel'... 2025-09-07T08:00:11.8595868Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/fbgemm/external/cpuinfo'... 2025-09-07T08:00:15.0727004Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/fbgemm/external/cutlass'... 2025-09-07T08:00:22.9971828Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/fbgemm/external/googletest'... 2025-09-07T08:00:27.3734214Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/fbgemm/external/hipify_torch'... 2025-09-07T08:00:29.5186483Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/fbgemm/external/json'... 
2025-09-07T08:00:53.3545424Z Submodule path 'third_party/fbgemm/external/asmjit': checked out 'a3199e8857792cd10b7589ff5d58343d2c9008ea' 2025-09-07T08:00:53.6985905Z Submodule path 'third_party/fbgemm/external/composable_kernel': checked out 'b1281b8b08d973a7064f864f47eeb30f3e2596e9' 2025-09-07T08:00:53.8704184Z Submodule path 'third_party/fbgemm/external/cpuinfo': checked out '6543fec09b2f04ac4a666882998b534afc9c1349' 2025-09-07T08:00:54.5865199Z Submodule path 'third_party/fbgemm/external/cutlass': checked out '311f3c8e51dc0eb56310cfc6980bf63d0fbd7917' 2025-09-07T08:00:54.6453626Z Submodule path 'third_party/fbgemm/external/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-09-07T08:00:54.7319872Z Submodule path 'third_party/fbgemm/external/hipify_torch': checked out '63b6a7b541fa7f08f8475ca7d74054db36ff2691' 2025-09-07T08:00:54.9115660Z Submodule path 'third_party/fbgemm/external/json': checked out '9cca280a4d0ccf0c08f47a99aa71d1b0e52f8d03' 2025-09-07T08:00:55.0061239Z Submodule path 'third_party/flash-attention': checked out '979702c87a8713a8e0a5e9fee122b90d2ef13be5' 2025-09-07T08:00:55.1000544Z Submodule 'csrc/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T08:00:55.2468495Z Submodule 'csrc/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'third_party/flash-attention/csrc/cutlass' 2025-09-07T08:00:55.2503335Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/flash-attention/csrc/composable_kernel'... 2025-09-07T08:01:04.9271878Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/flash-attention/csrc/cutlass'... 
2025-09-07T08:01:13.8593009Z Submodule path 'third_party/flash-attention/csrc/composable_kernel': checked out '888317e698e9803c62bd38568abc9e05d7709f33' 2025-09-07T08:01:14.4788779Z Submodule path 'third_party/flash-attention/csrc/cutlass': checked out 'c506e16788cb08416a4a57e11a9067beeee29420' 2025-09-07T08:01:14.6958719Z Submodule path 'third_party/flatbuffers': checked out 'a2cd1ea3b6d3fee220106b5fed3f7ce8da9eb757' 2025-09-07T08:01:14.7797339Z Submodule path 'third_party/fmt': checked out '40626af88bd7df9a5fb80be7b25ac85b122d6c21' 2025-09-07T08:01:14.8393284Z Submodule path 'third_party/gemmlowp/gemmlowp': checked out '3fb5c176c17c765a3492cd2f0321b0dab712f350' 2025-09-07T08:01:14.9225691Z Submodule path 'third_party/gloo': checked out 'c7b7b022c124d9643957d9bd55f57ac59fce8fa2' 2025-09-07T08:01:15.0032780Z Submodule path 'third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-09-07T08:01:15.0903301Z Submodule path 'third_party/ideep': checked out '719d8e6cd7f7a0e01b155657526d693acf97c2b3' 2025-09-07T08:01:15.1840250Z Submodule 'mkl-dnn' (https://github.com/intel/mkl-dnn.git) registered for path 'third_party/ideep/mkl-dnn' 2025-09-07T08:01:15.1870023Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/ideep/mkl-dnn'... 
2025-09-07T08:01:43.1343107Z Submodule path 'third_party/ideep/mkl-dnn': checked out '8d263e693366ef8db40acc569cc7d8edf644556d' 2025-09-07T08:01:43.2171634Z Submodule path 'third_party/ittapi': checked out 'dec1d23ca65ab069d225dfe40dea14f455170959' 2025-09-07T08:01:43.3495855Z Submodule path 'third_party/kineto': checked out '5e7501833f1021ce6f618572d3baf657b6319658' 2025-09-07T08:01:43.5358862Z Submodule 'libkineto/third_party/dynolog' (https://github.com/facebookincubator/dynolog.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T08:01:43.6279665Z Submodule 'libkineto/third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T08:01:43.7921937Z Submodule 'libkineto/third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T08:01:43.7962317Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog'... 2025-09-07T08:01:46.9496599Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/fmt'... 2025-09-07T08:01:51.3872770Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/googletest'... 
2025-09-07T08:01:55.5367698Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog': checked out '7d04a0053a845370ae06ce317a22a48e9edcc74e' 2025-09-07T08:01:55.6222752Z Submodule 'third_party/DCGM' (https://github.com/NVIDIA/DCGM.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T08:01:55.7266660Z Submodule 'third_party/cpr' (https://github.com/libcpr/cpr.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T08:01:55.8115704Z Submodule 'third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T08:01:55.8985227Z Submodule 'third_party/gflags' (https://github.com/gflags/gflags.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T08:01:55.9903170Z Submodule 'third_party/glog' (https://github.com/google/glog.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T08:01:56.1244576Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T08:01:56.2569173Z Submodule 'third_party/json' (https://github.com/nlohmann/json.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T08:01:56.3405158Z Submodule 'third_party/pfs' (https://github.com/dtrugman/pfs.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T08:01:56.3448320Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'... 2025-09-07T08:02:00.5082787Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'... 
2025-09-07T08:02:03.4267412Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'... 2025-09-07T08:02:08.6604398Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'... 2025-09-07T08:02:10.3846682Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/glog'... 2025-09-07T08:02:13.7690542Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'... 2025-09-07T08:02:18.5977756Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/json'... 2025-09-07T08:02:44.7537511Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'... 2025-09-07T08:02:48.0883083Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM': checked out 'ffde4e54bc7249a6039a5e6b45b395141e1217f9' 2025-09-07T08:02:48.1717543Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr': checked out '871ed52d350214a034f6ef8a3b8f51c5ce1bd400' 2025-09-07T08:02:48.2626964Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt': checked out 'cd4af11efc9c622896a3e4cb599fa28668ca3d05' 2025-09-07T08:02:48.3515250Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags': checked out 'e171aa2d15ed9eb17054558e0b3a6a413bb01067' 2025-09-07T08:02:48.5205308Z Submodule 'doc' (https://github.com/gflags/gflags.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T08:02:48.5244502Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'... 
2025-09-07T08:02:51.4327148Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc': checked out '8411df715cf522606e3b1aca386ddfc0b63d34b4' 2025-09-07T08:02:51.5210661Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog': checked out 'b33e3bad4c46c8a6345525fd822af355e5ef9446' 2025-09-07T08:02:51.6129721Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest': checked out '58d77fa8070e8cec2dc1ed015d66b454c8d78850' 2025-09-07T08:02:51.7676881Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/json': checked out '4f8fba14066156b73f1189a2b8bd568bde5284c5' 2025-09-07T08:02:51.8521927Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs': checked out 'f68a2fa8ea36c783bdd760371411fcb495aa3150' 2025-09-07T08:02:51.9389755Z Submodule path 'third_party/kineto/libkineto/third_party/fmt': checked out '0041a40c1350ba702d475b9c4ad62da77caea164' 2025-09-07T08:02:52.0645550Z Submodule path 'third_party/kineto/libkineto/third_party/googletest': checked out '7aca84427f224eeed3144123d5230d5871e93347' 2025-09-07T08:02:52.1943450Z Submodule path 'third_party/kleidiai': checked out 'cca02c2f69dd18e1f12647c1c0bdc8cf90e680c7' 2025-09-07T08:02:52.2832450Z Submodule path 'third_party/mimalloc': checked out 'fbd8b99c2b828428947d70fdc046bb55609be93e' 2025-09-07T08:02:52.4630735Z Submodule path 'third_party/nlohmann': checked out '55f93686c01528224f448c19128836e7df245f72' 2025-09-07T08:02:53.4952082Z Submodule path 'third_party/onnx': checked out 'e709452ef2bbc1d113faf678c24e6d3467696e83' 2025-09-07T08:02:53.6444359Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/onnx/third_party/pybind11' 2025-09-07T08:02:53.6486502Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/onnx/third_party/pybind11'... 
2025-09-07T08:02:58.6022713Z Submodule path 'third_party/onnx/third_party/pybind11': checked out 'a2e59f0e7065404b44dfe92a28aca47ba1378dc4' 2025-09-07T08:02:58.7097734Z Submodule path 'third_party/opentelemetry-cpp': checked out 'a799f4aed9c94b765dcdaabaeab7d5e7e2310878' 2025-09-07T08:02:58.8759734Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark) registered for path 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T08:02:59.0869290Z Submodule 'third_party/googletest' (https://github.com/google/googletest) registered for path 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T08:02:59.1829691Z Submodule 'third_party/ms-gsl' (https://github.com/microsoft/GSL) registered for path 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T08:02:59.3761755Z Submodule 'third_party/nlohmann-json' (https://github.com/nlohmann/json) registered for path 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T08:02:59.4725266Z Submodule 'third_party/opentelemetry-proto' (https://github.com/open-telemetry/opentelemetry-proto) registered for path 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T08:02:59.5707062Z Submodule 'third_party/opentracing-cpp' (https://github.com/opentracing/opentracing-cpp.git) registered for path 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T08:02:59.7598178Z Submodule 'third_party/prometheus-cpp' (https://github.com/jupp0r/prometheus-cpp) registered for path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T08:02:59.8734308Z Submodule 'tools/vcpkg' (https://github.com/Microsoft/vcpkg) registered for path 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T08:02:59.8780821Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/benchmark'... 
2025-09-07T08:03:02.6639654Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/googletest'... 2025-09-07T08:03:06.4586975Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/ms-gsl'... 2025-09-07T08:03:08.7882038Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/nlohmann-json'... 2025-09-07T08:03:33.8614531Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/opentelemetry-proto'... 2025-09-07T08:03:36.2689316Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/opentracing-cpp'... 2025-09-07T08:03:38.6729175Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp'... 2025-09-07T08:03:41.1395859Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/opentelemetry-cpp/tools/vcpkg'... 2025-09-07T08:03:57.6235022Z Submodule path 'third_party/opentelemetry-cpp/third_party/benchmark': checked out 'd572f4777349d43653b21d6c2fc63020ab326db2' 2025-09-07T08:03:57.7041516Z Submodule path 'third_party/opentelemetry-cpp/third_party/googletest': checked out 'b796f7d44681514f58a683a3a71ff17c94edb0c1' 2025-09-07T08:03:57.7616886Z Submodule path 'third_party/opentelemetry-cpp/third_party/ms-gsl': checked out '6f4529395c5b7c2d661812257cd6780c67e54afa' 2025-09-07T08:03:57.9168309Z Submodule path 'third_party/opentelemetry-cpp/third_party/nlohmann-json': checked out 'bc889afb4c5bf1c0d8ee29ef35eaaf4c8bef8a5d' 2025-09-07T08:03:57.9994626Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto': checked out '4ca4f0335c63cda7ab31ea7ed70d6553aee14dce' 2025-09-07T08:03:58.0884560Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentracing-cpp': checked out '06b57f48ded1fa3bdd3d4346f6ef29e40e08eaf5' 2025-09-07T08:03:58.1733406Z Submodule path 
'third_party/opentelemetry-cpp/third_party/prometheus-cpp': checked out 'c9ffcdda9086ffd9e1283ea7a0276d831f3c8a8d' 2025-09-07T08:03:58.2761398Z Submodule 'civetweb' (https://github.com/civetweb/civetweb.git) registered for path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T08:03:58.3554310Z Submodule 'googletest' (https://github.com/google/googletest.git) registered for path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T08:03:58.3598898Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'... 2025-09-07T08:04:04.4295422Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'... 2025-09-07T08:04:08.8535979Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb': checked out 'eefb26f82b233268fc98577d265352720d477ba4' 2025-09-07T08:04:08.9389763Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929' 2025-09-07T08:04:09.9319597Z Submodule path 'third_party/opentelemetry-cpp/tools/vcpkg': checked out '8eb57355a4ffb410a2e94c07b4dca2dffbee8e50' 2025-09-07T08:04:09.9955286Z Submodule path 'third_party/pocketfft': checked out '0fa0ef591e38c2758e3184c6c23e497b9f732ffa' 2025-09-07T08:04:10.3239035Z Submodule path 'third_party/protobuf': checked out 'd1eca4e4b421cd2997495c4b4e65cea6be4e9b8a' 2025-09-07T08:04:10.4505776Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/protobuf/third_party/benchmark' 2025-09-07T08:04:10.9620587Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/protobuf/third_party/googletest' 2025-09-07T08:04:10.9660726Z Cloning into 
'/home/charlie/_work/pytorch/pytorch/third_party/protobuf/third_party/benchmark'... 2025-09-07T08:04:13.6674625Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/protobuf/third_party/googletest'... 2025-09-07T08:04:17.5792884Z Submodule path 'third_party/protobuf/third_party/benchmark': checked out '5b7683f49e1e9223cf9927b24f6fd3d6bd82e3f8' 2025-09-07T08:04:17.7144344Z Submodule path 'third_party/protobuf/third_party/googletest': checked out '5ec7f0c4a113e2f18ac2c6cc7df51ad6afc24081' 2025-09-07T08:04:17.8003080Z Submodule path 'third_party/psimd': checked out '072586a71b55b7f8c584153d223e95687148a900' 2025-09-07T08:04:17.8840534Z Submodule path 'third_party/pthreadpool': checked out '4fe0e1e183925bf8cfa6aae24237e724a96479b8' 2025-09-07T08:04:17.9700759Z Submodule path 'third_party/pybind11': checked out 'f5fbe867d2d26e4a0a9177a51f6e568868ad3dc8' 2025-09-07T08:04:18.0574479Z Submodule path 'third_party/python-peachpy': checked out 'f45429b087dd7d5bc78bb40dc7cf06425c252d67' 2025-09-07T08:04:18.1482782Z Submodule path 'third_party/sleef': checked out '5a1d179df9cf652951b59010a2d2075372d67f68' 2025-09-07T08:04:18.2276288Z Submodule path 'third_party/tensorpipe': checked out 'af0118d13e52f5a08841464a768e01a0bf3e3075' 2025-09-07T08:04:18.3432114Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/tensorpipe/third_party/googletest' 2025-09-07T08:04:18.4825134Z Submodule 'third_party/libnop' (https://github.com/google/libnop.git) registered for path 'third_party/tensorpipe/third_party/libnop' 2025-09-07T08:04:18.5704808Z Submodule 'third_party/libuv' (https://github.com/libuv/libuv.git) registered for path 'third_party/tensorpipe/third_party/libuv' 2025-09-07T08:04:18.6571340Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T08:04:18.6610248Z Cloning into 
'/home/charlie/_work/pytorch/pytorch/third_party/tensorpipe/third_party/googletest'... 2025-09-07T08:04:22.8818398Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/tensorpipe/third_party/libnop'... 2025-09-07T08:04:25.1016007Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/tensorpipe/third_party/libuv'... 2025-09-07T08:04:29.7872026Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/tensorpipe/third_party/pybind11'... 2025-09-07T08:04:34.0326458Z Submodule path 'third_party/tensorpipe/third_party/googletest': checked out 'aee0f9d9b5b87796ee8a0ab26b7587ec30e8858e' 2025-09-07T08:04:34.1160011Z Submodule path 'third_party/tensorpipe/third_party/libnop': checked out '910b55815be16109f04f4180e9adee14fb4ce281' 2025-09-07T08:04:34.2451892Z Submodule path 'third_party/tensorpipe/third_party/libuv': checked out '5152db2cbfeb5582e9c27c5ea1dba2cd9e10759b' 2025-09-07T08:04:34.3332768Z Submodule path 'third_party/tensorpipe/third_party/pybind11': checked out 'a23996fce38ff6ccfbcdc09f1e63f2c4be5ea2ef' 2025-09-07T08:04:34.4128596Z Submodule 'tools/clang' (https://github.com/wjakob/clang-cindex-python3) registered for path 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T08:04:34.4162154Z Cloning into '/home/charlie/_work/pytorch/pytorch/third_party/tensorpipe/third_party/pybind11/tools/clang'... 
2025-09-07T08:04:36.9193655Z Submodule path 'third_party/tensorpipe/third_party/pybind11/tools/clang': checked out '6a00cbc4a9b8e68b71caf7f774b3f9c753ae84d5' 2025-09-07T08:04:36.9240546Z [command]/usr/bin/git submodule foreach --recursive git config --local gc.auto 0 2025-09-07T08:04:36.9521514Z Entering 'android/libs/fbjni' 2025-09-07T08:04:36.9615942Z Entering 'third_party/FP16' 2025-09-07T08:04:37.0017879Z Entering 'third_party/FXdiv' 2025-09-07T08:04:37.0459008Z Entering 'third_party/NNPACK' 2025-09-07T08:04:37.0864050Z Entering 'third_party/NVTX' 2025-09-07T08:04:37.1348493Z Entering 'third_party/VulkanMemoryAllocator' 2025-09-07T08:04:37.1740602Z Entering 'third_party/XNNPACK' 2025-09-07T08:04:37.2201675Z Entering 'third_party/aiter' 2025-09-07T08:04:37.2572712Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T08:04:37.3049665Z Entering 'third_party/benchmark' 2025-09-07T08:04:37.3466263Z Entering 'third_party/composable_kernel' 2025-09-07T08:04:37.3913378Z Entering 'third_party/cpp-httplib' 2025-09-07T08:04:37.4382126Z Entering 'third_party/cpuinfo' 2025-09-07T08:04:37.4780161Z Entering 'third_party/cudnn_frontend' 2025-09-07T08:04:37.5262322Z Entering 'third_party/cutlass' 2025-09-07T08:04:37.5654103Z Entering 'third_party/fbgemm' 2025-09-07T08:04:37.6098507Z Entering 'third_party/fbgemm/external/asmjit' 2025-09-07T08:04:37.6514504Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-09-07T08:04:37.6976767Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T08:04:37.7379420Z Entering 'third_party/fbgemm/external/cutlass' 2025-09-07T08:04:37.7794822Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T08:04:37.8188388Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T08:04:37.8378368Z Entering 'third_party/fbgemm/external/json' 2025-09-07T08:04:37.8799268Z Entering 'third_party/flash-attention' 2025-09-07T08:04:37.9181987Z Entering 'third_party/flash-attention/csrc/composable_kernel' 
2025-09-07T08:04:37.9655463Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T08:04:38.0034101Z Entering 'third_party/flatbuffers' 2025-09-07T08:04:38.0496811Z Entering 'third_party/fmt' 2025-09-07T08:04:38.0902847Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T08:04:38.1321166Z Entering 'third_party/gloo' 2025-09-07T08:04:38.1796411Z Entering 'third_party/googletest' 2025-09-07T08:04:38.2177291Z Entering 'third_party/ideep' 2025-09-07T08:04:38.2637807Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T08:04:38.3178829Z Entering 'third_party/ittapi' 2025-09-07T08:04:38.3656888Z Entering 'third_party/kineto' 2025-09-07T08:04:38.4054211Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T08:04:38.4497074Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T08:04:38.4897604Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T08:04:38.5365759Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T08:04:38.5807633Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T08:04:38.5994087Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T08:04:38.6387103Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T08:04:38.6860979Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T08:04:38.7201520Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T08:04:38.7663260Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T08:04:38.8064086Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T08:04:38.8503258Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T08:04:38.8925070Z Entering 'third_party/kleidiai' 2025-09-07T08:04:38.9381833Z Entering 'third_party/mimalloc' 
2025-09-07T08:04:38.9790938Z Entering 'third_party/nlohmann' 2025-09-07T08:04:39.0206529Z Entering 'third_party/onnx' 2025-09-07T08:04:39.0370514Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T08:04:39.1007403Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T08:04:39.1117777Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T08:04:39.1602310Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T08:04:39.1921604Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T08:04:39.2387774Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T08:04:39.2770410Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T08:04:39.3202968Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T08:04:39.3564010Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T08:04:39.4011953Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T08:04:39.4383566Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T08:04:39.4842105Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T08:04:39.5302403Z Entering 'third_party/pocketfft' 2025-09-07T08:04:39.5636374Z Entering 'third_party/protobuf' 2025-09-07T08:04:39.6123443Z Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T08:04:39.6502248Z Entering 'third_party/protobuf/third_party/googletest' 2025-09-07T08:04:39.6965752Z Entering 'third_party/psimd' 2025-09-07T08:04:39.7366572Z Entering 'third_party/pthreadpool' 2025-09-07T08:04:39.7723229Z Entering 'third_party/pybind11' 2025-09-07T08:04:39.8191449Z Entering 'third_party/python-peachpy' 2025-09-07T08:04:39.8555071Z Entering 'third_party/sleef' 2025-09-07T08:04:39.9042062Z Entering 'third_party/tensorpipe' 2025-09-07T08:04:39.9463590Z Entering 'third_party/tensorpipe/third_party/googletest' 
2025-09-07T08:04:39.9890842Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-09-07T08:04:40.0154499Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-09-07T08:04:40.0622516Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T08:04:40.0966377Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T08:04:40.1465443Z ##[endgroup] 2025-09-07T08:04:40.1465795Z ##[group]Persisting credentials for submodules 2025-09-07T08:04:40.1478917Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'url\.https\:\/\/github\.com\/\.insteadOf' && git config --local --unset-all 'url.https://github.com/.insteadOf' || :" 2025-09-07T08:04:40.1754287Z Entering 'android/libs/fbjni' 2025-09-07T08:04:40.1805260Z Entering 'third_party/FP16' 2025-09-07T08:04:40.1854602Z Entering 'third_party/FXdiv' 2025-09-07T08:04:40.1903893Z Entering 'third_party/NNPACK' 2025-09-07T08:04:40.1955555Z Entering 'third_party/NVTX' 2025-09-07T08:04:40.2008767Z Entering 'third_party/VulkanMemoryAllocator' 2025-09-07T08:04:40.2059784Z Entering 'third_party/XNNPACK' 2025-09-07T08:04:40.2123668Z Entering 'third_party/aiter' 2025-09-07T08:04:40.2175656Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T08:04:40.2234248Z Entering 'third_party/benchmark' 2025-09-07T08:04:40.2285410Z Entering 'third_party/composable_kernel' 2025-09-07T08:04:40.2345326Z Entering 'third_party/cpp-httplib' 2025-09-07T08:04:40.2396391Z Entering 'third_party/cpuinfo' 2025-09-07T08:04:40.2447490Z Entering 'third_party/cudnn_frontend' 2025-09-07T08:04:40.2498100Z Entering 'third_party/cutlass' 2025-09-07T08:04:40.2556811Z Entering 'third_party/fbgemm' 2025-09-07T08:04:40.2608874Z Entering 'third_party/fbgemm/external/asmjit' 2025-09-07T08:04:40.2657578Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-09-07T08:04:40.2713601Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T08:04:40.2762526Z Entering 
'third_party/fbgemm/external/cutlass' 2025-09-07T08:04:40.2820164Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T08:04:40.2868125Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T08:04:40.2915651Z Entering 'third_party/fbgemm/external/json' 2025-09-07T08:04:40.2968090Z Entering 'third_party/flash-attention' 2025-09-07T08:04:40.3019848Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T08:04:40.3073551Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T08:04:40.3132655Z Entering 'third_party/flatbuffers' 2025-09-07T08:04:40.3184870Z Entering 'third_party/fmt' 2025-09-07T08:04:40.3234991Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T08:04:40.3285710Z Entering 'third_party/gloo' 2025-09-07T08:04:40.3335934Z Entering 'third_party/googletest' 2025-09-07T08:04:40.3385917Z Entering 'third_party/ideep' 2025-09-07T08:04:40.3436261Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T08:04:40.3492519Z Entering 'third_party/ittapi' 2025-09-07T08:04:40.3544055Z Entering 'third_party/kineto' 2025-09-07T08:04:40.3592529Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T08:04:40.3641011Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T08:04:40.3689965Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T08:04:40.3736434Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T08:04:40.3783208Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T08:04:40.3828896Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T08:04:40.3881602Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T08:04:40.3930765Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T08:04:40.3979326Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T08:04:40.4028362Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T08:04:40.4079232Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T08:04:40.4127844Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T08:04:40.4179333Z Entering 'third_party/kleidiai' 2025-09-07T08:04:40.4233119Z Entering 'third_party/mimalloc' 2025-09-07T08:04:40.4285067Z Entering 'third_party/nlohmann' 2025-09-07T08:04:40.4336112Z Entering 'third_party/onnx' 2025-09-07T08:04:40.4400469Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T08:04:40.4454411Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T08:04:40.4505055Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T08:04:40.4552947Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T08:04:40.4598899Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T08:04:40.4645349Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T08:04:40.4695077Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T08:04:40.4741862Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T08:04:40.4789867Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T08:04:40.4836834Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T08:04:40.4887391Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T08:04:40.4938644Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T08:04:40.5006147Z Entering 'third_party/pocketfft' 2025-09-07T08:04:40.5055611Z Entering 'third_party/protobuf' 2025-09-07T08:04:40.5108268Z Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T08:04:40.5155752Z Entering 
'third_party/protobuf/third_party/googletest' 2025-09-07T08:04:40.5206755Z Entering 'third_party/psimd' 2025-09-07T08:04:40.5256951Z Entering 'third_party/pthreadpool' 2025-09-07T08:04:40.5307243Z Entering 'third_party/pybind11' 2025-09-07T08:04:40.5357306Z Entering 'third_party/python-peachpy' 2025-09-07T08:04:40.5406989Z Entering 'third_party/sleef' 2025-09-07T08:04:40.5457329Z Entering 'third_party/tensorpipe' 2025-09-07T08:04:40.5506869Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-09-07T08:04:40.5554998Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-09-07T08:04:40.5602456Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-09-07T08:04:40.5650291Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T08:04:40.5696731Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T08:04:40.5769221Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local 'http.https://github.com/.extraheader' 'AUTHORIZATION: basic ***' && git config --local --show-origin --name-only --get-regexp remote.origin.url" 2025-09-07T08:04:40.6040124Z Entering 'android/libs/fbjni' 2025-09-07T08:04:40.6134598Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config remote.origin.url 2025-09-07T08:04:40.6158085Z Entering 'third_party/FP16' 2025-09-07T08:04:40.6562737Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config remote.origin.url 2025-09-07T08:04:40.6584214Z Entering 'third_party/FXdiv' 2025-09-07T08:04:40.7055045Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config remote.origin.url 2025-09-07T08:04:40.7077356Z Entering 'third_party/NNPACK' 2025-09-07T08:04:40.7424140Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config remote.origin.url 2025-09-07T08:04:40.7446790Z Entering 'third_party/NVTX' 2025-09-07T08:04:40.7900230Z 
file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config remote.origin.url 2025-09-07T08:04:40.7923040Z Entering 'third_party/VulkanMemoryAllocator' 2025-09-07T08:04:40.8239782Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config remote.origin.url 2025-09-07T08:04:40.8261784Z Entering 'third_party/XNNPACK' 2025-09-07T08:04:40.8714342Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config remote.origin.url 2025-09-07T08:04:40.8749889Z Entering 'third_party/aiter' 2025-09-07T08:04:40.9134660Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/aiter/config remote.origin.url 2025-09-07T08:04:40.9157494Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T08:04:40.9597254Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config remote.origin.url 2025-09-07T08:04:40.9627424Z Entering 'third_party/benchmark' 2025-09-07T08:04:40.9974489Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config remote.origin.url 2025-09-07T08:04:40.9997194Z Entering 'third_party/composable_kernel' 2025-09-07T08:04:41.0445744Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config remote.origin.url 2025-09-07T08:04:41.0474919Z Entering 'third_party/cpp-httplib' 2025-09-07T08:04:41.0835719Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config remote.origin.url 2025-09-07T08:04:41.0858126Z Entering 'third_party/cpuinfo' 2025-09-07T08:04:41.1297217Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config remote.origin.url 2025-09-07T08:04:41.1320063Z Entering 'third_party/cudnn_frontend' 2025-09-07T08:04:41.1657228Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config remote.origin.url 2025-09-07T08:04:41.1680501Z Entering 'third_party/cutlass' 
2025-09-07T08:04:41.2063206Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config remote.origin.url 2025-09-07T08:04:41.2093179Z Entering 'third_party/fbgemm' 2025-09-07T08:04:41.2491654Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config remote.origin.url 2025-09-07T08:04:41.2515315Z Entering 'third_party/fbgemm/external/asmjit' 2025-09-07T08:04:41.2951263Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config remote.origin.url 2025-09-07T08:04:41.2973443Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-09-07T08:04:41.3317245Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config remote.origin.url 2025-09-07T08:04:41.3344186Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T08:04:41.3769796Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config remote.origin.url 2025-09-07T08:04:41.3790452Z Entering 'third_party/fbgemm/external/cutlass' 2025-09-07T08:04:41.4178144Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config remote.origin.url 2025-09-07T08:04:41.4207017Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T08:04:41.4619782Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config remote.origin.url 2025-09-07T08:04:41.4639618Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T08:04:41.5020266Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config remote.origin.url 2025-09-07T08:04:41.5040345Z Entering 'third_party/fbgemm/external/json' 2025-09-07T08:04:41.5498207Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config remote.origin.url 2025-09-07T08:04:41.5523639Z Entering 
'third_party/flash-attention' 2025-09-07T08:04:41.5895683Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config remote.origin.url 2025-09-07T08:04:41.5918910Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T08:04:41.6258555Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config remote.origin.url 2025-09-07T08:04:41.6286618Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T08:04:41.6520987Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config remote.origin.url 2025-09-07T08:04:41.6551520Z Entering 'third_party/flatbuffers' 2025-09-07T08:04:41.6933316Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config remote.origin.url 2025-09-07T08:04:41.6958702Z Entering 'third_party/fmt' 2025-09-07T08:04:41.7269773Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/fmt/config remote.origin.url 2025-09-07T08:04:41.7290975Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T08:04:41.7732781Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config remote.origin.url 2025-09-07T08:04:41.7755658Z Entering 'third_party/gloo' 2025-09-07T08:04:41.8124951Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/gloo/config remote.origin.url 2025-09-07T08:04:41.8147012Z Entering 'third_party/googletest' 2025-09-07T08:04:41.8608467Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/googletest/config remote.origin.url 2025-09-07T08:04:41.8632239Z Entering 'third_party/ideep' 2025-09-07T08:04:41.9033498Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/ideep/config remote.origin.url 2025-09-07T08:04:41.9054355Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T08:04:41.9513296Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config 
remote.origin.url 2025-09-07T08:04:41.9542836Z Entering 'third_party/ittapi' 2025-09-07T08:04:41.9901728Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config remote.origin.url 2025-09-07T08:04:41.9923558Z Entering 'third_party/kineto' 2025-09-07T08:04:42.0389925Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/kineto/config remote.origin.url 2025-09-07T08:04:42.0410612Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T08:04:42.0779341Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config remote.origin.url 2025-09-07T08:04:42.0800039Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T08:04:42.1285010Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config remote.origin.url 2025-09-07T08:04:42.1308596Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T08:04:42.1591429Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config remote.origin.url 2025-09-07T08:04:42.1614825Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T08:04:42.2063531Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config remote.origin.url 2025-09-07T08:04:42.2085115Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T08:04:42.2475290Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config remote.origin.url 2025-09-07T08:04:42.2496782Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T08:04:42.2897447Z 
file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config remote.origin.url 2025-09-07T08:04:42.2921824Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T08:04:42.3343161Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config remote.origin.url 2025-09-07T08:04:42.3366700Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T08:04:42.3783665Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config remote.origin.url 2025-09-07T08:04:42.3805773Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T08:04:42.4123935Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config remote.origin.url 2025-09-07T08:04:42.4146034Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T08:04:42.4331618Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config remote.origin.url 2025-09-07T08:04:42.4355842Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T08:04:42.4534510Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config remote.origin.url 2025-09-07T08:04:42.4555676Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T08:04:42.4886757Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config remote.origin.url 2025-09-07T08:04:42.4910423Z Entering 'third_party/kleidiai' 2025-09-07T08:04:42.5364218Z 
file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config remote.origin.url 2025-09-07T08:04:42.5386635Z Entering 'third_party/mimalloc' 2025-09-07T08:04:42.5728163Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config remote.origin.url 2025-09-07T08:04:42.5750729Z Entering 'third_party/nlohmann' 2025-09-07T08:04:42.6191740Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config remote.origin.url 2025-09-07T08:04:42.6216800Z Entering 'third_party/onnx' 2025-09-07T08:04:42.6594955Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/onnx/config remote.origin.url 2025-09-07T08:04:42.6631650Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T08:04:42.7055463Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url 2025-09-07T08:04:42.7081007Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T08:04:42.7494933Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config remote.origin.url 2025-09-07T08:04:42.7518498Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T08:04:42.7935673Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config remote.origin.url 2025-09-07T08:04:42.7956212Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T08:04:42.8061816Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config remote.origin.url 2025-09-07T08:04:42.8084109Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T08:04:42.8533591Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config remote.origin.url 2025-09-07T08:04:42.8555502Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 
2025-09-07T08:04:42.8856733Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config remote.origin.url 2025-09-07T08:04:42.8879046Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T08:04:42.9339541Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config remote.origin.url 2025-09-07T08:04:42.9360004Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T08:04:42.9674205Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config remote.origin.url 2025-09-07T08:04:42.9695102Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T08:04:43.0122752Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config remote.origin.url 2025-09-07T08:04:43.0143081Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T08:04:43.0501470Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-09-07T08:04:43.0524740Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T08:04:43.0943051Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-09-07T08:04:43.0967644Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T08:04:43.1326599Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config remote.origin.url 2025-09-07T08:04:43.1367579Z Entering 'third_party/pocketfft' 2025-09-07T08:04:43.1776423Z 
file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config remote.origin.url 2025-09-07T08:04:43.1798760Z Entering 'third_party/protobuf' 2025-09-07T08:04:43.2183226Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config remote.origin.url 2025-09-07T08:04:43.2208745Z Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T08:04:43.2662339Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config remote.origin.url 2025-09-07T08:04:43.2684532Z Entering 'third_party/protobuf/third_party/googletest' 2025-09-07T08:04:43.2947949Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config remote.origin.url 2025-09-07T08:04:43.2972869Z Entering 'third_party/psimd' 2025-09-07T08:04:43.3436444Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config remote.origin.url 2025-09-07T08:04:43.3458463Z Entering 'third_party/pthreadpool' 2025-09-07T08:04:43.3794454Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config remote.origin.url 2025-09-07T08:04:43.3816348Z Entering 'third_party/pybind11' 2025-09-07T08:04:43.4272399Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config remote.origin.url 2025-09-07T08:04:43.4295533Z Entering 'third_party/python-peachpy' 2025-09-07T08:04:43.4941882Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config remote.origin.url 2025-09-07T08:04:43.4963544Z Entering 'third_party/sleef' 2025-09-07T08:04:43.5336160Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/sleef/config remote.origin.url 2025-09-07T08:04:43.5358777Z Entering 'third_party/tensorpipe' 2025-09-07T08:04:43.5782560Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config remote.origin.url 2025-09-07T08:04:43.5804059Z Entering 
'third_party/tensorpipe/third_party/googletest' 2025-09-07T08:04:43.6180940Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config remote.origin.url 2025-09-07T08:04:43.6201603Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-09-07T08:04:43.6642312Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config remote.origin.url 2025-09-07T08:04:43.6663105Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-09-07T08:04:43.7069853Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config remote.origin.url 2025-09-07T08:04:43.7092166Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T08:04:43.7972124Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config remote.origin.url 2025-09-07T08:04:43.7990773Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T08:04:43.8420605Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url 2025-09-07T08:04:47.0811257Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'git@github.com:' 2025-09-07T08:04:47.1098972Z Entering 'android/libs/fbjni' 2025-09-07T08:04:47.1396906Z Entering 'third_party/FP16' 2025-09-07T08:04:47.1619502Z Entering 'third_party/FXdiv' 2025-09-07T08:04:47.2027945Z Entering 'third_party/NNPACK' 2025-09-07T08:04:47.2485340Z Entering 'third_party/NVTX' 2025-09-07T08:04:47.2866353Z Entering 'third_party/VulkanMemoryAllocator' 2025-09-07T08:04:47.3314657Z Entering 'third_party/XNNPACK' 2025-09-07T08:04:47.3721741Z Entering 'third_party/aiter' 2025-09-07T08:04:47.4203343Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T08:04:47.4551652Z Entering 'third_party/benchmark' 
2025-09-07T08:04:47.5004050Z Entering 'third_party/composable_kernel' 2025-09-07T08:04:47.5395562Z Entering 'third_party/cpp-httplib' 2025-09-07T08:04:47.6306117Z Entering 'third_party/cpuinfo' 2025-09-07T08:04:47.6748777Z Entering 'third_party/cudnn_frontend' 2025-09-07T08:04:47.7148768Z Entering 'third_party/cutlass' 2025-09-07T08:04:47.7637635Z Entering 'third_party/fbgemm' 2025-09-07T08:04:47.8012595Z Entering 'third_party/fbgemm/external/asmjit' 2025-09-07T08:04:47.8469443Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-09-07T08:04:47.8894159Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T08:04:47.9329106Z Entering 'third_party/fbgemm/external/cutlass' 2025-09-07T08:04:47.9742460Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T08:04:48.0198621Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T08:04:48.0578868Z Entering 'third_party/fbgemm/external/json' 2025-09-07T08:04:48.1067548Z Entering 'third_party/flash-attention' 2025-09-07T08:04:48.1475173Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T08:04:48.1888807Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T08:04:48.2302064Z Entering 'third_party/flatbuffers' 2025-09-07T08:04:48.2744790Z Entering 'third_party/fmt' 2025-09-07T08:04:48.3106894Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T08:04:48.3511631Z Entering 'third_party/gloo' 2025-09-07T08:04:48.3840743Z Entering 'third_party/googletest' 2025-09-07T08:04:48.4248558Z Entering 'third_party/ideep' 2025-09-07T08:04:48.4384730Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T08:04:48.4864773Z Entering 'third_party/ittapi' 2025-09-07T08:04:48.5294187Z Entering 'third_party/kineto' 2025-09-07T08:04:48.5677283Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T08:04:48.6115321Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T08:04:48.6561267Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T08:04:48.6682347Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T08:04:48.7150986Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T08:04:48.7334816Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T08:04:48.7805497Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T08:04:48.8139202Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T08:04:48.8546572Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T08:04:48.8938874Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T08:04:48.9394254Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T08:04:48.9753513Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T08:04:49.0246270Z Entering 'third_party/kleidiai' 2025-09-07T08:04:49.0569087Z Entering 'third_party/mimalloc' 2025-09-07T08:04:49.1023325Z Entering 'third_party/nlohmann' 2025-09-07T08:04:49.1377554Z Entering 'third_party/onnx' 2025-09-07T08:04:49.1864809Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T08:04:49.2278336Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T08:04:49.2694245Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T08:04:49.3143635Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T08:04:49.3560401Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T08:04:49.3935129Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T08:04:49.4370941Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T08:04:49.4800808Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T08:04:49.5200912Z Entering 
'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T08:04:49.5395219Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T08:04:49.5840432Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T08:04:49.6153112Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T08:04:49.6642440Z Entering 'third_party/pocketfft' 2025-09-07T08:04:49.7050527Z Entering 'third_party/protobuf' 2025-09-07T08:04:49.7474273Z Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T08:04:49.7884586Z Entering 'third_party/protobuf/third_party/googletest' 2025-09-07T08:04:49.8328918Z Entering 'third_party/psimd' 2025-09-07T08:04:49.8418662Z Entering 'third_party/pthreadpool' 2025-09-07T08:04:49.8870767Z Entering 'third_party/pybind11' 2025-09-07T08:04:49.9265111Z Entering 'third_party/python-peachpy' 2025-09-07T08:04:49.9420096Z Entering 'third_party/sleef' 2025-09-07T08:04:49.9562545Z Entering 'third_party/tensorpipe' 2025-09-07T08:04:49.9767642Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-09-07T08:04:50.0184703Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-09-07T08:04:50.0511148Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-09-07T08:04:50.0954241Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T08:04:50.1383059Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T08:04:50.1845525Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'org-21003710@github.com:' 2025-09-07T08:04:50.2122627Z Entering 'android/libs/fbjni' 2025-09-07T08:04:50.2187230Z Entering 'third_party/FP16' 2025-09-07T08:04:50.2634659Z Entering 'third_party/FXdiv' 2025-09-07T08:04:50.3057675Z Entering 'third_party/NNPACK' 2025-09-07T08:04:50.3469931Z Entering 'third_party/NVTX' 2025-09-07T08:04:50.3906025Z Entering 'third_party/VulkanMemoryAllocator' 
2025-09-07T08:04:50.4371128Z Entering 'third_party/XNNPACK' 2025-09-07T08:04:50.4789453Z Entering 'third_party/aiter' 2025-09-07T08:04:50.5249255Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T08:04:50.5635377Z Entering 'third_party/benchmark' 2025-09-07T08:04:50.6108573Z Entering 'third_party/composable_kernel' 2025-09-07T08:04:50.6477972Z Entering 'third_party/cpp-httplib' 2025-09-07T08:04:50.7378851Z Entering 'third_party/cpuinfo' 2025-09-07T08:04:50.7829407Z Entering 'third_party/cudnn_frontend' 2025-09-07T08:04:50.8226168Z Entering 'third_party/cutlass' 2025-09-07T08:04:50.8680016Z Entering 'third_party/fbgemm' 2025-09-07T08:04:50.9110777Z Entering 'third_party/fbgemm/external/asmjit' 2025-09-07T08:04:50.9492584Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-09-07T08:04:50.9961622Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T08:04:51.0450598Z Entering 'third_party/fbgemm/external/cutlass' 2025-09-07T08:04:51.0828888Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T08:04:51.1268260Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T08:04:51.1656641Z Entering 'third_party/fbgemm/external/json' 2025-09-07T08:04:51.2101285Z Entering 'third_party/flash-attention' 2025-09-07T08:04:51.2502184Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T08:04:51.2981536Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T08:04:51.3358497Z Entering 'third_party/flatbuffers' 2025-09-07T08:04:51.3827696Z Entering 'third_party/fmt' 2025-09-07T08:04:51.4081618Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T08:04:51.4280265Z Entering 'third_party/gloo' 2025-09-07T08:04:51.4734835Z Entering 'third_party/googletest' 2025-09-07T08:04:51.5199162Z Entering 'third_party/ideep' 2025-09-07T08:04:51.5554886Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T08:04:51.6030322Z Entering 'third_party/ittapi' 2025-09-07T08:04:51.6434032Z Entering 'third_party/kineto' 
2025-09-07T08:04:51.6869032Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T08:04:51.7338217Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T08:04:51.7736758Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T08:04:51.8169535Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T08:04:51.8627940Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T08:04:51.9020046Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T08:04:51.9464200Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T08:04:51.9797922Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T08:04:52.0266823Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T08:04:52.0631273Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T08:04:52.1094957Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T08:04:52.1494626Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T08:04:52.1970624Z Entering 'third_party/kleidiai' 2025-09-07T08:04:52.2346428Z Entering 'third_party/mimalloc' 2025-09-07T08:04:52.2792496Z Entering 'third_party/nlohmann' 2025-09-07T08:04:52.3209951Z Entering 'third_party/onnx' 2025-09-07T08:04:52.3689986Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T08:04:52.3822737Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T08:04:52.4288537Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T08:04:52.4726539Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T08:04:52.5123127Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T08:04:52.5305498Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 
2025-09-07T08:04:52.5783091Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T08:04:52.5957979Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T08:04:52.6324739Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T08:04:52.6785586Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T08:04:52.7171217Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T08:04:52.7625783Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T08:04:52.8042399Z Entering 'third_party/pocketfft' 2025-09-07T08:04:52.8471872Z Entering 'third_party/protobuf' 2025-09-07T08:04:52.8924261Z Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T08:04:52.9176367Z Entering 'third_party/protobuf/third_party/googletest' 2025-09-07T08:04:52.9362426Z Entering 'third_party/psimd' 2025-09-07T08:04:52.9919014Z Entering 'third_party/pthreadpool' 2025-09-07T08:04:53.0356617Z Entering 'third_party/pybind11' 2025-09-07T08:04:53.0475449Z Entering 'third_party/python-peachpy' 2025-09-07T08:04:53.0678036Z Entering 'third_party/sleef' 2025-09-07T08:04:53.1102776Z Entering 'third_party/tensorpipe' 2025-09-07T08:04:53.1420772Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-09-07T08:04:53.1870322Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-09-07T08:04:53.2198045Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-09-07T08:04:53.2664850Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T08:04:53.3026479Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T08:04:53.3531150Z ##[endgroup] 2025-09-07T08:04:53.3575820Z [command]/usr/bin/git log -1 --format=%H 2025-09-07T08:04:53.3605406Z 93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T08:04:53.3892032Z ##[group]Run actions/checkout@v4 2025-09-07T08:04:53.3892256Z with: 
2025-09-07T08:04:53.3892442Z ref: 93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T08:04:53.3892675Z fetch-depth: 0 2025-09-07T08:04:53.3892852Z submodules: recursive 2025-09-07T08:04:53.3893043Z show-progress: false 2025-09-07T08:04:53.3893264Z repository: pytorch/pytorch 2025-09-07T08:04:53.3893579Z token: *** 2025-09-07T08:04:53.3893982Z ssh-strict: true 2025-09-07T08:04:53.3894158Z ssh-user: git 2025-09-07T08:04:53.3894339Z persist-credentials: true 2025-09-07T08:04:53.3894529Z clean: true 2025-09-07T08:04:53.3894709Z sparse-checkout-cone-mode: true 2025-09-07T08:04:53.3894922Z fetch-tags: false 2025-09-07T08:04:53.3895088Z lfs: false 2025-09-07T08:04:53.3895269Z set-safe-directory: true 2025-09-07T08:04:53.3895468Z env: 2025-09-07T08:04:53.3895626Z GIT_DEFAULT_BRANCH: main 2025-09-07T08:04:53.3895809Z ##[endgroup] 2025-09-07T08:04:53.4799361Z Syncing repository: pytorch/pytorch 2025-09-07T08:04:53.4802305Z ##[group]Getting Git version info 2025-09-07T08:04:53.4802868Z Working directory is '/home/charlie/_work/pytorch/pytorch' 2025-09-07T08:04:53.4836465Z [command]/usr/bin/git version 2025-09-07T08:04:53.4873237Z git version 2.50.1 2025-09-07T08:04:53.4896926Z ##[endgroup] 2025-09-07T08:04:53.4910459Z Temporarily overriding HOME='/home/charlie/_work/_temp/f06c386f-8d96-4220-a432-d705eeeb56a4' before making global git config changes 2025-09-07T08:04:53.4911129Z Adding repository directory to the temporary git global config as a safe directory 2025-09-07T08:04:53.4915691Z [command]/usr/bin/git config --global --add safe.directory /home/charlie/_work/pytorch/pytorch 2025-09-07T08:04:53.5223139Z [command]/usr/bin/git config --local --get remote.origin.url 2025-09-07T08:04:53.5246478Z https://github.com/pytorch/pytorch 2025-09-07T08:04:53.5263370Z ##[group]Removing previously created refs, to avoid conflicts 2025-09-07T08:04:53.5267628Z [command]/usr/bin/git rev-parse --symbolic-full-name --verify --quiet HEAD 2025-09-07T08:04:53.5293180Z HEAD 
2025-09-07T08:04:53.5338780Z ##[endgroup] 2025-09-07T08:04:53.5342041Z [command]/usr/bin/git submodule status 2025-09-07T08:04:53.5654213Z 7e1e1fe3858c63c251c637ae41a20de425dde96f android/libs/fbjni (v0.1.0-12-g7e1e1fe) 2025-09-07T08:04:53.5742216Z 4dfe081cf6bcd15db339cf2680b9281b8451eeb3 third_party/FP16 (4dfe081) 2025-09-07T08:04:53.5831129Z b408327ac2a15ec3e43352421954f5b1967701d1 third_party/FXdiv (b408327) 2025-09-07T08:04:53.5931776Z c07e3a0400713d546e0dea2d5466dd22ea389c73 third_party/NNPACK (c07e3a0) 2025-09-07T08:04:53.5987641Z 2942f167cc30c5e3a44a2aecd5b0d9c07ff61a07 third_party/NVTX (v3.1.0-263-g2942f16) 2025-09-07T08:04:53.6074241Z 1d8f600fd424278486eade7ed3e877c99f0846b1 third_party/VulkanMemoryAllocator (v2.1.0-982-g1d8f600) 2025-09-07T08:04:53.6542781Z 51a0103656eff6fc9bfd39a4597923c4b542c883 third_party/XNNPACK (remotes/origin/ds/ndk-1243-g51a0103656) 2025-09-07T08:04:53.6584779Z 01aae101b9e5e94d6c16a9514c9fb8df99c93150 third_party/aiter (v0.1.1-92-g01aae101) 2025-09-07T08:04:53.6613290Z 299e5928955cc62af9968370293b916f5130916f third_party/benchmark (v1.9.3) 2025-09-07T08:04:53.6694603Z 7fe50dc3da2069d6645d9deb8c017a876472a977 third_party/composable_kernel (rocm-6.4.3-459-g7fe50dc3d) 2025-09-07T08:04:53.6826419Z 89c932f313c6437c38f2982869beacc89c2f2246 third_party/cpp-httplib (v0.26.0) 2025-09-07T08:04:53.6949845Z 5e3d2445e6a84d9599bee2bf78edbb4d80865e1d third_party/cpuinfo (5e3d244) 2025-09-07T08:04:53.6991906Z f937055efc6d414d11f4c6577e3977fe74f35fb6 third_party/cudnn_frontend (v0.5-52-gf937055) 2025-09-07T08:04:53.7090800Z e51efbfe18fe4f4cbb66ab814c55bf4aa0185491 third_party/cutlass (v4.1.0) 2025-09-07T08:04:53.7152073Z 4b39c551efe15e6bbade20565b0ceb2d8ce3352d third_party/fbgemm (v1.3.0-rc1-342-g4b39c551) 2025-09-07T08:04:53.7241773Z 979702c87a8713a8e0a5e9fee122b90d2ef13be5 third_party/flash-attention (v2.7.4) 2025-09-07T08:04:53.7271520Z a2cd1ea3b6d3fee220106b5fed3f7ce8da9eb757 third_party/flatbuffers (v24.12.23) 2025-09-07T08:04:53.7641605Z 
40626af88bd7df9a5fb80be7b25ac85b122d6c21 third_party/fmt (11.2.0) 2025-09-07T08:04:53.7757852Z 3fb5c176c17c765a3492cd2f0321b0dab712f350 third_party/gemmlowp/gemmlowp (remotes/origin/revert-87-master-135-g3fb5c17) 2025-09-07T08:04:53.7887536Z c7b7b022c124d9643957d9bd55f57ac59fce8fa2 third_party/gloo (remotes/origin/gh/c-p-i-o/1/base-33-gc7b7b02) 2025-09-07T08:04:53.8100909Z 52eb8108c5bdec04579160ae17225d66034bd723 third_party/googletest (release-1.8.0-3544-g52eb8108) 2025-09-07T08:04:53.8188981Z 719d8e6cd7f7a0e01b155657526d693acf97c2b3 third_party/ideep (pytorch-rls-v3.7.1) 2025-09-07T08:04:53.8255922Z dec1d23ca65ab069d225dfe40dea14f455170959 third_party/ittapi (v3.25.5) 2025-09-07T08:04:53.8508625Z 5e7501833f1021ce6f618572d3baf657b6319658 third_party/kineto (remotes/origin/sraikund/test-98-g5e75018) 2025-09-07T08:04:53.8539427Z cca02c2f69dd18e1f12647c1c0bdc8cf90e680c7 third_party/kleidiai (v1.8.0) 2025-09-07T08:04:53.8569361Z fbd8b99c2b828428947d70fdc046bb55609be93e third_party/mimalloc (v2.2.4) 2025-09-07T08:04:53.8599563Z 55f93686c01528224f448c19128836e7df245f72 third_party/nlohmann (v3.12.0) 2025-09-07T08:04:53.8906318Z e709452ef2bbc1d113faf678c24e6d3467696e83 third_party/onnx (v1.18.0) 2025-09-07T08:04:53.8935611Z a799f4aed9c94b765dcdaabaeab7d5e7e2310878 third_party/opentelemetry-cpp (v1.14.2) 2025-09-07T08:04:53.8968217Z 0fa0ef591e38c2758e3184c6c23e497b9f732ffa third_party/pocketfft (release_for_eigen-40-g0fa0ef5) 2025-09-07T08:04:53.9292773Z d1eca4e4b421cd2997495c4b4e65cea6be4e9b8a third_party/protobuf (v3.7.0-rc.2-1279-gd1eca4e4b) 2025-09-07T08:04:53.9380054Z 072586a71b55b7f8c584153d223e95687148a900 third_party/psimd (heads/master) 2025-09-07T08:04:53.9442642Z 4fe0e1e183925bf8cfa6aae24237e724a96479b8 third_party/pthreadpool (0.1-144-g4fe0e1e) 2025-09-07T08:04:53.9472126Z f5fbe867d2d26e4a0a9177a51f6e568868ad3dc8 third_party/pybind11 (v3.0.1) 2025-09-07T08:04:53.9559545Z f45429b087dd7d5bc78bb40dc7cf06425c252d67 third_party/python-peachpy 
(remotes/origin/pre-generated) 2025-09-07T08:04:53.9641024Z 5a1d179df9cf652951b59010a2d2075372d67f68 third_party/sleef (3.8) 2025-09-07T08:04:53.9723636Z af0118d13e52f5a08841464a768e01a0bf3e3075 third_party/tensorpipe (heads/main) 2025-09-07T08:04:53.9736939Z ##[group]Cleaning the repository 2025-09-07T08:04:53.9740739Z [command]/usr/bin/git clean -ffdx 2025-09-07T08:04:54.0071083Z [command]/usr/bin/git reset --hard HEAD 2025-09-07T08:04:54.7490940Z HEAD is now at 93fb23d6fae Build vLLM nightly wheels (#162000) 2025-09-07T08:04:54.7523134Z ##[endgroup] 2025-09-07T08:04:54.7526501Z ##[group]Disabling automatic garbage collection 2025-09-07T08:04:54.7531715Z [command]/usr/bin/git config --local gc.auto 0 2025-09-07T08:04:54.7909111Z ##[endgroup] 2025-09-07T08:04:54.7909772Z ##[group]Setting up auth 2025-09-07T08:04:54.7918251Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2025-09-07T08:04:54.7949830Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2025-09-07T08:04:54.8232838Z Entering 'android/libs/fbjni' 2025-09-07T08:04:54.8284639Z Entering 'third_party/FP16' 2025-09-07T08:04:54.8336350Z Entering 'third_party/FXdiv' 2025-09-07T08:04:54.8388043Z Entering 'third_party/NNPACK' 2025-09-07T08:04:54.8440136Z Entering 'third_party/NVTX' 2025-09-07T08:04:54.8490836Z Entering 'third_party/VulkanMemoryAllocator' 2025-09-07T08:04:54.8542521Z Entering 'third_party/XNNPACK' 2025-09-07T08:04:54.8608069Z Entering 'third_party/aiter' 2025-09-07T08:04:54.8659134Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T08:04:54.8719903Z Entering 'third_party/benchmark' 2025-09-07T08:04:54.8770737Z Entering 'third_party/composable_kernel' 2025-09-07T08:04:54.8829941Z Entering 'third_party/cpp-httplib' 2025-09-07T08:04:54.8881417Z Entering 'third_party/cpuinfo' 2025-09-07T08:04:54.8932525Z Entering 
'third_party/cudnn_frontend' 2025-09-07T08:04:54.8984488Z Entering 'third_party/cutlass' 2025-09-07T08:04:54.9042905Z Entering 'third_party/fbgemm' 2025-09-07T08:04:54.9094711Z Entering 'third_party/fbgemm/external/asmjit' 2025-09-07T08:04:54.9145736Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-09-07T08:04:54.9202683Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T08:04:54.9252095Z Entering 'third_party/fbgemm/external/cutlass' 2025-09-07T08:04:54.9308997Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T08:04:54.9358498Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T08:04:54.9406975Z Entering 'third_party/fbgemm/external/json' 2025-09-07T08:04:54.9460520Z Entering 'third_party/flash-attention' 2025-09-07T08:04:54.9511426Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T08:04:54.9566001Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T08:04:54.9623487Z Entering 'third_party/flatbuffers' 2025-09-07T08:04:54.9675175Z Entering 'third_party/fmt' 2025-09-07T08:04:54.9724061Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T08:04:54.9772387Z Entering 'third_party/gloo' 2025-09-07T08:04:54.9823522Z Entering 'third_party/googletest' 2025-09-07T08:04:54.9874820Z Entering 'third_party/ideep' 2025-09-07T08:04:54.9924183Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T08:04:54.9981080Z Entering 'third_party/ittapi' 2025-09-07T08:04:55.0031379Z Entering 'third_party/kineto' 2025-09-07T08:04:55.0080635Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T08:04:55.0130025Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T08:04:55.0180243Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T08:04:55.0228967Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T08:04:55.0278097Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 
2025-09-07T08:04:55.0325570Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T08:04:55.0376751Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T08:04:55.0425623Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T08:04:55.0473566Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T08:04:55.0522674Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T08:04:55.3225979Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T08:04:55.3226898Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T08:04:55.3227553Z Entering 'third_party/kleidiai' 2025-09-07T08:04:55.3228066Z Entering 'third_party/mimalloc' 2025-09-07T08:04:55.3228536Z Entering 'third_party/nlohmann' 2025-09-07T08:04:55.3229029Z Entering 'third_party/onnx' 2025-09-07T08:04:55.3229576Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T08:04:55.3230215Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T08:04:55.3230912Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T08:04:55.3231756Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T08:04:55.3232461Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T08:04:55.3233167Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T08:04:55.3234273Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T08:04:55.3235099Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T08:04:55.3235875Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T08:04:55.3236738Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T08:04:55.3237856Z Entering 
'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T08:04:55.3238742Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T08:04:55.3239025Z Entering 'third_party/pocketfft' 2025-09-07T08:04:55.3239250Z Entering 'third_party/protobuf' 2025-09-07T08:04:55.3239502Z Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T08:04:55.3239805Z Entering 'third_party/protobuf/third_party/googletest' 2025-09-07T08:04:55.3240075Z Entering 'third_party/psimd' 2025-09-07T08:04:55.3240290Z Entering 'third_party/pthreadpool' 2025-09-07T08:04:55.3240518Z Entering 'third_party/pybind11' 2025-09-07T08:04:55.3240741Z Entering 'third_party/python-peachpy' 2025-09-07T08:04:55.3240984Z Entering 'third_party/sleef' 2025-09-07T08:04:55.3241195Z Entering 'third_party/tensorpipe' 2025-09-07T08:04:55.3241464Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-09-07T08:04:55.3241772Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-09-07T08:04:55.3242070Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-09-07T08:04:55.3242377Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T08:04:55.3242719Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T08:04:55.3243635Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2025-09-07T08:04:55.3244263Z http.https://github.com/.extraheader 2025-09-07T08:04:55.3244859Z [command]/usr/bin/git config --local --unset-all http.https://github.com/.extraheader 2025-09-07T08:04:55.3248394Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :" 2025-09-07T08:04:55.3523489Z Entering 'android/libs/fbjni' 2025-09-07T08:04:55.3553004Z http.https://github.com/.extraheader 2025-09-07T08:04:55.4189714Z Entering 
'third_party/FP16' 2025-09-07T08:04:55.4219241Z http.https://github.com/.extraheader 2025-09-07T08:04:55.4387149Z Entering 'third_party/FXdiv' 2025-09-07T08:04:55.4417721Z http.https://github.com/.extraheader 2025-09-07T08:04:55.4835616Z Entering 'third_party/NNPACK' 2025-09-07T08:04:55.4864611Z http.https://github.com/.extraheader 2025-09-07T08:04:55.5016242Z Entering 'third_party/NVTX' 2025-09-07T08:04:55.5045995Z http.https://github.com/.extraheader 2025-09-07T08:04:55.5490073Z Entering 'third_party/VulkanMemoryAllocator' 2025-09-07T08:04:55.5518683Z http.https://github.com/.extraheader 2025-09-07T08:04:55.5888596Z Entering 'third_party/XNNPACK' 2025-09-07T08:04:55.5920044Z http.https://github.com/.extraheader 2025-09-07T08:04:55.6082012Z Entering 'third_party/aiter' 2025-09-07T08:04:55.6110821Z http.https://github.com/.extraheader 2025-09-07T08:04:55.6503270Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T08:04:55.6531272Z http.https://github.com/.extraheader 2025-09-07T08:04:55.6957196Z Entering 'third_party/benchmark' 2025-09-07T08:04:55.6985980Z http.https://github.com/.extraheader 2025-09-07T08:04:55.7329283Z Entering 'third_party/composable_kernel' 2025-09-07T08:04:55.7358677Z http.https://github.com/.extraheader 2025-09-07T08:04:55.7815577Z Entering 'third_party/cpp-httplib' 2025-09-07T08:04:55.7847647Z http.https://github.com/.extraheader 2025-09-07T08:04:55.8266704Z Entering 'third_party/cpuinfo' 2025-09-07T08:04:55.8296400Z http.https://github.com/.extraheader 2025-09-07T08:04:55.8741280Z Entering 'third_party/cudnn_frontend' 2025-09-07T08:04:55.8772135Z http.https://github.com/.extraheader 2025-09-07T08:04:55.9090247Z Entering 'third_party/cutlass' 2025-09-07T08:04:55.9119537Z http.https://github.com/.extraheader 2025-09-07T08:04:55.9570952Z Entering 'third_party/fbgemm' 2025-09-07T08:04:55.9600325Z http.https://github.com/.extraheader 2025-09-07T08:04:55.9980962Z Entering 'third_party/fbgemm/external/asmjit' 
2025-09-07T08:04:56.0008321Z http.https://github.com/.extraheader 2025-09-07T08:04:56.0435435Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-09-07T08:04:56.0463542Z http.https://github.com/.extraheader 2025-09-07T08:04:56.0622207Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T08:04:56.0649682Z http.https://github.com/.extraheader 2025-09-07T08:04:56.0767860Z Entering 'third_party/fbgemm/external/cutlass' 2025-09-07T08:04:56.0795368Z http.https://github.com/.extraheader 2025-09-07T08:04:56.1255451Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T08:04:56.1283692Z http.https://github.com/.extraheader 2025-09-07T08:04:56.1697587Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T08:04:56.1726605Z http.https://github.com/.extraheader 2025-09-07T08:04:56.2082164Z Entering 'third_party/fbgemm/external/json' 2025-09-07T08:04:56.2110319Z http.https://github.com/.extraheader 2025-09-07T08:04:56.2518161Z Entering 'third_party/flash-attention' 2025-09-07T08:04:56.2548728Z http.https://github.com/.extraheader 2025-09-07T08:04:56.2996530Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T08:04:56.3023287Z http.https://github.com/.extraheader 2025-09-07T08:04:56.3424781Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T08:04:56.3452052Z http.https://github.com/.extraheader 2025-09-07T08:04:56.3599476Z Entering 'third_party/flatbuffers' 2025-09-07T08:04:56.3628019Z http.https://github.com/.extraheader 2025-09-07T08:04:56.3939700Z Entering 'third_party/fmt' 2025-09-07T08:04:56.3971834Z http.https://github.com/.extraheader 2025-09-07T08:04:56.4399138Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T08:04:56.4429701Z http.https://github.com/.extraheader 2025-09-07T08:04:56.4567332Z Entering 'third_party/gloo' 2025-09-07T08:04:56.4595759Z http.https://github.com/.extraheader 2025-09-07T08:04:56.5059000Z Entering 'third_party/googletest' 2025-09-07T08:04:56.5087797Z 
http.https://github.com/.extraheader 2025-09-07T08:04:56.5420697Z Entering 'third_party/ideep' 2025-09-07T08:04:56.5449115Z http.https://github.com/.extraheader 2025-09-07T08:04:56.5902148Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T08:04:56.5929328Z http.https://github.com/.extraheader 2025-09-07T08:04:56.6387669Z Entering 'third_party/ittapi' 2025-09-07T08:04:56.6416657Z http.https://github.com/.extraheader 2025-09-07T08:04:56.6795358Z Entering 'third_party/kineto' 2025-09-07T08:04:56.6825883Z http.https://github.com/.extraheader 2025-09-07T08:04:56.7232489Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T08:04:56.7260318Z http.https://github.com/.extraheader 2025-09-07T08:04:56.7621732Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T08:04:56.7649263Z http.https://github.com/.extraheader 2025-09-07T08:04:56.8108007Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T08:04:56.8135822Z http.https://github.com/.extraheader 2025-09-07T08:04:56.8474624Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T08:04:56.8502317Z http.https://github.com/.extraheader 2025-09-07T08:04:56.8924845Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T08:04:56.8953049Z http.https://github.com/.extraheader 2025-09-07T08:04:56.9325998Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T08:04:56.9354285Z http.https://github.com/.extraheader 2025-09-07T08:04:56.9769154Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T08:04:56.9797471Z http.https://github.com/.extraheader 2025-09-07T08:04:57.0180920Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T08:04:57.0208786Z http.https://github.com/.extraheader 2025-09-07T08:04:57.0610237Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T08:04:57.0637851Z http.https://github.com/.extraheader 2025-09-07T08:04:57.1030337Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T08:04:57.1058042Z http.https://github.com/.extraheader 2025-09-07T08:04:57.1489722Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T08:04:57.1516324Z http.https://github.com/.extraheader 2025-09-07T08:04:57.1869747Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T08:04:57.1896963Z http.https://github.com/.extraheader 2025-09-07T08:04:57.2332599Z Entering 'third_party/kleidiai' 2025-09-07T08:04:57.2363422Z http.https://github.com/.extraheader 2025-09-07T08:04:57.2706830Z Entering 'third_party/mimalloc' 2025-09-07T08:04:57.2735883Z http.https://github.com/.extraheader 2025-09-07T08:04:57.3175497Z Entering 'third_party/nlohmann' 2025-09-07T08:04:57.3205414Z http.https://github.com/.extraheader 2025-09-07T08:04:57.3562552Z Entering 'third_party/onnx' 2025-09-07T08:04:57.3592985Z http.https://github.com/.extraheader 2025-09-07T08:04:57.4039929Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T08:04:57.4068667Z http.https://github.com/.extraheader 2025-09-07T08:04:57.4432715Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T08:04:57.4461902Z http.https://github.com/.extraheader 2025-09-07T08:04:57.4877330Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T08:04:57.4905460Z http.https://github.com/.extraheader 2025-09-07T08:04:57.5250279Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T08:04:57.5278231Z http.https://github.com/.extraheader 2025-09-07T08:04:57.5734635Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T08:04:57.5762046Z http.https://github.com/.extraheader 2025-09-07T08:04:57.6189122Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T08:04:57.6217099Z 
http.https://github.com/.extraheader 2025-09-07T08:04:57.6580036Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T08:04:57.6608397Z http.https://github.com/.extraheader 2025-09-07T08:04:57.6783990Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T08:04:57.6811760Z http.https://github.com/.extraheader 2025-09-07T08:04:57.7248554Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T08:04:57.7276709Z http.https://github.com/.extraheader 2025-09-07T08:04:57.7631905Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T08:04:57.7659258Z http.https://github.com/.extraheader 2025-09-07T08:04:57.8110613Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T08:04:57.8138181Z http.https://github.com/.extraheader 2025-09-07T08:04:57.8541844Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T08:04:57.8569883Z http.https://github.com/.extraheader 2025-09-07T08:04:57.8738506Z Entering 'third_party/pocketfft' 2025-09-07T08:04:57.8767511Z http.https://github.com/.extraheader 2025-09-07T08:04:57.9111581Z Entering 'third_party/protobuf' 2025-09-07T08:04:57.9141722Z http.https://github.com/.extraheader 2025-09-07T08:04:57.9592905Z Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T08:04:57.9620329Z http.https://github.com/.extraheader 2025-09-07T08:04:57.9957450Z Entering 'third_party/protobuf/third_party/googletest' 2025-09-07T08:04:57.9985061Z http.https://github.com/.extraheader 2025-09-07T08:04:58.0131869Z Entering 'third_party/psimd' 2025-09-07T08:04:58.0163342Z http.https://github.com/.extraheader 2025-09-07T08:04:58.0613123Z Entering 'third_party/pthreadpool' 2025-09-07T08:04:58.0642193Z http.https://github.com/.extraheader 2025-09-07T08:04:58.0983655Z Entering 'third_party/pybind11' 2025-09-07T08:04:58.1013108Z http.https://github.com/.extraheader 
2025-09-07T08:04:58.1462776Z Entering 'third_party/python-peachpy' 2025-09-07T08:04:58.1491745Z http.https://github.com/.extraheader 2025-09-07T08:04:58.1789194Z Entering 'third_party/sleef' 2025-09-07T08:04:58.1819701Z http.https://github.com/.extraheader 2025-09-07T08:04:58.2268949Z Entering 'third_party/tensorpipe' 2025-09-07T08:04:58.2297261Z http.https://github.com/.extraheader 2025-09-07T08:04:58.2667016Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-09-07T08:04:58.2694088Z http.https://github.com/.extraheader 2025-09-07T08:04:58.3143494Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-09-07T08:04:58.3170732Z http.https://github.com/.extraheader 2025-09-07T08:04:58.3554874Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-09-07T08:04:58.3581437Z http.https://github.com/.extraheader 2025-09-07T08:04:58.4028660Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T08:04:58.4056187Z http.https://github.com/.extraheader 2025-09-07T08:04:58.4436708Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T08:04:58.4465137Z http.https://github.com/.extraheader 2025-09-07T08:04:58.4954482Z [command]/usr/bin/git config --local http.https://github.com/.extraheader AUTHORIZATION: basic *** 2025-09-07T08:04:58.6179560Z ##[endgroup] 2025-09-07T08:04:58.6180244Z ##[group]Fetching the repository 2025-09-07T08:04:58.6191750Z [command]/usr/bin/git -c protocol.version=2 fetch --prune --no-recurse-submodules origin +refs/heads/*:refs/remotes/origin/* +refs/tags/*:refs/tags/* 2025-09-07T08:04:59.1791380Z [command]/usr/bin/git rev-parse --verify --quiet 93fb23d6fae7c4e82c4239a1033e522088742634^{object} 2025-09-07T08:04:59.1819914Z 93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T08:04:59.1825068Z ##[endgroup] 2025-09-07T08:04:59.1827167Z ##[group]Determining the checkout info 2025-09-07T08:04:59.1827571Z ##[endgroup] 2025-09-07T08:04:59.1829651Z [command]/usr/bin/git sparse-checkout disable 
2025-09-07T08:04:59.6463168Z [command]/usr/bin/git config --local --unset-all extensions.worktreeConfig 2025-09-07T08:04:59.6593555Z ##[group]Checking out the ref 2025-09-07T08:04:59.6597830Z [command]/usr/bin/git checkout --progress --force 93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T08:04:59.9003110Z HEAD is now at 93fb23d6fae Build vLLM nightly wheels (#162000) 2025-09-07T08:04:59.9015299Z ##[endgroup] 2025-09-07T08:04:59.9015648Z ##[group]Setting up auth for fetching submodules 2025-09-07T08:04:59.9020742Z [command]/usr/bin/git config --global http.https://github.com/.extraheader AUTHORIZATION: basic *** 2025-09-07T08:05:00.0477620Z [command]/usr/bin/git config --global --unset-all url.https://github.com/.insteadOf 2025-09-07T08:05:00.0512689Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf git@github.com: 2025-09-07T08:05:00.1577953Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf org-21003710@github.com: 2025-09-07T08:05:00.1737162Z ##[endgroup] 2025-09-07T08:05:00.1737487Z ##[group]Fetching submodules 2025-09-07T08:05:00.1740305Z [command]/usr/bin/git submodule sync --recursive 2025-09-07T08:05:00.2675593Z Synchronizing submodule url for 'android/libs/fbjni' 2025-09-07T08:05:00.3518605Z Synchronizing submodule url for 'third_party/FP16' 2025-09-07T08:05:00.4391459Z Synchronizing submodule url for 'third_party/FXdiv' 2025-09-07T08:05:00.5271586Z Synchronizing submodule url for 'third_party/NNPACK' 2025-09-07T08:05:00.6590566Z Synchronizing submodule url for 'third_party/NVTX' 2025-09-07T08:05:00.7392923Z Synchronizing submodule url for 'third_party/VulkanMemoryAllocator' 2025-09-07T08:05:00.8254556Z Synchronizing submodule url for 'third_party/XNNPACK' 2025-09-07T08:05:00.8926549Z Synchronizing submodule url for 'third_party/aiter' 2025-09-07T08:05:01.0190391Z Synchronizing submodule url for 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T08:05:01.1085386Z Synchronizing submodule 
url for 'third_party/benchmark' 2025-09-07T08:05:01.1921852Z Synchronizing submodule url for 'third_party/composable_kernel' 2025-09-07T08:05:01.2821413Z Synchronizing submodule url for 'third_party/cpp-httplib' 2025-09-07T08:05:01.3461752Z Synchronizing submodule url for 'third_party/cpuinfo' 2025-09-07T08:05:01.4292027Z Synchronizing submodule url for 'third_party/cudnn_frontend' 2025-09-07T08:05:01.5158325Z Synchronizing submodule url for 'third_party/cutlass' 2025-09-07T08:05:01.5767943Z Synchronizing submodule url for 'third_party/fbgemm' 2025-09-07T08:05:01.6845898Z Synchronizing submodule url for 'third_party/fbgemm/external/asmjit' 2025-09-07T08:05:01.7604041Z Synchronizing submodule url for 'third_party/fbgemm/external/composable_kernel' 2025-09-07T08:05:01.8492193Z Synchronizing submodule url for 'third_party/fbgemm/external/cpuinfo' 2025-09-07T08:05:01.9362282Z Synchronizing submodule url for 'third_party/fbgemm/external/cutlass' 2025-09-07T08:05:02.0278371Z Synchronizing submodule url for 'third_party/fbgemm/external/googletest' 2025-09-07T08:05:02.0872608Z Synchronizing submodule url for 'third_party/fbgemm/external/hipify_torch' 2025-09-07T08:05:02.1910557Z Synchronizing submodule url for 'third_party/fbgemm/external/json' 2025-09-07T08:05:02.2806230Z Synchronizing submodule url for 'third_party/flash-attention' 2025-09-07T08:05:02.3679089Z Synchronizing submodule url for 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T08:05:02.4495861Z Synchronizing submodule url for 'third_party/flash-attention/csrc/cutlass' 2025-09-07T08:05:02.5319680Z Synchronizing submodule url for 'third_party/flatbuffers' 2025-09-07T08:05:02.6201742Z Synchronizing submodule url for 'third_party/fmt' 2025-09-07T08:05:02.7068755Z Synchronizing submodule url for 'third_party/gemmlowp/gemmlowp' 2025-09-07T08:05:02.7943565Z Synchronizing submodule url for 'third_party/gloo' 2025-09-07T08:05:02.8833463Z Synchronizing submodule url for 'third_party/googletest' 
2025-09-07T08:05:02.9667673Z Synchronizing submodule url for 'third_party/ideep' 2025-09-07T08:05:03.0647112Z Synchronizing submodule url for 'third_party/ideep/mkl-dnn' 2025-09-07T08:05:03.1232474Z Synchronizing submodule url for 'third_party/ittapi' 2025-09-07T08:05:03.2668359Z Synchronizing submodule url for 'third_party/kineto' 2025-09-07T08:05:03.3496234Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T08:05:03.4505315Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T08:05:03.5365101Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T08:05:03.6167228Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T08:05:03.7034718Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T08:05:03.7936005Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T08:05:03.8794771Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T08:05:03.9676211Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T08:05:04.0602152Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T08:05:04.1479988Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T08:05:04.2395253Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T08:05:04.3047412Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T08:05:04.3911921Z Synchronizing submodule url for 'third_party/kleidiai' 2025-09-07T08:05:04.4770610Z Synchronizing submodule url for 
'third_party/mimalloc' 2025-09-07T08:05:04.5419018Z Synchronizing submodule url for 'third_party/nlohmann' 2025-09-07T08:05:04.6281522Z Synchronizing submodule url for 'third_party/onnx' 2025-09-07T08:05:04.7586799Z Synchronizing submodule url for 'third_party/onnx/third_party/pybind11' 2025-09-07T08:05:04.8481529Z Synchronizing submodule url for 'third_party/opentelemetry-cpp' 2025-09-07T08:05:04.9547183Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T08:05:05.0465313Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T08:05:05.1358910Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T08:05:05.1981127Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T08:05:05.2800485Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T08:05:05.3358208Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T08:05:05.4610864Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T08:05:05.5904497Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T08:05:05.6528226Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T08:05:05.7366898Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T08:05:05.8207423Z Synchronizing submodule url for 'third_party/pocketfft' 2025-09-07T08:05:05.9478399Z Synchronizing submodule url for 'third_party/protobuf' 2025-09-07T08:05:06.0371238Z Synchronizing submodule url for 'third_party/protobuf/third_party/benchmark' 2025-09-07T08:05:06.0911463Z Synchronizing submodule url for 'third_party/protobuf/third_party/googletest' 
2025-09-07T08:05:06.1781952Z Synchronizing submodule url for 'third_party/psimd' 2025-09-07T08:05:06.2436024Z Synchronizing submodule url for 'third_party/pthreadpool' 2025-09-07T08:05:06.3261532Z Synchronizing submodule url for 'third_party/pybind11' 2025-09-07T08:05:06.4124787Z Synchronizing submodule url for 'third_party/python-peachpy' 2025-09-07T08:05:06.5007404Z Synchronizing submodule url for 'third_party/sleef' 2025-09-07T08:05:06.5864455Z Synchronizing submodule url for 'third_party/tensorpipe' 2025-09-07T08:05:06.6716493Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/googletest' 2025-09-07T08:05:06.7335333Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/libnop' 2025-09-07T08:05:06.8674439Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/libuv' 2025-09-07T08:05:06.9266185Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T08:05:06.9715953Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T08:05:06.9770485Z [command]/usr/bin/git -c protocol.version=2 submodule update --init --force --recursive 2025-09-07T08:05:07.0370294Z Submodule path 'android/libs/fbjni': checked out '7e1e1fe3858c63c251c637ae41a20de425dde96f' 2025-09-07T08:05:07.1001679Z Submodule path 'third_party/FP16': checked out '4dfe081cf6bcd15db339cf2680b9281b8451eeb3' 2025-09-07T08:05:07.1758135Z Submodule path 'third_party/FXdiv': checked out 'b408327ac2a15ec3e43352421954f5b1967701d1' 2025-09-07T08:05:07.2434199Z Submodule path 'third_party/NNPACK': checked out 'c07e3a0400713d546e0dea2d5466dd22ea389c73' 2025-09-07T08:05:07.3743663Z Submodule path 'third_party/NVTX': checked out '2942f167cc30c5e3a44a2aecd5b0d9c07ff61a07' 2025-09-07T08:05:07.4399303Z Submodule path 'third_party/VulkanMemoryAllocator': checked out '1d8f600fd424278486eade7ed3e877c99f0846b1' 2025-09-07T08:05:07.8088079Z Submodule path 'third_party/XNNPACK': checked out 
'51a0103656eff6fc9bfd39a4597923c4b542c883' 2025-09-07T08:05:08.0790237Z Submodule path 'third_party/aiter': checked out '01aae101b9e5e94d6c16a9514c9fb8df99c93150' 2025-09-07T08:05:08.4845519Z Submodule path 'third_party/aiter/3rdparty/composable_kernel': checked out 'cffe8fa2a442ac8e80dd236a1a5d24fe3d7e0cbf' 2025-09-07T08:05:08.5765881Z Submodule path 'third_party/benchmark': checked out '299e5928955cc62af9968370293b916f5130916f' 2025-09-07T08:05:08.9815676Z Submodule path 'third_party/composable_kernel': checked out '7fe50dc3da2069d6645d9deb8c017a876472a977' 2025-09-07T08:05:09.1207558Z Submodule path 'third_party/cpp-httplib': checked out '89c932f313c6437c38f2982869beacc89c2f2246' 2025-09-07T08:05:09.3004236Z Submodule path 'third_party/cpuinfo': checked out '5e3d2445e6a84d9599bee2bf78edbb4d80865e1d' 2025-09-07T08:05:09.3903042Z Submodule path 'third_party/cudnn_frontend': checked out 'f937055efc6d414d11f4c6577e3977fe74f35fb6' 2025-09-07T08:05:10.7252544Z Submodule path 'third_party/cutlass': checked out 'e51efbfe18fe4f4cbb66ab814c55bf4aa0185491' 2025-09-07T08:05:10.8979671Z Submodule path 'third_party/fbgemm': checked out '4b39c551efe15e6bbade20565b0ceb2d8ce3352d' 2025-09-07T08:05:11.0287525Z Submodule path 'third_party/fbgemm/external/asmjit': checked out 'a3199e8857792cd10b7589ff5d58343d2c9008ea' 2025-09-07T08:05:11.3702176Z Submodule path 'third_party/fbgemm/external/composable_kernel': checked out 'b1281b8b08d973a7064f864f47eeb30f3e2596e9' 2025-09-07T08:05:11.5390943Z Submodule path 'third_party/fbgemm/external/cpuinfo': checked out '6543fec09b2f04ac4a666882998b534afc9c1349' 2025-09-07T08:05:12.3464781Z Submodule path 'third_party/fbgemm/external/cutlass': checked out '311f3c8e51dc0eb56310cfc6980bf63d0fbd7917' 2025-09-07T08:05:12.4318439Z Submodule path 'third_party/fbgemm/external/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-09-07T08:05:12.5306941Z Submodule path 'third_party/fbgemm/external/hipify_torch': checked out 
'63b6a7b541fa7f08f8475ca7d74054db36ff2691' 2025-09-07T08:05:12.6779691Z Submodule path 'third_party/fbgemm/external/json': checked out '9cca280a4d0ccf0c08f47a99aa71d1b0e52f8d03' 2025-09-07T08:05:12.8252758Z Submodule path 'third_party/flash-attention': checked out '979702c87a8713a8e0a5e9fee122b90d2ef13be5' 2025-09-07T08:05:13.1660745Z Submodule path 'third_party/flash-attention/csrc/composable_kernel': checked out '888317e698e9803c62bd38568abc9e05d7709f33' 2025-09-07T08:05:14.3380874Z Submodule path 'third_party/flash-attention/csrc/cutlass': checked out 'c506e16788cb08416a4a57e11a9067beeee29420' 2025-09-07T08:05:14.6058865Z Submodule path 'third_party/flatbuffers': checked out 'a2cd1ea3b6d3fee220106b5fed3f7ce8da9eb757' 2025-09-07T08:05:14.6919073Z Submodule path 'third_party/fmt': checked out '40626af88bd7df9a5fb80be7b25ac85b122d6c21' 2025-09-07T08:05:14.7557419Z Submodule path 'third_party/gemmlowp/gemmlowp': checked out '3fb5c176c17c765a3492cd2f0321b0dab712f350' 2025-09-07T08:05:14.8399049Z Submodule path 'third_party/gloo': checked out 'c7b7b022c124d9643957d9bd55f57ac59fce8fa2' 2025-09-07T08:05:14.9205261Z Submodule path 'third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-09-07T08:05:15.0071628Z Submodule path 'third_party/ideep': checked out '719d8e6cd7f7a0e01b155657526d693acf97c2b3' 2025-09-07T08:05:15.2831585Z Submodule path 'third_party/ideep/mkl-dnn': checked out '8d263e693366ef8db40acc569cc7d8edf644556d' 2025-09-07T08:05:15.3485762Z Submodule path 'third_party/ittapi': checked out 'dec1d23ca65ab069d225dfe40dea14f455170959' 2025-09-07T08:05:15.4957418Z Submodule path 'third_party/kineto': checked out '5e7501833f1021ce6f618572d3baf657b6319658' 2025-09-07T08:05:15.7085686Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog': checked out '7d04a0053a845370ae06ce317a22a48e9edcc74e' 2025-09-07T08:05:15.8371894Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM': checked out 
'ffde4e54bc7249a6039a5e6b45b395141e1217f9' 2025-09-07T08:05:15.9315848Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr': checked out '871ed52d350214a034f6ef8a3b8f51c5ce1bd400' 2025-09-07T08:05:16.0234423Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt': checked out 'cd4af11efc9c622896a3e4cb599fa28668ca3d05' 2025-09-07T08:05:16.1008884Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags': checked out 'e171aa2d15ed9eb17054558e0b3a6a413bb01067' 2025-09-07T08:05:16.3190776Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc': checked out '8411df715cf522606e3b1aca386ddfc0b63d34b4' 2025-09-07T08:05:16.5903417Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog': checked out 'b33e3bad4c46c8a6345525fd822af355e5ef9446' 2025-09-07T08:05:16.6796028Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest': checked out '58d77fa8070e8cec2dc1ed015d66b454c8d78850' 2025-09-07T08:05:16.8283238Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/json': checked out '4f8fba14066156b73f1189a2b8bd568bde5284c5' 2025-09-07T08:05:16.9072868Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs': checked out 'f68a2fa8ea36c783bdd760371411fcb495aa3150' 2025-09-07T08:05:16.9925864Z Submodule path 'third_party/kineto/libkineto/third_party/fmt': checked out '0041a40c1350ba702d475b9c4ad62da77caea164' 2025-09-07T08:05:17.1031941Z Submodule path 'third_party/kineto/libkineto/third_party/googletest': checked out '7aca84427f224eeed3144123d5230d5871e93347' 2025-09-07T08:05:17.1832025Z Submodule path 'third_party/kleidiai': checked out 'cca02c2f69dd18e1f12647c1c0bdc8cf90e680c7' 2025-09-07T08:05:17.2739162Z Submodule path 'third_party/mimalloc': checked out 'fbd8b99c2b828428947d70fdc046bb55609be93e' 2025-09-07T08:05:17.4410707Z Submodule path 
'third_party/nlohmann': checked out '55f93686c01528224f448c19128836e7df245f72' 2025-09-07T08:05:17.6734969Z Submodule path 'third_party/onnx': checked out 'e709452ef2bbc1d113faf678c24e6d3467696e83' 2025-09-07T08:05:17.7659332Z Submodule path 'third_party/onnx/third_party/pybind11': checked out 'a2e59f0e7065404b44dfe92a28aca47ba1378dc4' 2025-09-07T08:05:17.8932214Z Submodule path 'third_party/opentelemetry-cpp': checked out 'a799f4aed9c94b765dcdaabaeab7d5e7e2310878' 2025-09-07T08:05:17.9827889Z Submodule path 'third_party/opentelemetry-cpp/third_party/benchmark': checked out 'd572f4777349d43653b21d6c2fc63020ab326db2' 2025-09-07T08:05:18.0372274Z Submodule path 'third_party/opentelemetry-cpp/third_party/googletest': checked out 'b796f7d44681514f58a683a3a71ff17c94edb0c1' 2025-09-07T08:05:18.0991723Z Submodule path 'third_party/opentelemetry-cpp/third_party/ms-gsl': checked out '6f4529395c5b7c2d661812257cd6780c67e54afa' 2025-09-07T08:05:18.2673225Z Submodule path 'third_party/opentelemetry-cpp/third_party/nlohmann-json': checked out 'bc889afb4c5bf1c0d8ee29ef35eaaf4c8bef8a5d' 2025-09-07T08:05:18.3580735Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto': checked out '4ca4f0335c63cda7ab31ea7ed70d6553aee14dce' 2025-09-07T08:05:18.4451027Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentracing-cpp': checked out '06b57f48ded1fa3bdd3d4346f6ef29e40e08eaf5' 2025-09-07T08:05:18.5089325Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp': checked out 'c9ffcdda9086ffd9e1283ea7a0276d831f3c8a8d' 2025-09-07T08:05:18.8294031Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb': checked out 'eefb26f82b233268fc98577d265352720d477ba4' 2025-09-07T08:05:18.8937405Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929' 2025-09-07T08:05:19.6901889Z Submodule path 
'third_party/opentelemetry-cpp/tools/vcpkg': checked out '8eb57355a4ffb410a2e94c07b4dca2dffbee8e50' 2025-09-07T08:05:19.7815310Z Submodule path 'third_party/pocketfft': checked out '0fa0ef591e38c2758e3184c6c23e497b9f732ffa' 2025-09-07T08:05:20.1046765Z Submodule path 'third_party/protobuf': checked out 'd1eca4e4b421cd2997495c4b4e65cea6be4e9b8a' 2025-09-07T08:05:20.1874228Z Submodule path 'third_party/protobuf/third_party/benchmark': checked out '5b7683f49e1e9223cf9927b24f6fd3d6bd82e3f8' 2025-09-07T08:05:20.3139954Z Submodule path 'third_party/protobuf/third_party/googletest': checked out '5ec7f0c4a113e2f18ac2c6cc7df51ad6afc24081' 2025-09-07T08:05:20.4003657Z Submodule path 'third_party/psimd': checked out '072586a71b55b7f8c584153d223e95687148a900' 2025-09-07T08:05:20.4780361Z Submodule path 'third_party/pthreadpool': checked out '4fe0e1e183925bf8cfa6aae24237e724a96479b8' 2025-09-07T08:05:20.5645688Z Submodule path 'third_party/pybind11': checked out 'f5fbe867d2d26e4a0a9177a51f6e568868ad3dc8' 2025-09-07T08:05:20.7011487Z Submodule path 'third_party/python-peachpy': checked out 'f45429b087dd7d5bc78bb40dc7cf06425c252d67' 2025-09-07T08:05:20.7623536Z Submodule path 'third_party/sleef': checked out '5a1d179df9cf652951b59010a2d2075372d67f68' 2025-09-07T08:05:20.8444019Z Submodule path 'third_party/tensorpipe': checked out 'af0118d13e52f5a08841464a768e01a0bf3e3075' 2025-09-07T08:05:20.9322213Z Submodule path 'third_party/tensorpipe/third_party/googletest': checked out 'aee0f9d9b5b87796ee8a0ab26b7587ec30e8858e' 2025-09-07T08:05:21.0246209Z Submodule path 'third_party/tensorpipe/third_party/libnop': checked out '910b55815be16109f04f4180e9adee14fb4ce281' 2025-09-07T08:05:21.1302559Z Submodule path 'third_party/tensorpipe/third_party/libuv': checked out '5152db2cbfeb5582e9c27c5ea1dba2cd9e10759b' 2025-09-07T08:05:21.2141673Z Submodule path 'third_party/tensorpipe/third_party/pybind11': checked out 'a23996fce38ff6ccfbcdc09f1e63f2c4be5ea2ef' 2025-09-07T08:05:21.3013191Z 
Submodule path 'third_party/tensorpipe/third_party/pybind11/tools/clang': checked out '6a00cbc4a9b8e68b71caf7f774b3f9c753ae84d5' 2025-09-07T08:05:21.3058954Z [command]/usr/bin/git submodule foreach --recursive git config --local gc.auto 0 2025-09-07T08:05:21.3333545Z Entering 'android/libs/fbjni' 2025-09-07T08:05:21.3419417Z Entering 'third_party/FP16' 2025-09-07T08:05:21.3865791Z Entering 'third_party/FXdiv' 2025-09-07T08:05:21.4231515Z Entering 'third_party/NNPACK' 2025-09-07T08:05:21.4709012Z Entering 'third_party/NVTX' 2025-09-07T08:05:21.5128338Z Entering 'third_party/VulkanMemoryAllocator' 2025-09-07T08:05:21.5605795Z Entering 'third_party/XNNPACK' 2025-09-07T08:05:21.5830192Z Entering 'third_party/aiter' 2025-09-07T08:05:21.6308210Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T08:05:21.6641384Z Entering 'third_party/benchmark' 2025-09-07T08:05:21.7477437Z Entering 'third_party/composable_kernel' 2025-09-07T08:05:21.7990165Z Entering 'third_party/cpp-httplib' 2025-09-07T08:05:21.8350756Z Entering 'third_party/cpuinfo' 2025-09-07T08:05:21.8807948Z Entering 'third_party/cudnn_frontend' 2025-09-07T08:05:21.9215820Z Entering 'third_party/cutlass' 2025-09-07T08:05:21.9706732Z Entering 'third_party/fbgemm' 2025-09-07T08:05:22.0114028Z Entering 'third_party/fbgemm/external/asmjit' 2025-09-07T08:05:22.0584399Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-09-07T08:05:22.0983413Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T08:05:22.1403622Z Entering 'third_party/fbgemm/external/cutlass' 2025-09-07T08:05:22.1865275Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T08:05:22.2313271Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T08:05:22.2729711Z Entering 'third_party/fbgemm/external/json' 2025-09-07T08:05:22.3213003Z Entering 'third_party/flash-attention' 2025-09-07T08:05:22.3679418Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T08:05:22.4065059Z Entering 
'third_party/flash-attention/csrc/cutlass' 2025-09-07T08:05:22.4511442Z Entering 'third_party/flatbuffers' 2025-09-07T08:05:22.4930469Z Entering 'third_party/fmt' 2025-09-07T08:05:22.5288786Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T08:05:22.5763340Z Entering 'third_party/gloo' 2025-09-07T08:05:22.6211032Z Entering 'third_party/googletest' 2025-09-07T08:05:22.6532462Z Entering 'third_party/ideep' 2025-09-07T08:05:22.7009384Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T08:05:22.7409539Z Entering 'third_party/ittapi' 2025-09-07T08:05:22.8305775Z Entering 'third_party/kineto' 2025-09-07T08:05:22.8697705Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T08:05:22.8868817Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T08:05:22.9350145Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T08:05:22.9727706Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T08:05:23.0203987Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T08:05:23.0601587Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T08:05:23.0831985Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T08:05:23.1305120Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T08:05:23.1664968Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T08:05:23.2135386Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T08:05:23.2553629Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T08:05:23.3037030Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T08:05:23.3445975Z Entering 'third_party/kleidiai' 2025-09-07T08:05:23.3919593Z Entering 'third_party/mimalloc' 2025-09-07T08:05:23.4340846Z Entering 
'third_party/nlohmann' 2025-09-07T08:05:23.4792078Z Entering 'third_party/onnx' 2025-09-07T08:05:23.4957840Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T08:05:23.5412804Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T08:05:23.5806245Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T08:05:23.6272552Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T08:05:23.6652317Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T08:05:23.7140374Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T08:05:23.7499114Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T08:05:23.7945602Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T08:05:23.8336508Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T08:05:23.8561212Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T08:05:23.9019437Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T08:05:23.9337945Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T08:05:23.9816902Z Entering 'third_party/pocketfft' 2025-09-07T08:05:23.9941686Z Entering 'third_party/protobuf' 2025-09-07T08:05:24.0351565Z Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T08:05:24.0999498Z Entering 'third_party/protobuf/third_party/googletest' 2025-09-07T08:05:24.1464322Z Entering 'third_party/psimd' 2025-09-07T08:05:24.1856751Z Entering 'third_party/pthreadpool' 2025-09-07T08:05:24.2752578Z Entering 'third_party/pybind11' 2025-09-07T08:05:24.3220823Z Entering 'third_party/python-peachpy' 2025-09-07T08:05:24.3604543Z Entering 'third_party/sleef' 2025-09-07T08:05:24.3814706Z Entering 'third_party/tensorpipe' 2025-09-07T08:05:24.4204523Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-09-07T08:05:24.4645610Z Entering 
'third_party/tensorpipe/third_party/libnop' 2025-09-07T08:05:24.5100782Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-09-07T08:05:24.5509339Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T08:05:24.6452341Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T08:05:24.6893678Z ##[endgroup] 2025-09-07T08:05:24.6894348Z ##[group]Persisting credentials for submodules 2025-09-07T08:05:24.6900769Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'url\.https\:\/\/github\.com\/\.insteadOf' && git config --local --unset-all 'url.https://github.com/.insteadOf' || :" 2025-09-07T08:05:24.7172164Z Entering 'android/libs/fbjni' 2025-09-07T08:05:24.7202067Z url.https://github.com/.insteadof 2025-09-07T08:05:24.7202329Z url.https://github.com/.insteadof 2025-09-07T08:05:24.7485594Z Entering 'third_party/FP16' 2025-09-07T08:05:24.7514211Z url.https://github.com/.insteadof 2025-09-07T08:05:24.7514532Z url.https://github.com/.insteadof 2025-09-07T08:05:24.8351578Z Entering 'third_party/FXdiv' 2025-09-07T08:05:24.8380975Z url.https://github.com/.insteadof 2025-09-07T08:05:24.8381275Z url.https://github.com/.insteadof 2025-09-07T08:05:24.8834518Z Entering 'third_party/NNPACK' 2025-09-07T08:05:24.8863262Z url.https://github.com/.insteadof 2025-09-07T08:05:24.8863573Z url.https://github.com/.insteadof 2025-09-07T08:05:24.9018434Z Entering 'third_party/NVTX' 2025-09-07T08:05:24.9048963Z url.https://github.com/.insteadof 2025-09-07T08:05:24.9049252Z url.https://github.com/.insteadof 2025-09-07T08:05:24.9506719Z Entering 'third_party/VulkanMemoryAllocator' 2025-09-07T08:05:24.9536514Z url.https://github.com/.insteadof 2025-09-07T08:05:24.9536783Z url.https://github.com/.insteadof 2025-09-07T08:05:24.9870428Z Entering 'third_party/XNNPACK' 2025-09-07T08:05:24.9899519Z url.https://github.com/.insteadof 2025-09-07T08:05:24.9899817Z url.https://github.com/.insteadof 
2025-09-07T08:05:25.0728890Z Entering 'third_party/aiter' 2025-09-07T08:05:25.0758332Z url.https://github.com/.insteadof 2025-09-07T08:05:25.0758659Z url.https://github.com/.insteadof 2025-09-07T08:05:25.1192415Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T08:05:25.1220430Z url.https://github.com/.insteadof 2025-09-07T08:05:25.1220734Z url.https://github.com/.insteadof 2025-09-07T08:05:25.1393853Z Entering 'third_party/benchmark' 2025-09-07T08:05:25.1422818Z url.https://github.com/.insteadof 2025-09-07T08:05:25.1423077Z url.https://github.com/.insteadof 2025-09-07T08:05:25.2078650Z Entering 'third_party/composable_kernel' 2025-09-07T08:05:25.2107620Z url.https://github.com/.insteadof 2025-09-07T08:05:25.2107971Z url.https://github.com/.insteadof 2025-09-07T08:05:25.2420131Z Entering 'third_party/cpp-httplib' 2025-09-07T08:05:25.2448905Z url.https://github.com/.insteadof 2025-09-07T08:05:25.2449162Z url.https://github.com/.insteadof 2025-09-07T08:05:25.2881701Z Entering 'third_party/cpuinfo' 2025-09-07T08:05:25.2910843Z url.https://github.com/.insteadof 2025-09-07T08:05:25.2911148Z url.https://github.com/.insteadof 2025-09-07T08:05:25.3398495Z Entering 'third_party/cudnn_frontend' 2025-09-07T08:05:25.3428981Z url.https://github.com/.insteadof 2025-09-07T08:05:25.3429262Z url.https://github.com/.insteadof 2025-09-07T08:05:25.4070345Z Entering 'third_party/cutlass' 2025-09-07T08:05:25.4099327Z url.https://github.com/.insteadof 2025-09-07T08:05:25.4099622Z url.https://github.com/.insteadof 2025-09-07T08:05:25.4971175Z Entering 'third_party/fbgemm' 2025-09-07T08:05:25.5000764Z url.https://github.com/.insteadof 2025-09-07T08:05:25.5001058Z url.https://github.com/.insteadof 2025-09-07T08:05:25.5380941Z Entering 'third_party/fbgemm/external/asmjit' 2025-09-07T08:05:25.5408533Z url.https://github.com/.insteadof 2025-09-07T08:05:25.5408829Z url.https://github.com/.insteadof 2025-09-07T08:05:25.5856159Z Entering 
'third_party/fbgemm/external/composable_kernel' 2025-09-07T08:05:25.5886197Z url.https://github.com/.insteadof 2025-09-07T08:05:25.5886500Z url.https://github.com/.insteadof 2025-09-07T08:05:25.6061112Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T08:05:25.6088934Z url.https://github.com/.insteadof 2025-09-07T08:05:25.6089210Z url.https://github.com/.insteadof 2025-09-07T08:05:25.6433566Z Entering 'third_party/fbgemm/external/cutlass' 2025-09-07T08:05:25.6461482Z url.https://github.com/.insteadof 2025-09-07T08:05:25.6461757Z url.https://github.com/.insteadof 2025-09-07T08:05:25.6914791Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T08:05:25.6942304Z url.https://github.com/.insteadof 2025-09-07T08:05:25.6942640Z url.https://github.com/.insteadof 2025-09-07T08:05:25.7365053Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T08:05:25.7392859Z url.https://github.com/.insteadof 2025-09-07T08:05:25.7393215Z url.https://github.com/.insteadof 2025-09-07T08:05:25.7565724Z Entering 'third_party/fbgemm/external/json' 2025-09-07T08:05:25.7592880Z url.https://github.com/.insteadof 2025-09-07T08:05:25.7593353Z url.https://github.com/.insteadof 2025-09-07T08:05:25.7955092Z Entering 'third_party/flash-attention' 2025-09-07T08:05:25.7984693Z url.https://github.com/.insteadof 2025-09-07T08:05:25.7984942Z url.https://github.com/.insteadof 2025-09-07T08:05:25.8433254Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T08:05:25.8460868Z url.https://github.com/.insteadof 2025-09-07T08:05:25.8461156Z url.https://github.com/.insteadof 2025-09-07T08:05:25.8792203Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T08:05:25.8820718Z url.https://github.com/.insteadof 2025-09-07T08:05:25.8821028Z url.https://github.com/.insteadof 2025-09-07T08:05:25.9255754Z Entering 'third_party/flatbuffers' 2025-09-07T08:05:25.9286568Z url.https://github.com/.insteadof 2025-09-07T08:05:25.9286886Z url.https://github.com/.insteadof 
2025-09-07T08:05:25.9640090Z Entering 'third_party/fmt' 2025-09-07T08:05:25.9668512Z url.https://github.com/.insteadof 2025-09-07T08:05:25.9668828Z url.https://github.com/.insteadof 2025-09-07T08:05:25.9820830Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T08:05:25.9849175Z url.https://github.com/.insteadof 2025-09-07T08:05:25.9849435Z url.https://github.com/.insteadof 2025-09-07T08:05:26.0265163Z Entering 'third_party/gloo' 2025-09-07T08:05:26.0301757Z url.https://github.com/.insteadof 2025-09-07T08:05:26.0302074Z url.https://github.com/.insteadof 2025-09-07T08:05:26.0644820Z Entering 'third_party/googletest' 2025-09-07T08:05:26.0674960Z url.https://github.com/.insteadof 2025-09-07T08:05:26.0675286Z url.https://github.com/.insteadof 2025-09-07T08:05:26.1104877Z Entering 'third_party/ideep' 2025-09-07T08:05:26.1134047Z url.https://github.com/.insteadof 2025-09-07T08:05:26.1135966Z url.https://github.com/.insteadof 2025-09-07T08:05:26.1449584Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T08:05:26.1479313Z url.https://github.com/.insteadof 2025-09-07T08:05:26.1479622Z url.https://github.com/.insteadof 2025-09-07T08:05:26.1948413Z Entering 'third_party/ittapi' 2025-09-07T08:05:26.1977586Z url.https://github.com/.insteadof 2025-09-07T08:05:26.1977848Z url.https://github.com/.insteadof 2025-09-07T08:05:26.2356802Z Entering 'third_party/kineto' 2025-09-07T08:05:26.2386651Z url.https://github.com/.insteadof 2025-09-07T08:05:26.2386948Z url.https://github.com/.insteadof 2025-09-07T08:05:26.2560020Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T08:05:26.2590945Z url.https://github.com/.insteadof 2025-09-07T08:05:26.2591253Z url.https://github.com/.insteadof 2025-09-07T08:05:26.2998505Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T08:05:26.3027456Z url.https://github.com/.insteadof 2025-09-07T08:05:26.3027743Z url.https://github.com/.insteadof 2025-09-07T08:05:26.3436502Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T08:05:26.3463222Z url.https://github.com/.insteadof 2025-09-07T08:05:26.3464226Z url.https://github.com/.insteadof 2025-09-07T08:05:26.3879000Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T08:05:26.3909791Z url.https://github.com/.insteadof 2025-09-07T08:05:26.3910088Z url.https://github.com/.insteadof 2025-09-07T08:05:26.4048101Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T08:05:26.4075740Z url.https://github.com/.insteadof 2025-09-07T08:05:26.4076022Z url.https://github.com/.insteadof 2025-09-07T08:05:26.4515721Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T08:05:26.4544550Z url.https://github.com/.insteadof 2025-09-07T08:05:26.4544865Z url.https://github.com/.insteadof 2025-09-07T08:05:26.4888156Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T08:05:26.4915139Z url.https://github.com/.insteadof 2025-09-07T08:05:26.4915437Z url.https://github.com/.insteadof 2025-09-07T08:05:26.5373986Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T08:05:26.5401832Z url.https://github.com/.insteadof 2025-09-07T08:05:26.5402113Z url.https://github.com/.insteadof 2025-09-07T08:05:26.5745886Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T08:05:26.5773903Z url.https://github.com/.insteadof 2025-09-07T08:05:26.5774191Z url.https://github.com/.insteadof 2025-09-07T08:05:26.5943661Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T08:05:26.5971554Z url.https://github.com/.insteadof 2025-09-07T08:05:26.5971823Z url.https://github.com/.insteadof 2025-09-07T08:05:26.6411341Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T08:05:26.6439327Z url.https://github.com/.insteadof 
2025-09-07T08:05:26.6439636Z url.https://github.com/.insteadof 2025-09-07T08:05:26.6848918Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T08:05:26.6877522Z url.https://github.com/.insteadof 2025-09-07T08:05:26.6877845Z url.https://github.com/.insteadof 2025-09-07T08:05:26.7266498Z Entering 'third_party/kleidiai' 2025-09-07T08:05:26.7295421Z url.https://github.com/.insteadof 2025-09-07T08:05:26.7295686Z url.https://github.com/.insteadof 2025-09-07T08:05:26.8121426Z Entering 'third_party/mimalloc' 2025-09-07T08:05:26.8151249Z url.https://github.com/.insteadof 2025-09-07T08:05:26.8151556Z url.https://github.com/.insteadof 2025-09-07T08:05:26.8324605Z Entering 'third_party/nlohmann' 2025-09-07T08:05:26.8354509Z url.https://github.com/.insteadof 2025-09-07T08:05:26.8354863Z url.https://github.com/.insteadof 2025-09-07T08:05:26.8772644Z Entering 'third_party/onnx' 2025-09-07T08:05:26.8802200Z url.https://github.com/.insteadof 2025-09-07T08:05:26.8802507Z url.https://github.com/.insteadof 2025-09-07T08:05:26.9207194Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T08:05:26.9235443Z url.https://github.com/.insteadof 2025-09-07T08:05:26.9235723Z url.https://github.com/.insteadof 2025-09-07T08:05:26.9686731Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T08:05:26.9715257Z url.https://github.com/.insteadof 2025-09-07T08:05:26.9715568Z url.https://github.com/.insteadof 2025-09-07T08:05:27.0099978Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T08:05:27.0128680Z url.https://github.com/.insteadof 2025-09-07T08:05:27.0128969Z url.https://github.com/.insteadof 2025-09-07T08:05:27.0548845Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T08:05:27.0576209Z url.https://github.com/.insteadof 2025-09-07T08:05:27.0576496Z url.https://github.com/.insteadof 2025-09-07T08:05:27.1028717Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T08:05:27.1056302Z 
url.https://github.com/.insteadof 2025-09-07T08:05:27.1056605Z url.https://github.com/.insteadof 2025-09-07T08:05:27.1448652Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T08:05:27.1477130Z url.https://github.com/.insteadof 2025-09-07T08:05:27.1477488Z url.https://github.com/.insteadof 2025-09-07T08:05:27.1912890Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T08:05:27.1939456Z url.https://github.com/.insteadof 2025-09-07T08:05:27.1939748Z url.https://github.com/.insteadof 2025-09-07T08:05:27.2123190Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T08:05:27.2150784Z url.https://github.com/.insteadof 2025-09-07T08:05:27.2151101Z url.https://github.com/.insteadof 2025-09-07T08:05:27.2535185Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T08:05:27.2562648Z url.https://github.com/.insteadof 2025-09-07T08:05:27.2562930Z url.https://github.com/.insteadof 2025-09-07T08:05:27.2945810Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T08:05:27.2975343Z url.https://github.com/.insteadof 2025-09-07T08:05:27.2975637Z url.https://github.com/.insteadof 2025-09-07T08:05:27.3410661Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T08:05:27.3438514Z url.https://github.com/.insteadof 2025-09-07T08:05:27.3438815Z url.https://github.com/.insteadof 2025-09-07T08:05:27.3861903Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T08:05:27.3889124Z url.https://github.com/.insteadof 2025-09-07T08:05:27.3889418Z url.https://github.com/.insteadof 2025-09-07T08:05:27.4326068Z Entering 'third_party/pocketfft' 2025-09-07T08:05:27.4355356Z url.https://github.com/.insteadof 2025-09-07T08:05:27.4355614Z url.https://github.com/.insteadof 2025-09-07T08:05:27.4502764Z Entering 'third_party/protobuf' 2025-09-07T08:05:27.4532034Z url.https://github.com/.insteadof 
2025-09-07T08:05:27.4532308Z url.https://github.com/.insteadof 2025-09-07T08:05:27.5294161Z Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T08:05:27.5321741Z url.https://github.com/.insteadof 2025-09-07T08:05:27.5322007Z url.https://github.com/.insteadof 2025-09-07T08:05:27.5757568Z Entering 'third_party/protobuf/third_party/googletest' 2025-09-07T08:05:27.5785513Z url.https://github.com/.insteadof 2025-09-07T08:05:27.5785795Z url.https://github.com/.insteadof 2025-09-07T08:05:27.6235154Z Entering 'third_party/psimd' 2025-09-07T08:05:27.6264103Z url.https://github.com/.insteadof 2025-09-07T08:05:27.6264419Z url.https://github.com/.insteadof 2025-09-07T08:05:27.6826376Z Entering 'third_party/pthreadpool' 2025-09-07T08:05:27.6855824Z url.https://github.com/.insteadof 2025-09-07T08:05:27.6856104Z url.https://github.com/.insteadof 2025-09-07T08:05:27.7713510Z Entering 'third_party/pybind11' 2025-09-07T08:05:27.7742268Z url.https://github.com/.insteadof 2025-09-07T08:05:27.7742570Z url.https://github.com/.insteadof 2025-09-07T08:05:27.8192716Z Entering 'third_party/python-peachpy' 2025-09-07T08:05:27.8222038Z url.https://github.com/.insteadof 2025-09-07T08:05:27.8222348Z url.https://github.com/.insteadof 2025-09-07T08:05:27.8627423Z Entering 'third_party/sleef' 2025-09-07T08:05:27.8656362Z url.https://github.com/.insteadof 2025-09-07T08:05:27.8656634Z url.https://github.com/.insteadof 2025-09-07T08:05:27.9044338Z Entering 'third_party/tensorpipe' 2025-09-07T08:05:27.9073343Z url.https://github.com/.insteadof 2025-09-07T08:05:27.9073629Z url.https://github.com/.insteadof 2025-09-07T08:05:27.9883375Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-09-07T08:05:27.9911351Z url.https://github.com/.insteadof 2025-09-07T08:05:27.9911641Z url.https://github.com/.insteadof 2025-09-07T08:05:28.0342881Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-09-07T08:05:28.0370233Z url.https://github.com/.insteadof 2025-09-07T08:05:28.0370525Z 
url.https://github.com/.insteadof 2025-09-07T08:05:28.0752733Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-09-07T08:05:28.0780020Z url.https://github.com/.insteadof 2025-09-07T08:05:28.0780307Z url.https://github.com/.insteadof 2025-09-07T08:05:28.1598120Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T08:05:28.1625448Z url.https://github.com/.insteadof 2025-09-07T08:05:28.1625986Z url.https://github.com/.insteadof 2025-09-07T08:05:28.1791877Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T08:05:28.1819249Z url.https://github.com/.insteadof 2025-09-07T08:05:28.1819536Z url.https://github.com/.insteadof 2025-09-07T08:05:28.2018111Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local 'http.https://github.com/.extraheader' 'AUTHORIZATION: basic ***' && git config --local --show-origin --name-only --get-regexp remote.origin.url" 2025-09-07T08:05:28.2288182Z Entering 'android/libs/fbjni' 2025-09-07T08:05:28.2452106Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config remote.origin.url 2025-09-07T08:05:28.2473538Z Entering 'third_party/FP16' 2025-09-07T08:05:28.2848208Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config remote.origin.url 2025-09-07T08:05:28.2869289Z Entering 'third_party/FXdiv' 2025-09-07T08:05:28.3306350Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config remote.origin.url 2025-09-07T08:05:28.3328583Z Entering 'third_party/NNPACK' 2025-09-07T08:05:28.3692423Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config remote.origin.url 2025-09-07T08:05:28.3713482Z Entering 'third_party/NVTX' 2025-09-07T08:05:28.4106374Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config remote.origin.url 2025-09-07T08:05:28.4132355Z Entering 'third_party/VulkanMemoryAllocator' 2025-09-07T08:05:28.5014786Z 
file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config remote.origin.url 2025-09-07T08:05:28.5041147Z Entering 'third_party/XNNPACK' 2025-09-07T08:05:28.5466118Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config remote.origin.url 2025-09-07T08:05:28.5501885Z Entering 'third_party/aiter' 2025-09-07T08:05:28.5878880Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/aiter/config remote.origin.url 2025-09-07T08:05:28.5900176Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T08:05:28.6057786Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config remote.origin.url 2025-09-07T08:05:28.6083875Z Entering 'third_party/benchmark' 2025-09-07T08:05:28.6573505Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config remote.origin.url 2025-09-07T08:05:28.6596234Z Entering 'third_party/composable_kernel' 2025-09-07T08:05:28.6955541Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config remote.origin.url 2025-09-07T08:05:28.6985502Z Entering 'third_party/cpp-httplib' 2025-09-07T08:05:28.7429629Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config remote.origin.url 2025-09-07T08:05:28.7451068Z Entering 'third_party/cpuinfo' 2025-09-07T08:05:28.7904847Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config remote.origin.url 2025-09-07T08:05:28.7931957Z Entering 'third_party/cudnn_frontend' 2025-09-07T08:05:28.8286147Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config remote.origin.url 2025-09-07T08:05:28.8307758Z Entering 'third_party/cutlass' 2025-09-07T08:05:28.8462061Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config remote.origin.url 2025-09-07T08:05:28.8492451Z Entering 'third_party/fbgemm' 2025-09-07T08:05:28.8914932Z 
file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config remote.origin.url 2025-09-07T08:05:28.8943401Z Entering 'third_party/fbgemm/external/asmjit' 2025-09-07T08:05:28.9337593Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config remote.origin.url 2025-09-07T08:05:28.9357888Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-09-07T08:05:28.9814820Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config remote.origin.url 2025-09-07T08:05:28.9841538Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T08:05:29.0014232Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config remote.origin.url 2025-09-07T08:05:29.0034937Z Entering 'third_party/fbgemm/external/cutlass' 2025-09-07T08:05:29.0407961Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config remote.origin.url 2025-09-07T08:05:29.0436331Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T08:05:29.0885004Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config remote.origin.url 2025-09-07T08:05:29.0905450Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T08:05:29.1293078Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config remote.origin.url 2025-09-07T08:05:29.1313904Z Entering 'third_party/fbgemm/external/json' 2025-09-07T08:05:29.1525462Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config remote.origin.url 2025-09-07T08:05:29.1550587Z Entering 'third_party/flash-attention' 2025-09-07T08:05:29.1758107Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config remote.origin.url 2025-09-07T08:05:29.1778925Z Entering 
'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T08:05:29.2138515Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config remote.origin.url 2025-09-07T08:05:29.2165417Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T08:05:29.2371456Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config remote.origin.url 2025-09-07T08:05:29.2402137Z Entering 'third_party/flatbuffers' 2025-09-07T08:05:29.2818687Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config remote.origin.url 2025-09-07T08:05:29.2842723Z Entering 'third_party/fmt' 2025-09-07T08:05:29.3187083Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/fmt/config remote.origin.url 2025-09-07T08:05:29.3208792Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T08:05:29.3408608Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config remote.origin.url 2025-09-07T08:05:29.3430983Z Entering 'third_party/gloo' 2025-09-07T08:05:29.3788312Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/gloo/config remote.origin.url 2025-09-07T08:05:29.3810238Z Entering 'third_party/googletest' 2025-09-07T08:05:29.4264827Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/googletest/config remote.origin.url 2025-09-07T08:05:29.4286928Z Entering 'third_party/ideep' 2025-09-07T08:05:29.4702505Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/ideep/config remote.origin.url 2025-09-07T08:05:29.4724717Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T08:05:29.4943112Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config remote.origin.url 2025-09-07T08:05:29.4972492Z Entering 'third_party/ittapi' 2025-09-07T08:05:29.5125647Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config remote.origin.url 
2025-09-07T08:05:29.5147164Z Entering 'third_party/kineto' 2025-09-07T08:05:29.5504272Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/kineto/config remote.origin.url 2025-09-07T08:05:29.5525995Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T08:05:29.5941230Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config remote.origin.url 2025-09-07T08:05:29.5961590Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T08:05:29.6336626Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config remote.origin.url 2025-09-07T08:05:29.6360080Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T08:05:29.6822636Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config remote.origin.url 2025-09-07T08:05:29.6845606Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T08:05:29.7204708Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config remote.origin.url 2025-09-07T08:05:29.7225552Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T08:05:29.7686206Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config remote.origin.url 2025-09-07T08:05:29.7706519Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T08:05:29.8104218Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config remote.origin.url 2025-09-07T08:05:29.8127807Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T08:05:29.8565749Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config remote.origin.url 2025-09-07T08:05:29.8587258Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T08:05:29.8989905Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config remote.origin.url 2025-09-07T08:05:29.9011550Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T08:05:29.9470001Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config remote.origin.url 2025-09-07T08:05:29.9494855Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T08:05:29.9652579Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config remote.origin.url 2025-09-07T08:05:29.9676698Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T08:05:30.0071585Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config remote.origin.url 2025-09-07T08:05:30.0091772Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T08:05:30.0551775Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config remote.origin.url 2025-09-07T08:05:30.0575536Z Entering 'third_party/kleidiai' 2025-09-07T08:05:30.1012421Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config remote.origin.url 2025-09-07T08:05:30.1035126Z Entering 'third_party/mimalloc' 2025-09-07T08:05:30.1468878Z 
file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config remote.origin.url 2025-09-07T08:05:30.1491054Z Entering 'third_party/nlohmann' 2025-09-07T08:05:30.1947372Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config remote.origin.url 2025-09-07T08:05:30.1970315Z Entering 'third_party/onnx' 2025-09-07T08:05:30.2328109Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/onnx/config remote.origin.url 2025-09-07T08:05:30.2365274Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T08:05:30.2790777Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url 2025-09-07T08:05:30.2818431Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T08:05:30.3239390Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config remote.origin.url 2025-09-07T08:05:30.3263670Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T08:05:30.3706410Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config remote.origin.url 2025-09-07T08:05:30.3727535Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T08:05:30.4170133Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config remote.origin.url 2025-09-07T08:05:30.4191588Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T08:05:30.4636304Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config remote.origin.url 2025-09-07T08:05:30.4656564Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T08:05:30.4953520Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config remote.origin.url 2025-09-07T08:05:30.4975407Z Entering 
'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T08:05:30.5419796Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config remote.origin.url 2025-09-07T08:05:30.5444017Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T08:05:30.5779988Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config remote.origin.url 2025-09-07T08:05:30.5800502Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T08:05:30.6241298Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config remote.origin.url 2025-09-07T08:05:30.6261678Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T08:05:30.6650957Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-09-07T08:05:30.6674273Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T08:05:30.7094092Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-09-07T08:05:30.7118469Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T08:05:30.7547469Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config remote.origin.url 2025-09-07T08:05:30.7588581Z Entering 'third_party/pocketfft' 2025-09-07T08:05:30.7916136Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config remote.origin.url 2025-09-07T08:05:30.7941763Z Entering 'third_party/protobuf' 2025-09-07T08:05:30.8129031Z 
file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config remote.origin.url 2025-09-07T08:05:30.8153071Z Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T08:05:30.8319515Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config remote.origin.url 2025-09-07T08:05:30.8340875Z Entering 'third_party/protobuf/third_party/googletest' 2025-09-07T08:05:30.8611289Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config remote.origin.url 2025-09-07T08:05:30.8636843Z Entering 'third_party/psimd' 2025-09-07T08:05:30.9080881Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config remote.origin.url 2025-09-07T08:05:30.9103206Z Entering 'third_party/pthreadpool' 2025-09-07T08:05:30.9275109Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config remote.origin.url 2025-09-07T08:05:30.9296753Z Entering 'third_party/pybind11' 2025-09-07T08:05:30.9519180Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config remote.origin.url 2025-09-07T08:05:30.9541315Z Entering 'third_party/python-peachpy' 2025-09-07T08:05:31.0008964Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config remote.origin.url 2025-09-07T08:05:31.0030802Z Entering 'third_party/sleef' 2025-09-07T08:05:31.0424965Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/sleef/config remote.origin.url 2025-09-07T08:05:31.0446995Z Entering 'third_party/tensorpipe' 2025-09-07T08:05:31.0892653Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config remote.origin.url 2025-09-07T08:05:31.0914100Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-09-07T08:05:31.1365142Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config 
remote.origin.url 2025-09-07T08:05:31.1386593Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-09-07T08:05:31.1808832Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config remote.origin.url 2025-09-07T08:05:31.1829089Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-09-07T08:05:31.2184791Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config remote.origin.url 2025-09-07T08:05:31.2206821Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T08:05:31.2662706Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config remote.origin.url 2025-09-07T08:05:31.2681822Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T08:05:31.3142406Z file:/home/charlie/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url 2025-09-07T08:05:34.4837995Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'git@github.com:' 2025-09-07T08:05:34.5124666Z Entering 'android/libs/fbjni' 2025-09-07T08:05:34.5303358Z Entering 'third_party/FP16' 2025-09-07T08:05:34.5728736Z Entering 'third_party/FXdiv' 2025-09-07T08:05:34.6204395Z Entering 'third_party/NNPACK' 2025-09-07T08:05:34.6645212Z Entering 'third_party/NVTX' 2025-09-07T08:05:34.7133069Z Entering 'third_party/VulkanMemoryAllocator' 2025-09-07T08:05:34.7563424Z Entering 'third_party/XNNPACK' 2025-09-07T08:05:34.7753355Z Entering 'third_party/aiter' 2025-09-07T08:05:34.8206469Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T08:05:34.8622801Z Entering 'third_party/benchmark' 2025-09-07T08:05:34.9096025Z Entering 'third_party/composable_kernel' 2025-09-07T08:05:34.9528679Z Entering 'third_party/cpp-httplib' 2025-09-07T08:05:34.9984586Z Entering 'third_party/cpuinfo' 
2025-09-07T08:05:35.0465168Z Entering 'third_party/cudnn_frontend' 2025-09-07T08:05:35.0906884Z Entering 'third_party/cutlass' 2025-09-07T08:05:35.1372281Z Entering 'third_party/fbgemm' 2025-09-07T08:05:35.1617065Z Entering 'third_party/fbgemm/external/asmjit' 2025-09-07T08:05:35.2101091Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-09-07T08:05:35.6392275Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T08:05:35.6786918Z Entering 'third_party/fbgemm/external/cutlass' 2025-09-07T08:05:35.7267620Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T08:05:35.7704958Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T08:05:35.8376813Z Entering 'third_party/fbgemm/external/json' 2025-09-07T08:05:35.8817417Z Entering 'third_party/flash-attention' 2025-09-07T08:05:35.9307306Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T08:05:35.9748807Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T08:05:36.0193163Z Entering 'third_party/flatbuffers' 2025-09-07T08:05:36.0664912Z Entering 'third_party/fmt' 2025-09-07T08:05:36.1061682Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T08:05:36.1532496Z Entering 'third_party/gloo' 2025-09-07T08:05:36.1945337Z Entering 'third_party/googletest' 2025-09-07T08:05:36.2396921Z Entering 'third_party/ideep' 2025-09-07T08:05:36.2822219Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T08:05:36.3316151Z Entering 'third_party/ittapi' 2025-09-07T08:05:36.3721180Z Entering 'third_party/kineto' 2025-09-07T08:05:36.4194837Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T08:05:36.4611456Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T08:05:36.5090048Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T08:05:36.5265681Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T08:05:36.5660592Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T08:05:36.6136959Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T08:05:36.6537478Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T08:05:36.7007128Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T08:05:36.7200681Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T08:05:36.7626007Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T08:05:36.8108756Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T08:05:36.8512526Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T08:05:36.8990495Z Entering 'third_party/kleidiai' 2025-09-07T08:05:36.9423248Z Entering 'third_party/mimalloc' 2025-09-07T08:05:36.9865240Z Entering 'third_party/nlohmann' 2025-09-07T08:05:37.0320284Z Entering 'third_party/onnx' 2025-09-07T08:05:37.0692381Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T08:05:37.1144450Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T08:05:37.1547742Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T08:05:37.2024690Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T08:05:37.2258906Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T08:05:37.2710683Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T08:05:37.3105187Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T08:05:37.3585230Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T08:05:37.4020432Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T08:05:37.4418807Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T08:05:37.4892907Z Entering 
'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T08:05:37.5067532Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T08:05:37.5494849Z Entering 'third_party/pocketfft' 2025-09-07T08:05:37.5954967Z Entering 'third_party/protobuf' 2025-09-07T08:05:37.6361600Z Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T08:05:37.6833159Z Entering 'third_party/protobuf/third_party/googletest' 2025-09-07T08:05:37.7284819Z Entering 'third_party/psimd' 2025-09-07T08:05:37.7758114Z Entering 'third_party/pthreadpool' 2025-09-07T08:05:37.8151698Z Entering 'third_party/pybind11' 2025-09-07T08:05:37.8634578Z Entering 'third_party/python-peachpy' 2025-09-07T08:05:37.9065790Z Entering 'third_party/sleef' 2025-09-07T08:05:37.9546290Z Entering 'third_party/tensorpipe' 2025-09-07T08:05:37.9955855Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-09-07T08:05:38.0445995Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-09-07T08:05:38.0864381Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-09-07T08:05:38.1331064Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T08:05:38.1774230Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T08:05:38.2170417Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'org-21003710@github.com:' 2025-09-07T08:05:38.2440618Z Entering 'android/libs/fbjni' 2025-09-07T08:05:38.2620898Z Entering 'third_party/FP16' 2025-09-07T08:05:38.3083288Z Entering 'third_party/FXdiv' 2025-09-07T08:05:38.3500923Z Entering 'third_party/NNPACK' 2025-09-07T08:05:38.3697213Z Entering 'third_party/NVTX' 2025-09-07T08:05:38.4175035Z Entering 'third_party/VulkanMemoryAllocator' 2025-09-07T08:05:38.4564333Z Entering 'third_party/XNNPACK' 2025-09-07T08:05:38.5059925Z Entering 'third_party/aiter' 2025-09-07T08:05:38.5452275Z Entering 'third_party/aiter/3rdparty/composable_kernel' 
2025-09-07T08:05:38.5941380Z Entering 'third_party/benchmark' 2025-09-07T08:05:38.6346428Z Entering 'third_party/composable_kernel' 2025-09-07T08:05:38.6828432Z Entering 'third_party/cpp-httplib' 2025-09-07T08:05:38.7226344Z Entering 'third_party/cpuinfo' 2025-09-07T08:05:38.7703074Z Entering 'third_party/cudnn_frontend' 2025-09-07T08:05:38.8151098Z Entering 'third_party/cutlass' 2025-09-07T08:05:38.8334984Z Entering 'third_party/fbgemm' 2025-09-07T08:05:38.8813222Z Entering 'third_party/fbgemm/external/asmjit' 2025-09-07T08:05:38.9019036Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-09-07T08:05:38.9385532Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T08:05:38.9856462Z Entering 'third_party/fbgemm/external/cutlass' 2025-09-07T08:05:39.0066785Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T08:05:39.0466252Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T08:05:39.0947737Z Entering 'third_party/fbgemm/external/json' 2025-09-07T08:05:39.1416981Z Entering 'third_party/flash-attention' 2025-09-07T08:05:39.1862622Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T08:05:39.2305619Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T08:05:39.2489668Z Entering 'third_party/flatbuffers' 2025-09-07T08:05:39.2690063Z Entering 'third_party/fmt' 2025-09-07T08:05:39.3062477Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T08:05:39.3540511Z Entering 'third_party/gloo' 2025-09-07T08:05:39.3965434Z Entering 'third_party/googletest' 2025-09-07T08:05:39.4423022Z Entering 'third_party/ideep' 2025-09-07T08:05:39.4908067Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T08:05:39.5352259Z Entering 'third_party/ittapi' 2025-09-07T08:05:39.5817836Z Entering 'third_party/kineto' 2025-09-07T08:05:39.6182463Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T08:05:39.6660938Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 
2025-09-07T08:05:39.7057227Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T08:05:39.7533303Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T08:05:39.7923485Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T08:05:39.8398132Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T08:05:39.8852504Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T08:05:39.9007595Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T08:05:39.9497367Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T08:05:39.9886972Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T08:05:40.0368619Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T08:05:40.0764028Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T08:05:40.1249365Z Entering 'third_party/kleidiai' 2025-09-07T08:05:40.1476059Z Entering 'third_party/mimalloc' 2025-09-07T08:05:40.1865382Z Entering 'third_party/nlohmann' 2025-09-07T08:05:40.2356820Z Entering 'third_party/onnx' 2025-09-07T08:05:40.2765006Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T08:05:40.3227597Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T08:05:40.3687196Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T08:05:40.4100103Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T08:05:40.4582941Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T08:05:40.4766041Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T08:05:40.5201693Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T08:05:40.5367666Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 
2025-09-07T08:05:40.5608586Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T08:05:40.6064432Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T08:05:40.6463617Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T08:05:40.6948338Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T08:05:40.7360594Z Entering 'third_party/pocketfft' 2025-09-07T08:05:40.7818708Z Entering 'third_party/protobuf' 2025-09-07T08:05:40.8282684Z Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T08:05:40.8717606Z Entering 'third_party/protobuf/third_party/googletest' 2025-09-07T08:05:40.9190278Z Entering 'third_party/psimd' 2025-09-07T08:05:40.9644193Z Entering 'third_party/pthreadpool' 2025-09-07T08:05:41.0009192Z Entering 'third_party/pybind11' 2025-09-07T08:05:41.0483091Z Entering 'third_party/python-peachpy' 2025-09-07T08:05:41.0694895Z Entering 'third_party/sleef' 2025-09-07T08:05:41.1074396Z Entering 'third_party/tensorpipe' 2025-09-07T08:05:41.1546569Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-09-07T08:05:41.1984347Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-09-07T08:05:41.2454061Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-09-07T08:05:41.2919524Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T08:05:41.3340216Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T08:05:41.3836263Z ##[endgroup] 2025-09-07T08:05:41.3879408Z [command]/usr/bin/git log -1 --format=%H 2025-09-07T08:05:41.3908168Z 93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T08:05:41.4077960Z Prepare all required actions 2025-09-07T08:05:41.4078450Z Getting action download info 2025-09-07T08:05:41.6834277Z ##[group]Run ./.github/actions/setup-linux 2025-09-07T08:05:41.6834601Z env: 2025-09-07T08:05:41.6834761Z GIT_DEFAULT_BRANCH: main 2025-09-07T08:05:41.6834946Z 
##[endgroup] 2025-09-07T08:05:41.7262970Z ##[group]Run set -euo pipefail 2025-09-07T08:05:41.7263265Z set -euo pipefail 2025-09-07T08:05:41.7263489Z function get_ec2_metadata() { 2025-09-07T08:05:41.7263968Z  # Pulled from instance metadata endpoint for EC2 2025-09-07T08:05:41.7264441Z  # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html 2025-09-07T08:05:41.7264847Z  category=$1 2025-09-07T08:05:41.7265110Z  # If it is GCP runner (runner name contains gcp), do not run this 2025-09-07T08:05:41.7265425Z  runner_name_str=i-05a095f6e498981b2-1003 2025-09-07T08:05:41.7265726Z  if [[ -f /.inarc ]]; then 2025-09-07T08:05:41.7265991Z  echo "ARC Runner, no info on ec2 metadata" 2025-09-07T08:05:41.7266262Z  elif [[ $runner_name_str == *"gcp"* ]]; then 2025-09-07T08:05:41.7266618Z  echo "Runner is from Google Cloud Platform, No info on ec2 metadata" 2025-09-07T08:05:41.7266921Z  else 2025-09-07T08:05:41.7267541Z  curl -H "X-aws-ec2-metadata-token: $(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 30")" -fsSL "http://169.254.169.254/latest/meta-data/${category}" 2025-09-07T08:05:41.7268198Z  fi 2025-09-07T08:05:41.7268351Z } 2025-09-07T08:05:41.7268550Z echo "ami-id: $(get_ec2_metadata ami-id)" 2025-09-07T08:05:41.7268865Z echo "instance-id: $(get_ec2_metadata instance-id)" 2025-09-07T08:05:41.7269209Z echo "instance-type: $(get_ec2_metadata instance-type)" 2025-09-07T08:05:41.7269501Z echo "system info $(uname -a)" 2025-09-07T08:05:41.7285902Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T08:05:41.7286197Z env: 2025-09-07T08:05:41.7286358Z GIT_DEFAULT_BRANCH: main 2025-09-07T08:05:41.7286562Z ##[endgroup] 2025-09-07T08:05:41.7748585Z ami-id: ARC Runner, no info on ec2 metadata 2025-09-07T08:05:41.7755703Z instance-id: ARC Runner, no info on ec2 metadata 2025-09-07T08:05:41.7761765Z instance-type: ARC Runner, no info on ec2 metadata 2025-09-07T08:05:41.7773396Z 
system info Linux 784802b6db88 6.8.0-1031-aws #33-Ubuntu SMP Fri Jun 20 18:11:07 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux 2025-09-07T08:05:41.9105713Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-09-07T08:05:41.9106460Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-09-07T08:05:41.9122463Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T08:05:41.9122762Z env: 2025-09-07T08:05:41.9122930Z GIT_DEFAULT_BRANCH: main 2025-09-07T08:05:41.9123132Z ##[endgroup] 2025-09-07T08:05:42.0191748Z ##[group]Run nick-fields/retry@v3.0.0 2025-09-07T08:05:42.0191996Z with: 2025-09-07T08:05:42.0192154Z shell: bash 2025-09-07T08:05:42.0192322Z timeout_minutes: 5 2025-09-07T08:05:42.0192499Z max_attempts: 3 2025-09-07T08:05:42.0192679Z retry_wait_seconds: 30 2025-09-07T08:05:42.0194876Z command: AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" # For LF Runners we need to make sure we also login to Meta's ECR docker registry too. 
META_AWS_ACCOUNT_ID=308535385114 if [ "$AWS_ACCOUNT_ID" != "$META_AWS_ACCOUNT_ID" ] ; then aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ --password-stdin "$META_AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" fi 2025-09-07T08:05:42.0196749Z polling_interval_seconds: 1 2025-09-07T08:05:42.0197044Z warning_on_retry: true 2025-09-07T08:05:42.0197268Z continue_on_error: false 2025-09-07T08:05:42.0197482Z env: 2025-09-07T08:05:42.0197646Z GIT_DEFAULT_BRANCH: main 2025-09-07T08:05:42.0197855Z AWS_RETRY_MODE: standard 2025-09-07T08:05:42.0198061Z AWS_MAX_ATTEMPTS: 5 2025-09-07T08:05:42.0198263Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T08:05:42.0198486Z ##[endgroup] 2025-09-07T08:05:43.5710527Z 2025-09-07T08:05:43.5711125Z WARNING! Your credentials are stored unencrypted in '/home/charlie/.docker/config.json'. 2025-09-07T08:05:43.5711718Z Configure a credential helper to remove this warning. See 2025-09-07T08:05:43.5712136Z https://docs.docker.com/go/credential-store/ 2025-09-07T08:05:43.5712373Z 2025-09-07T08:05:43.5712469Z Login Succeeded 2025-09-07T08:05:44.0943585Z Command completed after 1 attempt(s). 
2025-09-07T08:05:44.1166817Z ##[group]Run env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-09-07T08:05:44.1167261Z env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-09-07T08:05:44.1167638Z env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-09-07T08:05:44.1183537Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T08:05:44.1183999Z env: 2025-09-07T08:05:44.1184167Z GIT_DEFAULT_BRANCH: main 2025-09-07T08:05:44.1184362Z ##[endgroup] 2025-09-07T08:05:44.2072317Z ##[group]Run set +e 2025-09-07T08:05:44.2072530Z set +e 2025-09-07T08:05:44.2072699Z set -x 2025-09-07T08:05:44.2072859Z  2025-09-07T08:05:44.2073030Z PT_DOMAIN=download.pytorch.org 2025-09-07T08:05:44.2073465Z # TODO: Flaky access to download.pytorch.org https://github.com/pytorch/pytorch/issues/100400, 2025-09-07T08:05:44.2074202Z # cleaning this up once the issue is fixed. There are more than one resolved IP here, the last 2025-09-07T08:05:44.2074601Z # one is returned at random 2025-09-07T08:05:44.2074919Z RESOLVED_IP=$(dig -4 +short "${PT_DOMAIN}" | tail -n1) 2025-09-07T08:05:44.2075233Z  2025-09-07T08:05:44.2075418Z if [ -z "${RESOLVED_IP}" ]; then 2025-09-07T08:05:44.2075760Z  echo "Couldn't resolve ${PT_DOMAIN}, retrying with Google DNS..." 2025-09-07T08:05:44.2076175Z  RESOLVED_IP=$(dig -4 +short "${PT_DOMAIN}" @8.8.8.8 | tail -n1) 2025-09-07T08:05:44.2076488Z  2025-09-07T08:05:44.2076672Z  if [ -z "${RESOLVED_IP}" ]; then 2025-09-07T08:05:44.2077044Z  echo "Couldn't resolve ${PT_DOMAIN}, exiting..." 
2025-09-07T08:05:44.2077338Z  exit 1 2025-09-07T08:05:44.2077523Z  fi 2025-09-07T08:05:44.2077688Z fi 2025-09-07T08:05:44.2077851Z  2025-09-07T08:05:44.2078050Z if grep -r "${PT_DOMAIN}" /etc/hosts; then 2025-09-07T08:05:44.2078354Z  # Clean up any old records first 2025-09-07T08:05:44.2078638Z  sudo sed -i "/${PT_DOMAIN}/d" /etc/hosts 2025-09-07T08:05:44.2078900Z fi 2025-09-07T08:05:44.2079064Z  2025-09-07T08:05:44.2079534Z echo "${RESOLVED_IP} ${PT_DOMAIN}" | sudo tee -a /etc/hosts 2025-09-07T08:05:44.2079851Z cat /etc/hosts 2025-09-07T08:05:44.2094525Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T08:05:44.2094834Z env: 2025-09-07T08:05:44.2095000Z GIT_DEFAULT_BRANCH: main 2025-09-07T08:05:44.2095206Z ##[endgroup] 2025-09-07T08:05:44.2535975Z + PT_DOMAIN=download.pytorch.org 2025-09-07T08:05:44.2542857Z ++ dig -4 +short download.pytorch.org 2025-09-07T08:05:44.2543635Z ++ tail -n1 2025-09-07T08:05:44.2908791Z + RESOLVED_IP=3.170.131.102 2025-09-07T08:05:44.2909084Z + '[' -z 3.170.131.102 ']' 2025-09-07T08:05:44.2909362Z + grep -r download.pytorch.org /etc/hosts 2025-09-07T08:05:44.2925114Z + echo '3.170.131.102 download.pytorch.org' 2025-09-07T08:05:44.2926474Z + sudo tee -a /etc/hosts 2025-09-07T08:05:44.2992166Z 3.170.131.102 download.pytorch.org 2025-09-07T08:05:44.3001062Z + cat /etc/hosts 2025-09-07T08:05:44.3009910Z 127.0.0.1 localhost 2025-09-07T08:05:44.3014661Z ::1 localhost ip6-localhost ip6-loopback 2025-09-07T08:05:44.3014972Z fe00:: ip6-localnet 2025-09-07T08:05:44.3015186Z ff00:: ip6-mcastprefix 2025-09-07T08:05:44.3015387Z ff02::1 ip6-allnodes 2025-09-07T08:05:44.3015596Z ff02::2 ip6-allrouters 2025-09-07T08:05:44.3015789Z 172.17.0.2 784802b6db88 2025-09-07T08:05:44.3015987Z 3.170.131.102 download.pytorch.org 2025-09-07T08:05:44.3425442Z ##[group]Run set +x 2025-09-07T08:05:44.3425931Z set +x 2025-09-07T08:05:44.3426345Z  2025-09-07T08:05:44.3426722Z max_attempts=30 2025-09-07T08:05:44.3427164Z delay=10 
2025-09-07T08:05:44.3427561Z attempt=1 2025-09-07T08:05:44.3427955Z  2025-09-07T08:05:44.3428389Z for attempt in $(seq 1 $max_attempts); do 2025-09-07T08:05:44.3429375Z  echo "Attempt $attempt of $max_attempts: Checking if Docker daemon is running..." 2025-09-07T08:05:44.3430015Z  if docker info > /dev/null 2>&1; then 2025-09-07T08:05:44.3430398Z  echo "Docker is running. Proceeding with the next steps" 2025-09-07T08:05:44.3430711Z  exit 0 2025-09-07T08:05:44.3430880Z  else 2025-09-07T08:05:44.3431072Z  echo "Docker is not running yet." 2025-09-07T08:05:44.3431340Z  echo "Retrying in $delay seconds..." 2025-09-07T08:05:44.3431591Z  sleep $delay 2025-09-07T08:05:44.3431778Z  fi 2025-09-07T08:05:44.3431939Z done 2025-09-07T08:05:44.3432196Z echo "Reached maximum attempts to connect to Docker. Exiting." 2025-09-07T08:05:44.3432508Z exit 1 2025-09-07T08:05:44.3447590Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T08:05:44.3447885Z env: 2025-09-07T08:05:44.3448046Z GIT_DEFAULT_BRANCH: main 2025-09-07T08:05:44.3448240Z ##[endgroup] 2025-09-07T08:05:44.3642376Z Attempt 1 of 30: Checking if Docker daemon is running... 2025-09-07T08:05:44.4103069Z Docker is running. Proceeding with the next steps 2025-09-07T08:05:44.4537550Z ##[group]Run pytorch/test-infra/.github/actions/calculate-docker-image@main 2025-09-07T08:05:44.4537951Z with: 2025-09-07T08:05:44.4538691Z docker-image-name: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T08:05:44.4539505Z use-custom-docker-registry: true 2025-09-07T08:05:44.4539749Z docker-build-dir: .ci/docker 2025-09-07T08:05:44.4539986Z docker-build-script: ./build.sh 2025-09-07T08:05:44.4540233Z working-directory: . 
2025-09-07T08:05:44.4540510Z docker-registry: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T08:05:44.4540825Z force-push: false 2025-09-07T08:05:44.4541011Z env: 2025-09-07T08:05:44.4541179Z GIT_DEFAULT_BRANCH: main 2025-09-07T08:05:44.4541387Z ##[endgroup] 2025-09-07T08:05:44.5417602Z ##[group]Run set -ex 2025-09-07T08:05:44.5417854Z set -ex 2025-09-07T08:05:44.5418028Z  2025-09-07T08:05:44.5418587Z # If the docker build directory or the build script doesn't exist, the action will 2025-09-07T08:05:44.5419090Z # gracefully return the docker image name as it is. Pulling docker image in Linux 2025-09-07T08:05:44.5419509Z # job could then download the pre-built image as usual 2025-09-07T08:05:44.5420033Z if [[ -d "${DOCKER_BUILD_DIR}" ]] && [[ -f "${DOCKER_BUILD_DIR}/${DOCKER_BUILD_SCRIPT}" ]] && [[ "${USE_CUSTOM_DOCKER_REGISTRY}" == "true" ]]; then 2025-09-07T08:05:44.5420527Z  echo "skip=false" >> "${GITHUB_OUTPUT}" 2025-09-07T08:05:44.5420777Z else 2025-09-07T08:05:44.5420971Z  echo "skip=true" >> "${GITHUB_OUTPUT}" 2025-09-07T08:05:44.5421302Z  echo "docker-image=${DOCKER_IMAGE_NAME}" >> "${GITHUB_OUTPUT}" 2025-09-07T08:05:44.5421609Z  2025-09-07T08:05:44.5422030Z  echo "Not using custom ECR registry. Either it was not requested or there is no Docker build script in the ${REPO_NAME} repo..." 
2025-09-07T08:05:44.5422519Z  exit 0 2025-09-07T08:05:44.5422698Z fi 2025-09-07T08:05:44.5422859Z  2025-09-07T08:05:44.5423117Z if [[ "${DOCKER_IMAGE_NAME}" == *"${DOCKER_REGISTRY}/${REPO_NAME}"* ]]; then 2025-09-07T08:05:44.5423573Z  # The docker image name already includes the ECR prefix and tag, so we can just 2025-09-07T08:05:44.5424176Z  # use it as it is, but first let's extract the tag 2025-09-07T08:05:44.5424536Z  DOCKER_TAG=$(echo "${DOCKER_IMAGE_NAME}" | awk -F '[:,]' '{print $2}') 2025-09-07T08:05:44.5424918Z  echo "docker-tag=${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" 2025-09-07T08:05:44.5425280Z  echo "docker-image=${DOCKER_IMAGE_NAME}" >> "${GITHUB_OUTPUT}" 2025-09-07T08:05:44.5425584Z else 2025-09-07T08:05:44.5425778Z  if [[ "${DOCKER_IMAGE_NAME}" == *:* ]]; then 2025-09-07T08:05:44.5426048Z  CUSTOM_TAG_PREFIX=${DOCKER_IMAGE_NAME#*:} 2025-09-07T08:05:44.5426335Z  DOCKER_IMAGE_NAME=${DOCKER_IMAGE_NAME%%:*} 2025-09-07T08:05:44.5426571Z  fi 2025-09-07T08:05:44.5426889Z  DOCKER_TAG=${CUSTOM_TAG_PREFIX:+${CUSTOM_TAG_PREFIX}-}$(git rev-parse HEAD:"${DOCKER_BUILD_DIR}") 2025-09-07T08:05:44.5427324Z  echo "docker-tag=${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" 2025-09-07T08:05:44.5427782Z  echo "docker-image=${DOCKER_REGISTRY}/${REPO_NAME}/${DOCKER_IMAGE_NAME}:${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" 2025-09-07T08:05:44.5428286Z  echo "custom-tag-prefix=${CUSTOM_TAG_PREFIX}" >> "${GITHUB_OUTPUT}" 2025-09-07T08:05:44.5428587Z fi 2025-09-07T08:05:44.5444891Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T08:05:44.5445185Z env: 2025-09-07T08:05:44.5445340Z GIT_DEFAULT_BRANCH: main 2025-09-07T08:05:44.5445540Z REPO_NAME: pytorch 2025-09-07T08:05:44.5446425Z DOCKER_IMAGE_NAME: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T08:05:44.5447143Z DOCKER_BUILD_DIR: .ci/docker 2025-09-07T08:05:44.5447352Z DOCKER_BUILD_SCRIPT: 
./build.sh 2025-09-07T08:05:44.5447635Z DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T08:05:44.5447928Z USE_CUSTOM_DOCKER_REGISTRY: true 2025-09-07T08:05:44.5448152Z CUSTOM_TAG_PREFIX: 2025-09-07T08:05:44.5448329Z ##[endgroup] 2025-09-07T08:05:44.5885734Z + [[ -d .ci/docker ]] 2025-09-07T08:05:44.5885972Z + [[ -f .ci/docker/./build.sh ]] 2025-09-07T08:05:44.5886210Z + [[ true == \t\r\u\e ]] 2025-09-07T08:05:44.5886419Z + echo skip=false 2025-09-07T08:05:44.5887383Z + [[ 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77 == *\3\0\8\5\3\5\3\8\5\1\1\4\.\d\k\r\.\e\c\r\.\u\s\-\e\a\s\t\-\1\.\a\m\a\z\o\n\a\w\s\.\c\o\m\/\p\y\t\o\r\c\h* ]] 2025-09-07T08:05:44.5894738Z ++ echo 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T08:05:44.5896079Z ++ awk -F '[:,]' '{print $2}' 2025-09-07T08:05:44.5909376Z + DOCKER_TAG=pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T08:05:44.5910255Z + echo docker-tag=pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T08:05:44.5911319Z + echo docker-image=308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T08:05:44.6300299Z ##[group]Run set +e 2025-09-07T08:05:44.6300543Z set +e 2025-09-07T08:05:44.6300723Z set -x 2025-09-07T08:05:44.6300890Z  2025-09-07T08:05:44.6301045Z login() { 2025-09-07T08:05:44.6301431Z  aws ecr get-login-password --region us-east-1 | docker login -u AWS --password-stdin "$1" 2025-09-07T08:05:44.6301893Z } 2025-09-07T08:05:44.6302050Z  2025-09-07T08:05:44.6302201Z retry () { 
2025-09-07T08:05:44.6302402Z  $* || (sleep 1 && $*) || (sleep 2 && $*) 2025-09-07T08:05:44.6302643Z } 2025-09-07T08:05:44.6302794Z  2025-09-07T08:05:44.6302962Z retry login "${DOCKER_REGISTRY}" 2025-09-07T08:05:44.6303192Z  2025-09-07T08:05:44.6303347Z START_TIME=$(date +%s) 2025-09-07T08:05:44.6303566Z # Wait up to 120 minutes 2025-09-07T08:05:44.6304022Z while [[ $(( $(date +%s) - 7200 )) -lt $START_TIME ]]; do 2025-09-07T08:05:44.6304396Z  # Check if image already exists, if it does then skip building it 2025-09-07T08:05:44.6304774Z  if docker manifest inspect "${DOCKER_IMAGE}"; then 2025-09-07T08:05:44.6305054Z  exit 0 2025-09-07T08:05:44.6305225Z  fi 2025-09-07T08:05:44.6305396Z  2025-09-07T08:05:44.6305677Z  # NB: This flag is used by Docker build workflow to push the image to ECR, so we can 2025-09-07T08:05:44.6306160Z  # use this to differentiate between the Docker build and regular build jobs. For the 2025-09-07T08:05:44.6306632Z  # latter, it will wait for the Docker images to become available before continuing 2025-09-07T08:05:44.6307001Z  if [ "${DOCKER_PUSH:-false}" == "true" ]; then 2025-09-07T08:05:44.6307294Z  # It's a Docker build job, let's build the image 2025-09-07T08:05:44.6307540Z  break 2025-09-07T08:05:44.6307720Z  else 2025-09-07T08:05:44.6307967Z  # It's a regular build job, wait for the image to become available 2025-09-07T08:05:44.6308261Z  sleep 300 2025-09-07T08:05:44.6308432Z  fi 2025-09-07T08:05:44.6308589Z done 2025-09-07T08:05:44.6308750Z  2025-09-07T08:05:44.6309210Z # NB: This part requires a full checkout. Otherwise, the merge base will 2025-09-07T08:05:44.6309629Z # be empty. 
The default action would be to continue rebuild the image 2025-09-07T08:05:44.6309989Z if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then 2025-09-07T08:05:44.6310314Z  # if we're on the base branch then use the parent commit 2025-09-07T08:05:44.6310601Z  MERGE_BASE=$(git rev-parse HEAD~) 2025-09-07T08:05:44.6310831Z else 2025-09-07T08:05:44.6311055Z  # otherwise we're on a PR, so use the most recent base commit 2025-09-07T08:05:44.6311392Z  MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION") 2025-09-07T08:05:44.6311646Z fi 2025-09-07T08:05:44.6311801Z  2025-09-07T08:05:44.6311961Z if [[ -z "${MERGE_BASE}" ]]; then 2025-09-07T08:05:44.6312220Z  echo "rebuild=true" >> "${GITHUB_OUTPUT}" 2025-09-07T08:05:44.6312452Z  2025-09-07T08:05:44.6312783Z  echo "Finding merge base only works with full checkout, please set fetch-depth to 0, continuing ..." 2025-09-07T08:05:44.6313350Z  exit 0 2025-09-07T08:05:44.6313502Z fi 2025-09-07T08:05:44.6313644Z  2025-09-07T08:05:44.6314051Z if ! git rev-parse "${MERGE_BASE}:${DOCKER_BUILD_DIR}"; then 2025-09-07T08:05:44.6314536Z  echo "Directory '${DOCKER_BUILD_DIR}' not found in commit $MERGE_BASE, you should rebase onto a more recent commit" 2025-09-07T08:05:44.6314941Z  exit 1 2025-09-07T08:05:44.6315093Z fi 2025-09-07T08:05:44.6315246Z  2025-09-07T08:05:44.6315502Z PREVIOUS_DOCKER_TAG=$(git rev-parse "${MERGE_BASE}:${DOCKER_BUILD_DIR}") 2025-09-07T08:05:44.6315955Z # If no image exists but the hash is the same as the previous hash then we should error out here 2025-09-07T08:05:44.6316372Z if [[ "${PREVIOUS_DOCKER_TAG}" == "${DOCKER_TAG}" ]]; then 2025-09-07T08:05:44.6316849Z  echo "WARNING: Something has gone wrong and the previous image isn't available for the merge-base of your branch" 2025-09-07T08:05:44.6317463Z  echo " Will re-build docker image to store in local cache, TTS may be longer" 2025-09-07T08:05:44.6317792Z fi 2025-09-07T08:05:44.6317931Z  2025-09-07T08:05:44.6318131Z echo "rebuild=true" >> "${GITHUB_OUTPUT}" 
2025-09-07T08:05:44.6333202Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T08:05:44.6333500Z env: 2025-09-07T08:05:44.6333665Z GIT_DEFAULT_BRANCH: main 2025-09-07T08:05:44.6334038Z DOCKER_BUILD_DIR: .ci/docker 2025-09-07T08:05:44.6334290Z BASE_REVISION: 93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T08:05:44.6335031Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T08:05:44.6335989Z DOCKER_TAG: pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T08:05:44.6336553Z DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T08:05:44.6336826Z DOCKER_PUSH: 2025-09-07T08:05:44.6337007Z ##[endgroup] 2025-09-07T08:05:44.6784264Z + retry login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T08:05:44.6784619Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T08:05:44.6788471Z + aws ecr get-login-password --region us-east-1 2025-09-07T08:05:44.6789272Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T08:05:45.4597254Z 2025-09-07T08:05:45.4597568Z WARNING! Your credentials are stored unencrypted in '/home/charlie/.docker/config.json'. 2025-09-07T08:05:45.4598066Z Configure a credential helper to remove this warning. 
See 2025-09-07T08:05:45.4598417Z https://docs.docker.com/go/credential-store/ 2025-09-07T08:05:45.4598601Z 2025-09-07T08:05:45.4598673Z Login Succeeded 2025-09-07T08:05:45.4621043Z ++ date +%s 2025-09-07T08:05:45.4632905Z + START_TIME=1757232345 2025-09-07T08:05:45.4637427Z ++ date +%s 2025-09-07T08:05:45.4650353Z + [[ 1757225145 -lt 1757232345 ]] 2025-09-07T08:05:45.4651199Z + docker manifest inspect 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T08:05:45.8551969Z { 2025-09-07T08:05:45.8552338Z "schemaVersion": 2, 2025-09-07T08:05:45.8552951Z "mediaType": "application/vnd.docker.distribution.manifest.v2+json", 2025-09-07T08:05:45.8553576Z "config": { 2025-09-07T08:05:45.8554376Z "mediaType": "application/vnd.docker.container.image.v1+json", 2025-09-07T08:05:45.8554942Z "size": 31375, 2025-09-07T08:05:45.8555553Z "digest": "sha256:29d1d8a31b215537637bab7c99e18c255840b899cf7023e4e3cb5efa3270aef8" 2025-09-07T08:05:45.8556030Z }, 2025-09-07T08:05:45.8556189Z "layers": [ 2025-09-07T08:05:45.8556373Z { 2025-09-07T08:05:45.8556686Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8557358Z "size": 30448359, 2025-09-07T08:05:45.8557678Z "digest": "sha256:e6fdc8487bfe6d764301ef3634bc6c043841dc3ab05ca14f81e69c0f92562d46" 2025-09-07T08:05:45.8558039Z }, 2025-09-07T08:05:45.8558199Z { 2025-09-07T08:05:45.8558445Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8558777Z "size": 1554, 2025-09-07T08:05:45.8559099Z "digest": "sha256:171dcef20c49de4bc9268f60e02f111b72c638b0f24c3c5636c5013029db6d30" 2025-09-07T08:05:45.8559444Z }, 2025-09-07T08:05:45.8559585Z { 2025-09-07T08:05:45.8559823Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8560131Z "size": 313297922, 2025-09-07T08:05:45.8560450Z "digest": 
"sha256:4c92b3f72f1df31fe9f487fc1c27fcf1ba475ffb43abd69056306d1247786e40" 2025-09-07T08:05:45.8560808Z }, 2025-09-07T08:05:45.8560943Z { 2025-09-07T08:05:45.8561195Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8561502Z "size": 792, 2025-09-07T08:05:45.8561801Z "digest": "sha256:744f9ba90a6582eb601b3c20409bb10d6dad635dd118c3975f79721f4c82747c" 2025-09-07T08:05:45.8562137Z }, 2025-09-07T08:05:45.8562276Z { 2025-09-07T08:05:45.8562519Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8562821Z "size": 106, 2025-09-07T08:05:45.8563115Z "digest": "sha256:d3c08322a3326e45849dd80264a047c4f42ba4a2419d35c919542e2890e23934" 2025-09-07T08:05:45.8563456Z }, 2025-09-07T08:05:45.8563597Z { 2025-09-07T08:05:45.8563993Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8564290Z "size": 704, 2025-09-07T08:05:45.8564600Z "digest": "sha256:ffd43b71f3ccf3ba563606231cb1d191eb9dd0052f422d54835e6af350525170" 2025-09-07T08:05:45.8564948Z }, 2025-09-07T08:05:45.8565088Z { 2025-09-07T08:05:45.8565319Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8565626Z "size": 1215, 2025-09-07T08:05:45.8565932Z "digest": "sha256:830692b57f6e2758398ec80c3b67a20441d12696b54ed14f2ecebf926198f7d6" 2025-09-07T08:05:45.8566257Z }, 2025-09-07T08:05:45.8566382Z { 2025-09-07T08:05:45.8566603Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8566897Z "size": 482, 2025-09-07T08:05:45.8567175Z "digest": "sha256:5bad36d184686719399be50830a98939d7dbda2313fb407df5915217483fc6a3" 2025-09-07T08:05:45.8567484Z }, 2025-09-07T08:05:45.8567620Z { 2025-09-07T08:05:45.8567840Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8568123Z "size": 110343614, 2025-09-07T08:05:45.8568419Z "digest": "sha256:0e34fdd9ac5c39eb0a9d2c2d258b26f42bb79d7dc0a22014bf201daa2e033eb4" 2025-09-07T08:05:45.8568751Z }, 
2025-09-07T08:05:45.8568887Z { 2025-09-07T08:05:45.8569112Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8569392Z "size": 4786, 2025-09-07T08:05:45.8569868Z "digest": "sha256:3c868a62868ef54f82ac11be8dabe1b4365d000bacfe4c104e08022fc96dd767" 2025-09-07T08:05:45.8570220Z }, 2025-09-07T08:05:45.8570365Z { 2025-09-07T08:05:45.8570579Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8570861Z "size": 1710, 2025-09-07T08:05:45.8571146Z "digest": "sha256:62170a22dd571d55ffccac64c0be17f4006d2498cfbf7c6289325f0899cba005" 2025-09-07T08:05:45.8571464Z }, 2025-09-07T08:05:45.8571593Z { 2025-09-07T08:05:45.8571812Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8572096Z "size": 724, 2025-09-07T08:05:45.8572382Z "digest": "sha256:553c1d23b6c4dbd8ab136d0c3659460391ffa14cb9b43be9d7b2f47f90895697" 2025-09-07T08:05:45.8572697Z }, 2025-09-07T08:05:45.8572837Z { 2025-09-07T08:05:45.8573057Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8573341Z "size": 543, 2025-09-07T08:05:45.8573611Z "digest": "sha256:9408d557a804a7dce00897e03ce9f4f447281eb38ce4bc331098a1f1a5ff0d30" 2025-09-07T08:05:45.8574270Z }, 2025-09-07T08:05:45.8574409Z { 2025-09-07T08:05:45.8574644Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8574931Z "size": 3241148049, 2025-09-07T08:05:45.8575438Z "digest": "sha256:df607cfc7c07db6d442e0274e2be8cdc507df8716717363aa92f2fea069bdd9a" 2025-09-07T08:05:45.8575778Z }, 2025-09-07T08:05:45.8575917Z { 2025-09-07T08:05:45.8576135Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8576419Z "size": 32, 2025-09-07T08:05:45.8576706Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-09-07T08:05:45.8577030Z }, 2025-09-07T08:05:45.8577157Z { 2025-09-07T08:05:45.8577381Z "mediaType": 
"application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8577664Z "size": 380, 2025-09-07T08:05:45.8577952Z "digest": "sha256:40a8e39faeda9f5273ff5014b2ef7d1ffeeef321de234186a705b1e0574326d2" 2025-09-07T08:05:45.8578274Z }, 2025-09-07T08:05:45.8578405Z { 2025-09-07T08:05:45.8578638Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8578917Z "size": 53548049, 2025-09-07T08:05:45.8579198Z "digest": "sha256:d895771c9faca390d7270f8c9c832b1428128c31ba6760b837d64b7e5920373f" 2025-09-07T08:05:45.8579527Z }, 2025-09-07T08:05:45.8579663Z { 2025-09-07T08:05:45.8579884Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8580163Z "size": 232, 2025-09-07T08:05:45.8580443Z "digest": "sha256:c4ee04f39d49efb46e52443e60c7f41832ea708d9bc5bf76c6d740895c66f57a" 2025-09-07T08:05:45.8580762Z }, 2025-09-07T08:05:45.8580897Z { 2025-09-07T08:05:45.8581112Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8581401Z "size": 3403403, 2025-09-07T08:05:45.8581686Z "digest": "sha256:3690c9826e48ed74e21e494d9d78990902abbc68795d002260ce71bff9a2cb3b" 2025-09-07T08:05:45.8582009Z }, 2025-09-07T08:05:45.8582152Z { 2025-09-07T08:05:45.8582371Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8582652Z "size": 1478, 2025-09-07T08:05:45.8582935Z "digest": "sha256:57cbc5013733eedfdf176b6db4b44458e826e1f64c0ef38849e9d77addc88936" 2025-09-07T08:05:45.8583250Z }, 2025-09-07T08:05:45.8583383Z { 2025-09-07T08:05:45.8583601Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8584043Z "size": 482, 2025-09-07T08:05:45.8584322Z "digest": "sha256:f5f4b06b58bbe4201d8b2eb5b0c6c1299f2725dd59e71cc45ef76ad89bba4deb" 2025-09-07T08:05:45.8584643Z }, 2025-09-07T08:05:45.8584772Z { 2025-09-07T08:05:45.8584990Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8585275Z "size": 197, 
2025-09-07T08:05:45.8585555Z "digest": "sha256:f59713ce4bf491fe1f663d90e3b32d2290a7d8a4a0e8e13301e3bdb10b949f8e" 2025-09-07T08:05:45.8585874Z }, 2025-09-07T08:05:45.8586008Z { 2025-09-07T08:05:45.8586463Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8586756Z "size": 608, 2025-09-07T08:05:45.8587037Z "digest": "sha256:fe0486521517e626cae4fcbd9c83eb3956aad3ab0f833becee187b830891417b" 2025-09-07T08:05:45.8587359Z }, 2025-09-07T08:05:45.8587485Z { 2025-09-07T08:05:45.8587717Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8588007Z "size": 7874747615, 2025-09-07T08:05:45.8588298Z "digest": "sha256:8c21cc3715a2d715295f0299d8d2443262a3ae8defc1921f3226a0a24fc9c8fe" 2025-09-07T08:05:45.8588676Z + exit 0 2025-09-07T08:05:45.8588809Z }, 2025-09-07T08:05:45.8588942Z { 2025-09-07T08:05:45.8589162Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8589449Z "size": 829, 2025-09-07T08:05:45.8589728Z "digest": "sha256:d37c58456a6a4aa45d78abdb95553b3de0c79d941e18dc757c2c39fd59819739" 2025-09-07T08:05:45.8590061Z }, 2025-09-07T08:05:45.8590196Z { 2025-09-07T08:05:45.8590579Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8590856Z "size": 36688200, 2025-09-07T08:05:45.8591153Z "digest": "sha256:d042f63abc13891184a9d8e0dcdfae9a0daa140dea919fd319f12dcab5c684eb" 2025-09-07T08:05:45.8591481Z }, 2025-09-07T08:05:45.8591614Z { 2025-09-07T08:05:45.8591828Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8592111Z "size": 104, 2025-09-07T08:05:45.8592385Z "digest": "sha256:621284a9c05a47131a59226f6847b5b76ad211908278c1bdb990029d42259941" 2025-09-07T08:05:45.8592702Z }, 2025-09-07T08:05:45.8592831Z { 2025-09-07T08:05:45.8593069Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8593360Z "size": 1496, 2025-09-07T08:05:45.8593656Z "digest": 
"sha256:85f605d2dd3a8378567d3d974f0ec4694ef5fd988b25aca5d9aebd7c9b9ff018" 2025-09-07T08:05:45.8594126Z }, 2025-09-07T08:05:45.8594267Z { 2025-09-07T08:05:45.8594493Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8594777Z "size": 454406172, 2025-09-07T08:05:45.8595059Z "digest": "sha256:381b5539e5981dc994e71ab212f50135c32128fe1cc35d78bc386da6dffe1d51" 2025-09-07T08:05:45.8595391Z }, 2025-09-07T08:05:45.8595523Z { 2025-09-07T08:05:45.8595740Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8596025Z "size": 162, 2025-09-07T08:05:45.8596299Z "digest": "sha256:a487c0c800295407a4c7ab88c5b9e891b8b6aab9e35e62994d124369fcd7ba87" 2025-09-07T08:05:45.8596614Z }, 2025-09-07T08:05:45.8596745Z { 2025-09-07T08:05:45.8597047Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8597332Z "size": 346, 2025-09-07T08:05:45.8597604Z "digest": "sha256:48bcb81e256634f4132369d8bac738d9d622b010e5802e5292f565edba9035df" 2025-09-07T08:05:45.8597920Z }, 2025-09-07T08:05:45.8598045Z { 2025-09-07T08:05:45.8598265Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8598551Z "size": 32, 2025-09-07T08:05:45.8598832Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-09-07T08:05:45.8599162Z }, 2025-09-07T08:05:45.8599302Z { 2025-09-07T08:05:45.8599526Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8599808Z "size": 106, 2025-09-07T08:05:45.8600080Z "digest": "sha256:e261928c0043c734790a38fa9ebf1bf8674801fa2f5051c3d2eac04e0f02b743" 2025-09-07T08:05:45.8600406Z }, 2025-09-07T08:05:45.8600545Z { 2025-09-07T08:05:45.8600763Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8601045Z "size": 425, 2025-09-07T08:05:45.8601331Z "digest": "sha256:0fea55428091bc98d5c48986120dd1da50b9b6cbd507408b2cdebdbe455e272e" 2025-09-07T08:05:45.8601657Z }, 
2025-09-07T08:05:45.8601795Z { 2025-09-07T08:05:45.8602006Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8602471Z "size": 20224775, 2025-09-07T08:05:45.8602785Z "digest": "sha256:b4291bccbb8428a38187cd286fef7c24bd4863c7872c4d1cf96404ec1a69b321" 2025-09-07T08:05:45.8603118Z }, 2025-09-07T08:05:45.8603245Z { 2025-09-07T08:05:45.8603465Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8603887Z "size": 108, 2025-09-07T08:05:45.8604183Z "digest": "sha256:ddc91b09189afc218499daee92ebc22c6deefb22ee115c52c07627ecbaf7b9d5" 2025-09-07T08:05:45.8604504Z }, 2025-09-07T08:05:45.8604639Z { 2025-09-07T08:05:45.8604869Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8605153Z "size": 640, 2025-09-07T08:05:45.8605422Z "digest": "sha256:7540c74286279d1d6a29cdb51d3421e64860c6af74ca4a95736725c0509791ed" 2025-09-07T08:05:45.8605737Z }, 2025-09-07T08:05:45.8605872Z { 2025-09-07T08:05:45.8606094Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8606373Z "size": 724, 2025-09-07T08:05:45.8606658Z "digest": "sha256:553c1d23b6c4dbd8ab136d0c3659460391ffa14cb9b43be9d7b2f47f90895697" 2025-09-07T08:05:45.8607147Z }, 2025-09-07T08:05:45.8607279Z { 2025-09-07T08:05:45.8607491Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8607769Z "size": 149, 2025-09-07T08:05:45.8608035Z "digest": "sha256:003c4e2598fb39f97ec7734271e034a48a3956a58429c9d06601770c2c40de11" 2025-09-07T08:05:45.8608347Z }, 2025-09-07T08:05:45.8608470Z { 2025-09-07T08:05:45.8608688Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8608967Z "size": 135, 2025-09-07T08:05:45.8609243Z "digest": "sha256:5687149362ae68fa2aa7d4ecd39fbf7ea86c0f6ced36a71f3c59f68f6c465cfc" 2025-09-07T08:05:45.8609557Z }, 2025-09-07T08:05:45.8609691Z { 2025-09-07T08:05:45.8609908Z "mediaType": 
"application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8610202Z "size": 141, 2025-09-07T08:05:45.8610485Z "digest": "sha256:cdd2cf54eb2a3d8d034aa1556c9724d240b06397ba08f8b13b0bed6d65755aeb" 2025-09-07T08:05:45.8610817Z }, 2025-09-07T08:05:45.8610948Z { 2025-09-07T08:05:45.8611170Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8611449Z "size": 18615922074, 2025-09-07T08:05:45.8611754Z "digest": "sha256:d3ad4df1ba3a86ef1f84c427aae440ff027d483949d48eec4be6135260668cad" 2025-09-07T08:05:45.8612092Z }, 2025-09-07T08:05:45.8612228Z { 2025-09-07T08:05:45.8612442Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8612723Z "size": 223, 2025-09-07T08:05:45.8613002Z "digest": "sha256:3c9055753b4c79d74c707a91d8626ce10bc439129ba10dad3ebc643d9d4955dd" 2025-09-07T08:05:45.8613324Z }, 2025-09-07T08:05:45.8613449Z { 2025-09-07T08:05:45.8613668Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8614104Z "size": 353035275, 2025-09-07T08:05:45.8614396Z "digest": "sha256:31cf8d0bd21c76ae21f73d8b19b30949d161a498354f54191b4e5a294e929701" 2025-09-07T08:05:45.8614713Z }, 2025-09-07T08:05:45.8614847Z { 2025-09-07T08:05:45.8615069Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8615361Z "size": 6523020957, 2025-09-07T08:05:45.8615650Z "digest": "sha256:6623ea81497183b62e034e4ea8df8bf00fa75aaa192eea2821b2dd8655383b8f" 2025-09-07T08:05:45.8615984Z }, 2025-09-07T08:05:45.8616119Z { 2025-09-07T08:05:45.8616342Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8616619Z "size": 129, 2025-09-07T08:05:45.8616894Z "digest": "sha256:11696c3aa3808236d49256bc170b49d55cf657e499592b39b4856f6137220f55" 2025-09-07T08:05:45.8617213Z }, 2025-09-07T08:05:45.8617346Z { 2025-09-07T08:05:45.8617561Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8617843Z "size": 778, 
2025-09-07T08:05:45.8618138Z "digest": "sha256:ef4d544e35cacc73a229bcbc7a5510f8b156c7b3041f19f3a274562cd97cfd94" 2025-09-07T08:05:45.8618602Z }, 2025-09-07T08:05:45.8618751Z { 2025-09-07T08:05:45.8618974Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8619253Z "size": 724, 2025-09-07T08:05:45.8619530Z "digest": "sha256:553c1d23b6c4dbd8ab136d0c3659460391ffa14cb9b43be9d7b2f47f90895697" 2025-09-07T08:05:45.8619843Z }, 2025-09-07T08:05:45.8619975Z { 2025-09-07T08:05:45.8620191Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8620470Z "size": 141, 2025-09-07T08:05:45.8620732Z "digest": "sha256:5c5108865e5e293209ae9bae8a29645035242e7e4b4433208a777496fddc988c" 2025-09-07T08:05:45.8621043Z }, 2025-09-07T08:05:45.8621174Z { 2025-09-07T08:05:45.8621391Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8621667Z "size": 32, 2025-09-07T08:05:45.8621949Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-09-07T08:05:45.8622275Z }, 2025-09-07T08:05:45.8622421Z { 2025-09-07T08:05:45.8622782Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8623079Z "size": 159, 2025-09-07T08:05:45.8623354Z "digest": "sha256:9e97578e9edf1a11187740a5aa102633331fb6a714d0ed48683782de5a36fbd8" 2025-09-07T08:05:45.8623673Z }, 2025-09-07T08:05:45.8623926Z { 2025-09-07T08:05:45.8624146Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8624426Z "size": 1012, 2025-09-07T08:05:45.8624709Z "digest": "sha256:da5a91b54cb51f851560992645bc203f2287d9b1d7a4f04f7f4ea7efe45036ce" 2025-09-07T08:05:45.8625027Z }, 2025-09-07T08:05:45.8625160Z { 2025-09-07T08:05:45.8625389Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8625671Z "size": 724, 2025-09-07T08:05:45.8625950Z "digest": "sha256:553c1d23b6c4dbd8ab136d0c3659460391ffa14cb9b43be9d7b2f47f90895697" 
2025-09-07T08:05:45.8626287Z }, 2025-09-07T08:05:45.8626434Z { 2025-09-07T08:05:45.8626668Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8626952Z "size": 135, 2025-09-07T08:05:45.8627243Z "digest": "sha256:1e93be219e89e7733b91ba7e3af1a44d985e84959f732ecd5f5ca61bd13b5d41" 2025-09-07T08:05:45.8627568Z }, 2025-09-07T08:05:45.8627699Z { 2025-09-07T08:05:45.8627917Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8628215Z "size": 32, 2025-09-07T08:05:45.8628498Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-09-07T08:05:45.8628823Z }, 2025-09-07T08:05:45.8628951Z { 2025-09-07T08:05:45.8629177Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8629459Z "size": 158, 2025-09-07T08:05:45.8629735Z "digest": "sha256:136825afebb533ee295f0d2523595281086c6410c60d5f712b84cefd24cb31d5" 2025-09-07T08:05:45.8630048Z }, 2025-09-07T08:05:45.8630172Z { 2025-09-07T08:05:45.8630392Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8630677Z "size": 1368, 2025-09-07T08:05:45.8630957Z "digest": "sha256:22b39805302d877e4c1ba433ebc36520438ea29a9ba8bc059efbcd9106f3a82d" 2025-09-07T08:05:45.8631269Z }, 2025-09-07T08:05:45.8631401Z { 2025-09-07T08:05:45.8631621Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8631903Z "size": 32, 2025-09-07T08:05:45.8632181Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-09-07T08:05:45.8632501Z }, 2025-09-07T08:05:45.8632633Z { 2025-09-07T08:05:45.8632849Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8633134Z "size": 136, 2025-09-07T08:05:45.8633413Z "digest": "sha256:d12add675e3505e74eb9880eeef540ea0801282ca1ae01c3c221157cec91f5ae" 2025-09-07T08:05:45.8633878Z }, 2025-09-07T08:05:45.8634028Z { 2025-09-07T08:05:45.8634250Z "mediaType": 
"application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8634536Z "size": 380, 2025-09-07T08:05:45.8634971Z "digest": "sha256:bc127046d33a7a98563698411b54ece8a167d520922879d7b69e8ca73a12d034" 2025-09-07T08:05:45.8635314Z }, 2025-09-07T08:05:45.8635444Z { 2025-09-07T08:05:45.8635671Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8635952Z "size": 32, 2025-09-07T08:05:45.8636232Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-09-07T08:05:45.8636548Z }, 2025-09-07T08:05:45.8636681Z { 2025-09-07T08:05:45.8636899Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8637273Z "size": 104, 2025-09-07T08:05:45.8637541Z "digest": "sha256:951e8ce838415c4257680a9d60d216f3750cbb18d243d9a21e2008cce7e589cf" 2025-09-07T08:05:45.8637855Z }, 2025-09-07T08:05:45.8637987Z { 2025-09-07T08:05:45.8638204Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8638481Z "size": 408, 2025-09-07T08:05:45.8638768Z "digest": "sha256:32340b97ae50ba7b2918ab40d6f4a8db875afee69318f484e4deb0a1e2ec4beb" 2025-09-07T08:05:45.8639239Z }, 2025-09-07T08:05:45.8639372Z { 2025-09-07T08:05:45.8639596Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8639883Z "size": 32, 2025-09-07T08:05:45.8640164Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-09-07T08:05:45.8640484Z }, 2025-09-07T08:05:45.8640612Z { 2025-09-07T08:05:45.8640833Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8641118Z "size": 109, 2025-09-07T08:05:45.8641405Z "digest": "sha256:5bbb04cd6b57ae13d7cf05ab9e9b4ed9752833ee2dba4eeaac47bde6022c4725" 2025-09-07T08:05:45.8641734Z }, 2025-09-07T08:05:45.8641868Z { 2025-09-07T08:05:45.8642093Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8642379Z "size": 1897, 
2025-09-07T08:05:45.8642683Z "digest": "sha256:d8c4b845cfc7ca7cc0604f472bf6da8b1f1d4e98dff3c76e1985a7013a5b9e3f" 2025-09-07T08:05:45.8643016Z }, 2025-09-07T08:05:45.8643149Z { 2025-09-07T08:05:45.8643383Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8643668Z "size": 243440375, 2025-09-07T08:05:45.8644123Z "digest": "sha256:b35c180f4d8ddc2396eac4a6b893f438481a8163ceb0b88f203488bc5f2a8ba4" 2025-09-07T08:05:45.8644452Z }, 2025-09-07T08:05:45.8644587Z { 2025-09-07T08:05:45.8644820Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8645108Z "size": 106, 2025-09-07T08:05:45.8645391Z "digest": "sha256:5f967b3c303a99e609441551f7c8988cca4fd464c0c3127506bff8509583091b" 2025-09-07T08:05:45.8645709Z }, 2025-09-07T08:05:45.8645838Z { 2025-09-07T08:05:45.8646070Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8646358Z "size": 166, 2025-09-07T08:05:45.8646638Z "digest": "sha256:04770904f012e5584f1c19a0bc92d9863baaebf08bf75b4a9981f2b7795c8953" 2025-09-07T08:05:45.8646963Z }, 2025-09-07T08:05:45.8647100Z { 2025-09-07T08:05:45.8647339Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8647623Z "size": 7943, 2025-09-07T08:05:45.8647900Z "digest": "sha256:73373941fb321b4cb4a171b1423a68a4c7fedada3a1498868d7efe93cb03170e" 2025-09-07T08:05:45.8648218Z }, 2025-09-07T08:05:45.8648354Z { 2025-09-07T08:05:45.8648575Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8648849Z "size": 8072, 2025-09-07T08:05:45.8649134Z "digest": "sha256:9572e6cd907bfa4888456dbccc6e22146a0044374585f3fa0a8ced19b831ed62" 2025-09-07T08:05:45.8649452Z }, 2025-09-07T08:05:45.8649583Z { 2025-09-07T08:05:45.8649796Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8650076Z "size": 304, 2025-09-07T08:05:45.8650356Z "digest": "sha256:64a544aba233551e38898f138dd6ba3161ccdb9554e0ffb5b9d8f0f7fe4a7fa8" 
2025-09-07T08:05:45.8650682Z }, 2025-09-07T08:05:45.8650989Z { 2025-09-07T08:05:45.8651225Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8651512Z "size": 13362696, 2025-09-07T08:05:45.8651797Z "digest": "sha256:7e35418a24997de5428763c93826679486760a1a9563209ae64de66ba45f99c1" 2025-09-07T08:05:45.8652101Z }, 2025-09-07T08:05:45.8652246Z { 2025-09-07T08:05:45.8652471Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8652757Z "size": 108, 2025-09-07T08:05:45.8653031Z "digest": "sha256:2ed8e82748d4a1131f41d9e41322f47a6ffef67a5a2b7bf5392237db5c035c61" 2025-09-07T08:05:45.8653355Z }, 2025-09-07T08:05:45.8653487Z { 2025-09-07T08:05:45.8653847Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8654132Z "size": 54145663, 2025-09-07T08:05:45.8654422Z "digest": "sha256:c988fbcccd708fb158a81c429d32e1060a7e40924fc3c987c629fa69d9484717" 2025-09-07T08:05:45.8654741Z }, 2025-09-07T08:05:45.8654874Z { 2025-09-07T08:05:45.8655091Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T08:05:45.8655531Z "size": 32, 2025-09-07T08:05:45.8655811Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-09-07T08:05:45.8656146Z } 2025-09-07T08:05:45.8656275Z ] 2025-09-07T08:05:45.8656407Z } 2025-09-07T08:05:45.8909873Z ##[group]Run set -eux 2025-09-07T08:05:45.8910091Z set -eux 2025-09-07T08:05:45.8910400Z # It's ok if this steps fails, it would then be an anonymous user like what we used to have 2025-09-07T08:05:45.8911228Z aws secretsmanager get-secret-value --secret-id docker_hub_readonly_token | jq --raw-output '.SecretString' | jq -r .docker_hub_readonly_token | docker login --username pytorchbot --password-stdin || true 2025-09-07T08:05:45.8927417Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T08:05:45.8927695Z env: 2025-09-07T08:05:45.8927861Z GIT_DEFAULT_BRANCH: main 
2025-09-07T08:05:45.8928057Z ##[endgroup] 2025-09-07T08:05:45.9406389Z + aws secretsmanager get-secret-value --secret-id docker_hub_readonly_token 2025-09-07T08:05:45.9407138Z + jq --raw-output .SecretString 2025-09-07T08:05:45.9408877Z + jq -r .docker_hub_readonly_token 2025-09-07T08:05:45.9410318Z + docker login --username pytorchbot --password-stdin 2025-09-07T08:05:46.5220468Z 2025-09-07T08:05:46.5222126Z An error occurred (AccessDeniedException) when calling the GetSecretValue operation: User: arn:aws:sts::308535385114:assumed-role/gh-ci-github-action-runners-runner-role/i-05a095f6e498981b2 is not authorized to perform: secretsmanager:GetSecretValue on resource: docker_hub_readonly_token because no identity-based policy allows the secretsmanager:GetSecretValue action 2025-09-07T08:05:46.5935020Z Error: Cannot perform an interactive login from a non TTY device 2025-09-07T08:05:46.5955085Z + true 2025-09-07T08:05:46.6403499Z ##[group]Run tag=${ECR_DOCKER_IMAGE##*:} 2025-09-07T08:05:46.6404022Z tag=${ECR_DOCKER_IMAGE##*:} 2025-09-07T08:05:46.6404382Z echo "docker pull ghcr.io/pytorch/ci-image:${tag/:/-}" 2025-09-07T08:05:46.6419318Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T08:05:46.6419612Z env: 2025-09-07T08:05:46.6419772Z GIT_DEFAULT_BRANCH: main 2025-09-07T08:05:46.6420455Z ECR_DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T08:05:46.6421134Z ##[endgroup] 2025-09-07T08:05:46.6847048Z docker pull ghcr.io/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T08:05:46.7286467Z ##[group]Run pytorch/test-infra/.github/actions/pull-docker-image@main 2025-09-07T08:05:46.7286846Z with: 2025-09-07T08:05:46.7287577Z docker-image: 
308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T08:05:46.7288470Z docker-registry: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T08:05:46.7288786Z env: 2025-09-07T08:05:46.7288973Z GIT_DEFAULT_BRANCH: main 2025-09-07T08:05:46.7289178Z ##[endgroup] 2025-09-07T08:05:46.7898146Z ##[group]Run set -x 2025-09-07T08:05:46.7898375Z set -x 2025-09-07T08:05:46.7898555Z set +e 2025-09-07T08:05:46.7898732Z  2025-09-07T08:05:46.7898903Z login() { 2025-09-07T08:05:46.7899295Z  aws ecr get-login-password --region us-east-1 | docker login -u AWS --password-stdin "$1" 2025-09-07T08:05:46.7899708Z } 2025-09-07T08:05:46.7899874Z  2025-09-07T08:05:46.7900064Z retry () { 2025-09-07T08:05:46.7900274Z  $* || (sleep 1 && $*) || (sleep 2 && $*) 2025-09-07T08:05:46.7900518Z } 2025-09-07T08:05:46.7900683Z  2025-09-07T08:05:46.7900863Z retry login "${DOCKER_REGISTRY}" 2025-09-07T08:05:46.7901107Z  2025-09-07T08:05:46.7901715Z IMAGE_SIZE=$(docker manifest inspect "${DOCKER_IMAGE}" | jq '[.layers[].size, .config.size] | add / 1024 / 1024') 2025-09-07T08:05:46.7902259Z echo "Compressed size of image in MB: ${IMAGE_SIZE}" 2025-09-07T08:05:46.7902558Z  2025-09-07T08:05:46.7902725Z set -e 2025-09-07T08:05:46.7903005Z # ignore output since only exit code is used for conditional 2025-09-07T08:05:46.7903396Z # only pull docker image if it's not available locally 2025-09-07T08:05:46.7904010Z if ! 
docker inspect --type=image "${DOCKER_IMAGE}" >/dev/null 2>/dev/null; then 2025-09-07T08:05:46.7904432Z  retry docker pull "${DOCKER_IMAGE}" 2025-09-07T08:05:46.7904687Z fi 2025-09-07T08:05:46.7921046Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T08:05:46.7921354Z env: 2025-09-07T08:05:46.7921524Z GIT_DEFAULT_BRANCH: main 2025-09-07T08:05:46.7922215Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T08:05:46.7923018Z DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T08:05:46.7923305Z ##[endgroup] 2025-09-07T08:05:46.8091763Z + set +e 2025-09-07T08:05:46.8092028Z + retry login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T08:05:46.8092382Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T08:05:46.8095064Z + aws ecr get-login-password --region us-east-1 2025-09-07T08:05:46.8099598Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T08:05:47.6138658Z 2025-09-07T08:05:47.6139202Z WARNING! Your credentials are stored unencrypted in '/home/charlie/.docker/config.json'. 2025-09-07T08:05:47.6140086Z Configure a credential helper to remove this warning. 
See 2025-09-07T08:05:47.6140738Z https://docs.docker.com/go/credential-store/ 2025-09-07T08:05:47.6141108Z 2025-09-07T08:05:47.6141247Z Login Succeeded 2025-09-07T08:05:47.6167354Z ++ docker manifest inspect 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T08:05:47.6168271Z ++ jq '[.layers[].size, .config.size] | add / 1024 / 1024' 2025-09-07T08:05:47.9831955Z + IMAGE_SIZE=36183.606596946716 2025-09-07T08:05:47.9832252Z Compressed size of image in MB: 36183.606596946716 2025-09-07T08:05:47.9832602Z + echo 'Compressed size of image in MB: 36183.606596946716' 2025-09-07T08:05:47.9832883Z + set -e 2025-09-07T08:05:47.9834335Z + docker inspect --type=image 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T08:05:47.9957900Z + retry docker pull 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T08:05:51.1678450Z + docker pull 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T08:05:51.1679662Z pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77: Pulling from pytorch/ci-image 2025-09-07T08:05:51.1680348Z e6fdc8487bfe: Pulling fs layer 2025-09-07T08:05:51.1680586Z 171dcef20c49: Pulling fs layer 2025-09-07T08:05:51.1680811Z 4c92b3f72f1d: Pulling fs layer 2025-09-07T08:05:51.1681020Z 744f9ba90a65: Pulling fs layer 2025-09-07T08:05:51.1681267Z d3c08322a332: Pulling fs layer 2025-09-07T08:05:51.1681474Z ffd43b71f3cc: Pulling fs layer 2025-09-07T08:05:51.1681679Z 830692b57f6e: Pulling fs layer 
2025-09-07T08:05:51.1681883Z 5bad36d18468: Pulling fs layer 2025-09-07T08:05:51.1682088Z 0e34fdd9ac5c: Pulling fs layer 2025-09-07T08:05:51.1682285Z 3c868a62868e: Pulling fs layer 2025-09-07T08:05:51.1682489Z 62170a22dd57: Pulling fs layer 2025-09-07T08:05:51.1683134Z 553c1d23b6c4: Pulling fs layer 2025-09-07T08:05:51.1683344Z d3c08322a332: Waiting 2025-09-07T08:05:51.1683526Z 9408d557a804: Pulling fs layer 2025-09-07T08:05:51.1683918Z ffd43b71f3cc: Waiting 2025-09-07T08:05:51.1684114Z 5bad36d18468: Waiting 2025-09-07T08:05:51.1684306Z df607cfc7c07: Pulling fs layer 2025-09-07T08:05:51.1684506Z 0e34fdd9ac5c: Waiting 2025-09-07T08:05:51.1684704Z 4f4fb700ef54: Pulling fs layer 2025-09-07T08:05:51.1684911Z 830692b57f6e: Waiting 2025-09-07T08:05:51.1685087Z 3c868a62868e: Waiting 2025-09-07T08:05:51.1685251Z 9408d557a804: Waiting 2025-09-07T08:05:51.1685440Z 40a8e39faeda: Pulling fs layer 2025-09-07T08:05:51.1685645Z df607cfc7c07: Waiting 2025-09-07T08:05:51.1685827Z d895771c9fac: Pulling fs layer 2025-09-07T08:05:51.1686026Z 62170a22dd57: Waiting 2025-09-07T08:05:51.1686198Z 553c1d23b6c4: Waiting 2025-09-07T08:05:51.1686372Z 744f9ba90a65: Waiting 2025-09-07T08:05:51.1686559Z c4ee04f39d49: Pulling fs layer 2025-09-07T08:05:51.1686757Z 40a8e39faeda: Waiting 2025-09-07T08:05:51.1686937Z 4f4fb700ef54: Waiting 2025-09-07T08:05:51.1687115Z 3690c9826e48: Pulling fs layer 2025-09-07T08:05:51.1687318Z 57cbc5013733: Pulling fs layer 2025-09-07T08:05:51.1687518Z f5f4b06b58bb: Pulling fs layer 2025-09-07T08:05:51.1687708Z d895771c9fac: Waiting 2025-09-07T08:05:51.1687897Z f59713ce4bf4: Pulling fs layer 2025-09-07T08:05:51.1688092Z c4ee04f39d49: Waiting 2025-09-07T08:05:51.1688258Z fe0486521517: Pulling fs layer 2025-09-07T08:05:51.1688454Z 8c21cc3715a2: Pulling fs layer 2025-09-07T08:05:51.1688639Z 57cbc5013733: Waiting 2025-09-07T08:05:51.1688796Z f5f4b06b58bb: Waiting 2025-09-07T08:05:51.1688972Z d37c58456a6a: Pulling fs layer 2025-09-07T08:05:51.1689165Z 3690c9826e48: Waiting 
2025-09-07T08:05:51.1689324Z fe0486521517: Waiting 2025-09-07T08:05:51.1689513Z d042f63abc13: Pulling fs layer 2025-09-07T08:05:51.1689696Z f59713ce4bf4: Waiting 2025-09-07T08:05:51.1689870Z d37c58456a6a: Waiting 2025-09-07T08:05:51.1690039Z 621284a9c05a: Pulling fs layer 2025-09-07T08:05:51.1690226Z d042f63abc13: Waiting 2025-09-07T08:05:51.1690382Z 8c21cc3715a2: Waiting 2025-09-07T08:05:51.1690550Z 85f605d2dd3a: Pulling fs layer 2025-09-07T08:05:51.1690741Z 621284a9c05a: Waiting 2025-09-07T08:05:51.1690911Z 381b5539e598: Pulling fs layer 2025-09-07T08:05:51.1691095Z a487c0c80029: Pulling fs layer 2025-09-07T08:05:51.1691290Z 48bcb81e2566: Pulling fs layer 2025-09-07T08:05:51.1691474Z 381b5539e598: Waiting 2025-09-07T08:05:51.1691633Z 85f605d2dd3a: Waiting 2025-09-07T08:05:51.1691788Z a487c0c80029: Waiting 2025-09-07T08:05:51.1691953Z e261928c0043: Pulling fs layer 2025-09-07T08:05:51.1692143Z 0fea55428091: Pulling fs layer 2025-09-07T08:05:51.1692544Z b4291bccbb84: Pulling fs layer 2025-09-07T08:05:51.1692744Z 48bcb81e2566: Waiting 2025-09-07T08:05:51.1692920Z ddc91b09189a: Pulling fs layer 2025-09-07T08:05:51.1693112Z 7540c7428627: Pulling fs layer 2025-09-07T08:05:51.1693303Z 003c4e2598fb: Pulling fs layer 2025-09-07T08:05:51.1693479Z e261928c0043: Waiting 2025-09-07T08:05:51.1693646Z 0fea55428091: Waiting 2025-09-07T08:05:51.1693996Z 5687149362ae: Pulling fs layer 2025-09-07T08:05:51.1694207Z cdd2cf54eb2a: Pulling fs layer 2025-09-07T08:05:51.1694394Z b4291bccbb84: Waiting 2025-09-07T08:05:51.1694571Z ddc91b09189a: Waiting 2025-09-07T08:05:51.1694738Z 7540c7428627: Waiting 2025-09-07T08:05:51.1694910Z d3ad4df1ba3a: Pulling fs layer 2025-09-07T08:05:51.1695098Z 3c9055753b4c: Pulling fs layer 2025-09-07T08:05:51.1695286Z 003c4e2598fb: Waiting 2025-09-07T08:05:51.1695459Z 31cf8d0bd21c: Pulling fs layer 2025-09-07T08:05:51.1695648Z d3ad4df1ba3a: Waiting 2025-09-07T08:05:51.1695812Z 6623ea814971: Pulling fs layer 2025-09-07T08:05:51.1696002Z 3c9055753b4c: Waiting 
2025-09-07T08:05:51.1696174Z 11696c3aa380: Pulling fs layer 2025-09-07T08:05:51.1696360Z cdd2cf54eb2a: Waiting 2025-09-07T08:05:51.1696527Z ef4d544e35ca: Pulling fs layer 2025-09-07T08:05:51.1696722Z 31cf8d0bd21c: Waiting 2025-09-07T08:05:51.1696894Z 6623ea814971: Waiting 2025-09-07T08:05:51.1697055Z 5687149362ae: Waiting 2025-09-07T08:05:51.1697379Z 5c5108865e5e: Pulling fs layer 2025-09-07T08:05:51.1697563Z 11696c3aa380: Waiting 2025-09-07T08:05:51.1697727Z ef4d544e35ca: Waiting 2025-09-07T08:05:51.1697906Z 9e97578e9edf: Pulling fs layer 2025-09-07T08:05:51.1698094Z 5c5108865e5e: Waiting 2025-09-07T08:05:51.1698272Z da5a91b54cb5: Pulling fs layer 2025-09-07T08:05:51.1698465Z 1e93be219e89: Pulling fs layer 2025-09-07T08:05:51.1698658Z 136825afebb5: Pulling fs layer 2025-09-07T08:05:51.1698844Z 22b39805302d: Pulling fs layer 2025-09-07T08:05:51.1699046Z d12add675e35: Pulling fs layer 2025-09-07T08:05:51.1699240Z bc127046d33a: Pulling fs layer 2025-09-07T08:05:51.1699429Z 9e97578e9edf: Waiting 2025-09-07T08:05:51.1699598Z 951e8ce83841: Pulling fs layer 2025-09-07T08:05:51.1699785Z 32340b97ae50: Pulling fs layer 2025-09-07T08:05:51.1699969Z 136825afebb5: Waiting 2025-09-07T08:05:51.1700134Z 5bbb04cd6b57: Pulling fs layer 2025-09-07T08:05:51.1700318Z da5a91b54cb5: Waiting 2025-09-07T08:05:51.1700493Z d8c4b845cfc7: Pulling fs layer 2025-09-07T08:05:51.1700681Z 1e93be219e89: Waiting 2025-09-07T08:05:51.1700844Z b35c180f4d8d: Pulling fs layer 2025-09-07T08:05:51.1701028Z 22b39805302d: Waiting 2025-09-07T08:05:51.1701202Z d12add675e35: Waiting 2025-09-07T08:05:51.1701373Z 5f967b3c303a: Pulling fs layer 2025-09-07T08:05:51.1701569Z bc127046d33a: Waiting 2025-09-07T08:05:51.1701736Z 04770904f012: Pulling fs layer 2025-09-07T08:05:51.1701918Z 73373941fb32: Pulling fs layer 2025-09-07T08:05:51.1702108Z 9572e6cd907b: Pulling fs layer 2025-09-07T08:05:51.1702294Z 951e8ce83841: Waiting 2025-09-07T08:05:51.1702453Z 32340b97ae50: Waiting 2025-09-07T08:05:51.1702628Z 64a544aba233: 
Pulling fs layer 2025-09-07T08:05:51.1702823Z 7e35418a2499: Pulling fs layer 2025-09-07T08:05:51.1703012Z d8c4b845cfc7: Waiting 2025-09-07T08:05:51.1703173Z 5f967b3c303a: Waiting 2025-09-07T08:05:51.1703332Z 04770904f012: Waiting 2025-09-07T08:05:51.1703508Z 2ed8e82748d4: Pulling fs layer 2025-09-07T08:05:51.1703849Z c988fbcccd70: Pulling fs layer 2025-09-07T08:05:51.1704039Z 5bbb04cd6b57: Waiting 2025-09-07T08:05:51.1704203Z 64a544aba233: Waiting 2025-09-07T08:05:51.1704374Z 2ed8e82748d4: Waiting 2025-09-07T08:05:51.1704535Z c988fbcccd70: Waiting 2025-09-07T08:05:51.1704689Z 73373941fb32: Waiting 2025-09-07T08:05:51.1704849Z b35c180f4d8d: Waiting 2025-09-07T08:05:51.1705008Z 9572e6cd907b: Waiting 2025-09-07T08:05:51.3258790Z 171dcef20c49: Verifying Checksum 2025-09-07T08:05:51.3259164Z 171dcef20c49: Download complete 2025-09-07T08:05:51.4800100Z 744f9ba90a65: Verifying Checksum 2025-09-07T08:05:51.4800409Z 744f9ba90a65: Download complete 2025-09-07T08:05:51.6062883Z e6fdc8487bfe: Verifying Checksum 2025-09-07T08:05:51.6063967Z e6fdc8487bfe: Download complete 2025-09-07T08:05:51.6449798Z d3c08322a332: Verifying Checksum 2025-09-07T08:05:51.6450043Z d3c08322a332: Download complete 2025-09-07T08:05:51.8919843Z ffd43b71f3cc: Verifying Checksum 2025-09-07T08:05:51.8920433Z 830692b57f6e: Verifying Checksum 2025-09-07T08:05:51.8920994Z ffd43b71f3cc: Download complete 2025-09-07T08:05:51.8921596Z 830692b57f6e: Download complete 2025-09-07T08:05:52.2522922Z 5bad36d18468: Verifying Checksum 2025-09-07T08:05:52.2523176Z 5bad36d18468: Download complete 2025-09-07T08:05:52.4112893Z 3c868a62868e: Verifying Checksum 2025-09-07T08:05:52.4113190Z 3c868a62868e: Download complete 2025-09-07T08:05:52.5757264Z 62170a22dd57: Verifying Checksum 2025-09-07T08:05:52.5757576Z 62170a22dd57: Download complete 2025-09-07T08:05:52.7355643Z 553c1d23b6c4: Verifying Checksum 2025-09-07T08:05:52.7356207Z 553c1d23b6c4: Download complete 2025-09-07T08:05:52.8950442Z 9408d557a804: Download complete 
2025-09-07T08:05:53.3358385Z 0e34fdd9ac5c: Verifying Checksum 2025-09-07T08:05:53.3358950Z 0e34fdd9ac5c: Download complete 2025-09-07T08:05:53.4250553Z 4f4fb700ef54: Verifying Checksum 2025-09-07T08:05:53.4250809Z 4f4fb700ef54: Download complete 2025-09-07T08:05:53.5717151Z 40a8e39faeda: Verifying Checksum 2025-09-07T08:05:53.5717608Z 40a8e39faeda: Download complete 2025-09-07T08:05:54.2298259Z d895771c9fac: Verifying Checksum 2025-09-07T08:05:54.2298938Z d895771c9fac: Download complete 2025-09-07T08:05:54.4677549Z c4ee04f39d49: Verifying Checksum 2025-09-07T08:05:54.4677884Z c4ee04f39d49: Download complete 2025-09-07T08:05:54.6856438Z 3690c9826e48: Verifying Checksum 2025-09-07T08:05:54.6856724Z 3690c9826e48: Download complete 2025-09-07T08:05:54.7371519Z 4c92b3f72f1d: Verifying Checksum 2025-09-07T08:05:54.7371762Z 4c92b3f72f1d: Download complete 2025-09-07T08:05:55.1217706Z f5f4b06b58bb: Download complete 2025-09-07T08:05:55.1217972Z 57cbc5013733: Download complete 2025-09-07T08:05:55.2502153Z fe0486521517: Verifying Checksum 2025-09-07T08:05:55.2502706Z fe0486521517: Download complete 2025-09-07T08:05:55.2693874Z f59713ce4bf4: Verifying Checksum 2025-09-07T08:05:55.2694148Z f59713ce4bf4: Download complete 2025-09-07T08:05:55.4255417Z d37c58456a6a: Verifying Checksum 2025-09-07T08:05:55.4255689Z d37c58456a6a: Download complete 2025-09-07T08:05:55.4960700Z e6fdc8487bfe: Pull complete 2025-09-07T08:05:55.9273131Z d042f63abc13: Verifying Checksum 2025-09-07T08:05:55.9273608Z d042f63abc13: Download complete 2025-09-07T08:05:56.0718729Z 621284a9c05a: Verifying Checksum 2025-09-07T08:05:56.0719174Z 621284a9c05a: Download complete 2025-09-07T08:05:56.3226552Z 85f605d2dd3a: Verifying Checksum 2025-09-07T08:05:56.3227005Z 85f605d2dd3a: Download complete 2025-09-07T08:06:01.0328985Z 171dcef20c49: Pull complete 2025-09-07T08:06:01.2744177Z 381b5539e598: Verifying Checksum 2025-09-07T08:06:01.2744494Z 381b5539e598: Download complete 2025-09-07T08:06:01.6400208Z 
a487c0c80029: Verifying Checksum 2025-09-07T08:06:01.6400509Z a487c0c80029: Download complete 2025-09-07T08:06:01.9091920Z 48bcb81e2566: Download complete 2025-09-07T08:06:02.2448960Z e261928c0043: Verifying Checksum 2025-09-07T08:06:02.2449244Z e261928c0043: Download complete 2025-09-07T08:06:02.5457195Z 0fea55428091: Verifying Checksum 2025-09-07T08:06:02.5457767Z 0fea55428091: Download complete 2025-09-07T08:06:03.1519041Z b4291bccbb84: Verifying Checksum 2025-09-07T08:06:03.1519405Z b4291bccbb84: Download complete 2025-09-07T08:06:03.4008995Z ddc91b09189a: Verifying Checksum 2025-09-07T08:06:03.4009311Z ddc91b09189a: Download complete 2025-09-07T08:06:03.6250676Z 7540c7428627: Download complete 2025-09-07T08:06:03.7862466Z 003c4e2598fb: Verifying Checksum 2025-09-07T08:06:03.7862783Z 003c4e2598fb: Download complete 2025-09-07T08:06:03.9273289Z 5687149362ae: Verifying Checksum 2025-09-07T08:06:03.9273594Z 5687149362ae: Download complete 2025-09-07T08:06:04.0840353Z cdd2cf54eb2a: Verifying Checksum 2025-09-07T08:06:04.0840600Z cdd2cf54eb2a: Download complete 2025-09-07T08:06:13.3777332Z 4c92b3f72f1d: Pull complete 2025-09-07T08:06:18.7855965Z 744f9ba90a65: Pull complete 2025-09-07T08:06:23.6985184Z d3c08322a332: Pull complete 2025-09-07T08:06:25.9000573Z df607cfc7c07: Verifying Checksum 2025-09-07T08:06:25.9001093Z df607cfc7c07: Download complete 2025-09-07T08:06:26.1064202Z 3c9055753b4c: Download complete 2025-09-07T08:06:29.4401256Z ffd43b71f3cc: Pull complete 2025-09-07T08:06:29.7918882Z 31cf8d0bd21c: Verifying Checksum 2025-09-07T08:06:29.7919244Z 31cf8d0bd21c: Download complete 2025-09-07T08:06:35.5984707Z 830692b57f6e: Pull complete 2025-09-07T08:06:41.1017790Z 5bad36d18468: Pull complete 2025-09-07T08:06:47.8320347Z 0e34fdd9ac5c: Pull complete 2025-09-07T08:06:52.8301792Z 3c868a62868e: Pull complete 2025-09-07T08:06:58.0354597Z 62170a22dd57: Pull complete 2025-09-07T08:07:03.9253463Z 553c1d23b6c4: Pull complete 2025-09-07T08:07:09.3289935Z 9408d557a804: 
Pull complete 2025-09-07T08:07:14.1411262Z 8c21cc3715a2: Verifying Checksum 2025-09-07T08:07:14.1411606Z 8c21cc3715a2: Download complete 2025-09-07T08:07:14.2988302Z 11696c3aa380: Download complete 2025-09-07T08:07:14.4591067Z ef4d544e35ca: Download complete 2025-09-07T08:07:14.6072823Z 5c5108865e5e: Verifying Checksum 2025-09-07T08:07:14.6073135Z 5c5108865e5e: Download complete 2025-09-07T08:07:14.7723610Z 9e97578e9edf: Verifying Checksum 2025-09-07T08:07:14.7724174Z 9e97578e9edf: Download complete 2025-09-07T08:07:14.9243153Z da5a91b54cb5: Download complete 2025-09-07T08:07:15.0920558Z 1e93be219e89: Verifying Checksum 2025-09-07T08:07:15.0920845Z 1e93be219e89: Download complete 2025-09-07T08:07:15.2504759Z 136825afebb5: Download complete 2025-09-07T08:07:15.4154361Z 22b39805302d: Verifying Checksum 2025-09-07T08:07:15.4155011Z 22b39805302d: Download complete 2025-09-07T08:07:15.5625116Z d12add675e35: Verifying Checksum 2025-09-07T08:07:15.5625602Z d12add675e35: Download complete 2025-09-07T08:07:15.7144755Z bc127046d33a: Verifying Checksum 2025-09-07T08:07:15.7145069Z bc127046d33a: Download complete 2025-09-07T08:07:15.8776856Z 951e8ce83841: Verifying Checksum 2025-09-07T08:07:15.8777213Z 951e8ce83841: Download complete 2025-09-07T08:07:16.0185782Z 32340b97ae50: Verifying Checksum 2025-09-07T08:07:16.0186127Z 32340b97ae50: Download complete 2025-09-07T08:07:16.1726217Z 5bbb04cd6b57: Download complete 2025-09-07T08:07:16.3161969Z d8c4b845cfc7: Download complete 2025-09-07T08:07:18.8932925Z b35c180f4d8d: Verifying Checksum 2025-09-07T08:07:18.8933297Z b35c180f4d8d: Download complete 2025-09-07T08:07:19.0387874Z 5f967b3c303a: Verifying Checksum 2025-09-07T08:07:19.0388241Z 5f967b3c303a: Download complete 2025-09-07T08:07:19.1922034Z 04770904f012: Verifying Checksum 2025-09-07T08:07:19.1922358Z 04770904f012: Download complete 2025-09-07T08:07:19.3599486Z 73373941fb32: Download complete 2025-09-07T08:07:19.5262947Z 9572e6cd907b: Download complete 
2025-09-07T08:07:19.6808605Z 64a544aba233: Verifying Checksum 2025-09-07T08:07:19.6808854Z 64a544aba233: Download complete 2025-09-07T08:07:19.9826210Z 7e35418a2499: Verifying Checksum 2025-09-07T08:07:19.9826529Z 7e35418a2499: Download complete 2025-09-07T08:07:20.1480105Z 2ed8e82748d4: Download complete 2025-09-07T08:07:20.8454156Z c988fbcccd70: Verifying Checksum 2025-09-07T08:07:20.8454434Z c988fbcccd70: Download complete 2025-09-07T08:07:35.2766159Z 6623ea814971: Verifying Checksum 2025-09-07T08:07:35.2766468Z 6623ea814971: Download complete 2025-09-07T08:07:51.5151285Z df607cfc7c07: Pull complete 2025-09-07T08:07:57.0665411Z 4f4fb700ef54: Pull complete 2025-09-07T08:08:02.6597783Z 40a8e39faeda: Pull complete 2025-09-07T08:08:09.2560585Z d895771c9fac: Pull complete 2025-09-07T08:08:14.8348548Z c4ee04f39d49: Pull complete 2025-09-07T08:08:20.5094745Z 3690c9826e48: Pull complete 2025-09-07T08:08:25.6470017Z 57cbc5013733: Pull complete 2025-09-07T08:08:31.2937887Z f5f4b06b58bb: Pull complete 2025-09-07T08:08:36.4794232Z f59713ce4bf4: Pull complete 2025-09-07T08:08:42.0525597Z fe0486521517: Pull complete 2025-09-07T08:09:10.3758151Z d3ad4df1ba3a: Verifying Checksum 2025-09-07T08:09:10.3758453Z d3ad4df1ba3a: Download complete 2025-09-07T08:14:26.4304934Z 8c21cc3715a2: Pull complete 2025-09-07T08:14:29.9660026Z d37c58456a6a: Pull complete 2025-09-07T08:14:33.6371642Z d042f63abc13: Pull complete 2025-09-07T08:14:37.0408476Z 621284a9c05a: Pull complete 2025-09-07T08:14:40.8526382Z 85f605d2dd3a: Pull complete 2025-09-07T08:14:49.5580679Z 381b5539e598: Pull complete 2025-09-07T08:14:52.8991277Z a487c0c80029: Pull complete 2025-09-07T08:14:56.8363214Z 48bcb81e2566: Pull complete 2025-09-07T08:15:03.5381155Z e261928c0043: Pull complete 2025-09-07T08:15:06.9754869Z 0fea55428091: Pull complete 2025-09-07T08:15:10.6843892Z b4291bccbb84: Pull complete 2025-09-07T08:15:16.1409140Z ddc91b09189a: Pull complete 2025-09-07T08:15:21.8324583Z 7540c7428627: Pull complete 
2025-09-07T08:15:33.1334580Z 003c4e2598fb: Pull complete 2025-09-07T08:15:38.9129174Z 5687149362ae: Pull complete 2025-09-07T08:15:44.4932172Z cdd2cf54eb2a: Pull complete 2025-09-07T08:17:38.5112239Z d3ad4df1ba3a: Pull complete 2025-09-07T08:17:41.6519599Z 3c9055753b4c: Pull complete 2025-09-07T08:17:45.5387043Z 31cf8d0bd21c: Pull complete 2025-09-07T08:19:24.8958879Z 6623ea814971: Pull complete 2025-09-07T08:19:28.8647929Z 11696c3aa380: Pull complete 2025-09-07T08:19:30.8209479Z ef4d544e35ca: Pull complete 2025-09-07T08:24:59.8307804Z 5c5108865e5e: Pull complete 2025-09-07T08:24:59.9173315Z 9e97578e9edf: Pull complete 2025-09-07T08:24:59.9676716Z da5a91b54cb5: Pull complete 2025-09-07T08:25:00.0936256Z 1e93be219e89: Pull complete 2025-09-07T08:25:00.1920795Z 136825afebb5: Pull complete 2025-09-07T08:25:00.2496381Z 22b39805302d: Pull complete 2025-09-07T08:25:00.3341560Z d12add675e35: Pull complete 2025-09-07T08:25:00.3826867Z bc127046d33a: Pull complete 2025-09-07T08:25:00.4673321Z 951e8ce83841: Pull complete 2025-09-07T08:25:00.5059084Z 32340b97ae50: Pull complete 2025-09-07T08:25:00.5797290Z 5bbb04cd6b57: Pull complete 2025-09-07T08:25:00.6158080Z d8c4b845cfc7: Pull complete 2025-09-07T08:25:05.6188749Z b35c180f4d8d: Pull complete 2025-09-07T08:25:05.6582546Z 5f967b3c303a: Pull complete 2025-09-07T08:25:05.6976125Z 04770904f012: Pull complete 2025-09-07T08:25:05.7368342Z 73373941fb32: Pull complete 2025-09-07T08:25:05.7761518Z 9572e6cd907b: Pull complete 2025-09-07T08:25:05.8229409Z 64a544aba233: Pull complete 2025-09-07T08:25:07.1741825Z 7e35418a2499: Pull complete 2025-09-07T08:25:07.2216793Z 2ed8e82748d4: Pull complete 2025-09-07T08:25:08.0572467Z c988fbcccd70: Pull complete 2025-09-07T08:25:08.1187700Z Digest: sha256:f30843ff9ea9e117a2c8e6d207e85c9e77dfe682f1dfcdfea5b94178d1bf00b3 2025-09-07T08:25:08.1266593Z Status: Downloaded newer image for 
308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T08:25:08.1294547Z 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T08:25:08.1350439Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-09-07T08:25:08.1351293Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-09-07T08:25:08.1367724Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T08:25:08.1368037Z env: 2025-09-07T08:25:08.1368199Z GIT_DEFAULT_BRANCH: main 2025-09-07T08:25:08.1368410Z ##[endgroup] 2025-09-07T08:25:08.1449711Z ##[group]Run echo "GPU_FLAG=--gpus all -e NVIDIA_DRIVER_CAPABILITIES=all" >> "${GITHUB_ENV}" 2025-09-07T08:25:08.1450254Z echo "GPU_FLAG=--gpus all -e NVIDIA_DRIVER_CAPABILITIES=all" >> "${GITHUB_ENV}" 2025-09-07T08:25:08.1466692Z shell: /usr/bin/bash -e {0} 2025-09-07T08:25:08.1466911Z env: 2025-09-07T08:25:08.1467077Z GIT_DEFAULT_BRANCH: main 2025-09-07T08:25:08.1467276Z ##[endgroup] 2025-09-07T08:25:08.1530036Z ##[group]Run echo "SCCACHE_SERVER_PORT_DOCKER_FLAG=-e SCCACHE_SERVER_PORT=$((RUNNER_UID + 4226))" >> "${GITHUB_ENV}" 2025-09-07T08:25:08.1530717Z echo "SCCACHE_SERVER_PORT_DOCKER_FLAG=-e SCCACHE_SERVER_PORT=$((RUNNER_UID + 4226))" >> "${GITHUB_ENV}" 2025-09-07T08:25:08.1546724Z shell: /usr/bin/bash -e {0} 2025-09-07T08:25:08.1546943Z env: 2025-09-07T08:25:08.1547107Z GIT_DEFAULT_BRANCH: main 2025-09-07T08:25:08.1547365Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T08:25:08.1547660Z ##[endgroup] 2025-09-07T08:25:08.1614060Z Prepare all required actions 2025-09-07T08:25:08.1637805Z ##[group]Run 
./.github/actions/get-workflow-job-id 2025-09-07T08:25:08.1638057Z with: 2025-09-07T08:25:08.1638647Z github-token: *** 2025-09-07T08:25:08.1638836Z env: 2025-09-07T08:25:08.1639000Z GIT_DEFAULT_BRANCH: main 2025-09-07T08:25:08.1639242Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T08:25:08.1639577Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5229 2025-09-07T08:25:08.1639855Z ##[endgroup] 2025-09-07T08:25:08.1652350Z ##[group]Run set -eux 2025-09-07T08:25:08.1652565Z set -eux 2025-09-07T08:25:08.1652892Z python3 .github/scripts/get_workflow_job_id.py "${GITHUB_RUN_ID}" "${RUNNER_NAME}" 2025-09-07T08:25:08.1667878Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T08:25:08.1668168Z env: 2025-09-07T08:25:08.1668335Z GIT_DEFAULT_BRANCH: main 2025-09-07T08:25:08.1668848Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T08:25:08.1669201Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5229 2025-09-07T08:25:08.1669708Z GITHUB_TOKEN: *** 2025-09-07T08:25:08.1669895Z ##[endgroup] 2025-09-07T08:25:08.1700921Z + python3 .github/scripts/get_workflow_job_id.py 17525296438 i-05a095f6e498981b2-1003 2025-09-07T08:25:08.7977501Z Setting output job-id=49775781833 2025-09-07T08:25:08.7978001Z Setting output job-name=test-weekly / test (inductor_timm_perf_cuda_h100, 2, 7, linux.aws.h100) 2025-09-07T08:25:08.8076531Z ##[group]Run python3 -m pip install psutil==5.9.8 dataclasses_json==0.6.7 nvidia-ml-py==11.525.84 2025-09-07T08:25:08.8077116Z python3 -m pip install psutil==5.9.8 dataclasses_json==0.6.7 nvidia-ml-py==11.525.84 2025-09-07T08:25:08.8078034Z python3 -m tools.stats.monitor --log-interval "$MONITOR_LOG_INTERVAL" --data-collect-interval "$MONITOR_DATA_COLLECT_INTERVAL" > usage_log.txt 2>&1 & 2025-09-07T08:25:08.8078665Z echo "monitor-script-pid=${!}" >> "${GITHUB_OUTPUT}" 2025-09-07T08:25:08.8093910Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T08:25:08.8094236Z env: 
2025-09-07T08:25:08.8094406Z GIT_DEFAULT_BRANCH: main 2025-09-07T08:25:08.8094660Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T08:25:08.8095002Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5229 2025-09-07T08:25:08.8095284Z JOB_ID: 49775781833 2025-09-07T08:25:08.8095615Z JOB_NAME: test-weekly / test (inductor_timm_perf_cuda_h100, 2, 7, linux.aws.h100) 2025-09-07T08:25:08.8095985Z WORKFLOW_NAME: inductor-perf-nightly-h100 2025-09-07T08:25:08.8096250Z WORKFLOW_RUN_ID: 17525296438 2025-09-07T08:25:08.8096459Z MONITOR_LOG_INTERVAL: 15 2025-09-07T08:25:08.8096666Z MONITOR_DATA_COLLECT_INTERVAL: 4 2025-09-07T08:25:08.8096874Z ##[endgroup] 2025-09-07T08:25:09.0946724Z Defaulting to user installation because normal site-packages is not writeable 2025-09-07T08:25:09.3624402Z Collecting psutil==5.9.8 2025-09-07T08:25:09.4251302Z Downloading psutil-5.9.8-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (288 kB) 2025-09-07T08:25:09.4718773Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 288.2/288.2 KB 6.4 MB/s eta 0:00:00 2025-09-07T08:25:09.5158955Z Collecting dataclasses_json==0.6.7 2025-09-07T08:25:09.5275457Z Downloading dataclasses_json-0.6.7-py3-none-any.whl (28 kB) 2025-09-07T08:25:09.6703412Z Collecting nvidia-ml-py==11.525.84 2025-09-07T08:25:09.6820068Z Downloading nvidia_ml_py-11.525.84-py3-none-any.whl (34 kB) 2025-09-07T08:25:09.7159148Z Collecting typing-inspect<1,>=0.4.0 2025-09-07T08:25:09.7280234Z Downloading typing_inspect-0.9.0-py3-none-any.whl (8.8 kB) 2025-09-07T08:25:09.8037684Z Collecting marshmallow<4.0.0,>=3.18.0 2025-09-07T08:25:09.8157233Z Downloading marshmallow-3.26.1-py3-none-any.whl (50 kB) 2025-09-07T08:25:09.8237797Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50.9/50.9 KB 6.5 MB/s eta 0:00:00 2025-09-07T08:25:09.8651635Z Collecting packaging>=17.0 2025-09-07T08:25:09.8770408Z Downloading packaging-25.0-py3-none-any.whl (66 kB) 2025-09-07T08:25:09.8859730Z 
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 66.5/66.5 KB 7.7 MB/s eta 0:00:00 2025-09-07T08:25:09.9230895Z Collecting typing-extensions>=3.7.4 2025-09-07T08:25:09.9352322Z Downloading typing_extensions-4.15.0-py3-none-any.whl (44 kB) 2025-09-07T08:25:09.9429132Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 44.6/44.6 KB 5.7 MB/s eta 0:00:00 2025-09-07T08:25:09.9605943Z Collecting mypy-extensions>=0.3.0 2025-09-07T08:25:09.9737218Z Downloading mypy_extensions-1.1.0-py3-none-any.whl (5.0 kB) 2025-09-07T08:25:10.0442160Z Installing collected packages: nvidia-ml-py, typing-extensions, psutil, packaging, mypy-extensions, typing-inspect, marshmallow, dataclasses_json 2025-09-07T08:25:10.3580984Z Successfully installed dataclasses_json-0.6.7 marshmallow-3.26.1 mypy-extensions-1.1.0 nvidia-ml-py-11.525.84 packaging-25.0 psutil-5.9.8 typing-extensions-4.15.0 typing-inspect-0.9.0 2025-09-07T08:25:10.4136563Z Prepare all required actions 2025-09-07T08:25:10.4136891Z Getting action download info 2025-09-07T08:25:10.5957746Z Download action repository 'seemethere/download-artifact-s3@v4' (SHA:1da556a7aa0a088e3153970611f6c432d58e80e6) 2025-09-07T08:25:10.9979479Z Download action repository 'actions/download-artifact@v4' (SHA:d3f86a106a0bac45b974a628896c90dbdf5c8093) 2025-09-07T08:25:18.1966436Z ##[group]Run ./.github/actions/download-build-artifacts 2025-09-07T08:25:18.1966722Z with: 2025-09-07T08:25:18.1966927Z name: linux-jammy-cuda12.8-py3.10-gcc9-sm90 2025-09-07T08:25:18.1967182Z s3-bucket: gha-artifacts 2025-09-07T08:25:18.1967368Z env: 2025-09-07T08:25:18.1967528Z GIT_DEFAULT_BRANCH: main 2025-09-07T08:25:18.1967775Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T08:25:18.1968103Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5229 2025-09-07T08:25:18.1968378Z ##[endgroup] 2025-09-07T08:25:18.5122693Z ##[group]Run seemethere/download-artifact-s3@v4 2025-09-07T08:25:18.5122987Z with: 2025-09-07T08:25:18.5123192Z name: 
linux-jammy-cuda12.8-py3.10-gcc9-sm90 2025-09-07T08:25:18.5123462Z s3-bucket: gha-artifacts 2025-09-07T08:25:18.5123682Z region: us-east-1 2025-09-07T08:25:18.5124044Z env: 2025-09-07T08:25:18.5124216Z GIT_DEFAULT_BRANCH: main 2025-09-07T08:25:18.5124484Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T08:25:18.5124850Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5229 2025-09-07T08:25:18.5125163Z ##[endgroup] 2025-09-07T08:25:18.9799494Z (node:9224) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023. 2025-09-07T08:25:18.9799962Z 2025-09-07T08:25:18.9800143Z Please migrate your code to use AWS SDK for JavaScript (v3). 2025-09-07T08:25:18.9800646Z For more information, check the migration guide at https://a.co/7PzMCcy 2025-09-07T08:25:18.9801185Z (Use `node --trace-warnings ...` to show where the warning was created) 2025-09-07T08:25:19.1095881Z Found 1 objects with prefix pytorch/pytorch/17525296438/linux-jammy-cuda12.8-py3.10-gcc9-sm90/ 2025-09-07T08:25:19.1096978Z Starting download (1/1): /home/charlie/_work/pytorch/pytorch/artifacts.zip 2025-09-07T08:25:43.9567333Z Finished download (1/1): /home/charlie/_work/pytorch/pytorch/artifacts.zip 2025-09-07T08:25:43.9574864Z Artifact download has finished successfully 2025-09-07T08:25:44.0190794Z ##[group]Run unzip -o artifacts.zip 2025-09-07T08:25:44.0191073Z unzip -o artifacts.zip 2025-09-07T08:25:44.0207694Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T08:25:44.0208001Z env: 2025-09-07T08:25:44.0208171Z GIT_DEFAULT_BRANCH: main 2025-09-07T08:25:44.0208438Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T08:25:44.0208777Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5229 2025-09-07T08:25:44.0209058Z ##[endgroup] 2025-09-07T08:25:44.0607513Z Archive: artifacts.zip 2025-09-07T08:25:44.0609466Z creating: dist/ 2025-09-07T08:25:45.7971977Z inflating: 
dist/torch-2.9.0a0+git93fb23d-cp310-cp310-linux_x86_64.whl 2025-09-07T08:25:45.7972588Z creating: dist/vision/ 2025-09-07T08:25:45.8080127Z inflating: dist/vision/torchvision-0.22.0a0+966da7e-cp310-cp310-linux_x86_64.whl 2025-09-07T08:25:45.8080618Z creating: dist/audio/ 2025-09-07T08:25:45.8136039Z inflating: dist/audio/torchaudio-2.8.0a0+2e30055-cp310-cp310-linux_x86_64.whl 2025-09-07T08:25:45.8136486Z creating: dist/torchrec/ 2025-09-07T08:25:45.8160344Z inflating: dist/torchrec/torchrec-0.3.2-py3-none-any.whl 2025-09-07T08:25:45.8160682Z creating: dist/fbgemm_gpu/ 2025-09-07T08:25:46.6446245Z inflating: dist/fbgemm_gpu/fbgemm_gpu-0.4.1.post421-cp310-cp310-linux_x86_64.whl 2025-09-07T08:25:46.6446685Z creating: dist/ao/ 2025-09-07T08:25:46.6484769Z inflating: dist/ao/torchao-0.7.0+git51c87b6e-py3-none-any.whl 2025-09-07T08:25:46.6607805Z inflating: dist/.ninja_log 2025-09-07T08:25:46.6608566Z creating: build/custom_test_artifacts/ 2025-09-07T08:25:46.6609493Z creating: build/custom_test_artifacts/custom-op-build/ 2025-09-07T08:25:46.6609930Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/ 2025-09-07T08:25:46.6610445Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/pkgRedirects/ 2025-09-07T08:25:46.6617461Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeConfigureLog.yaml 2025-09-07T08:25:46.6618017Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/ 2025-09-07T08:25:46.6618546Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CMakeSystem.cmake 2025-09-07T08:25:46.6619123Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdC/ 2025-09-07T08:25:46.6619681Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdC/tmp/ 2025-09-07T08:25:46.6621741Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdC/CMakeCCompilerId.c 2025-09-07T08:25:46.6622915Z inflating: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdC/a.out 2025-09-07T08:25:46.6623558Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CMakeCCompiler.cmake 2025-09-07T08:25:46.6624299Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCXX/ 2025-09-07T08:25:46.6624830Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCXX/tmp/ 2025-09-07T08:25:46.6627069Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCXX/CMakeCXXCompilerId.cpp 2025-09-07T08:25:46.6628241Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCXX/a.out 2025-09-07T08:25:46.6629095Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CMakeCXXCompiler.cmake 2025-09-07T08:25:46.6630428Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CMakeDetermineCompilerABI_C.bin 2025-09-07T08:25:46.6631727Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CMakeDetermineCompilerABI_CXX.bin 2025-09-07T08:25:46.6632327Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCUDA/ 2025-09-07T08:25:46.6632850Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/ 2025-09-07T08:25:46.6673415Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii 2025-09-07T08:25:46.6713106Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp 2025-09-07T08:25:46.6714219Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id 2025-09-07T08:25:46.6759559Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii 2025-09-07T08:25:46.6760482Z inflating: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c 2025-09-07T08:25:46.6761419Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu 2025-09-07T08:25:46.6762416Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c 2025-09-07T08:25:46.6763311Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx 2025-09-07T08:25:46.6764310Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin 2025-09-07T08:25:46.6765182Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin 2025-09-07T08:25:46.6766267Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c 2025-09-07T08:25:46.6767263Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o 2025-09-07T08:25:46.6768097Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin 2025-09-07T08:25:46.6768873Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/a_dlink.reg.c 2025-09-07T08:25:46.6769642Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/a_dlink.fatbin 2025-09-07T08:25:46.6770413Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/a_dlink.fatbin.c 2025-09-07T08:25:46.6771157Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/a_dlink.o 2025-09-07T08:25:46.6771918Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCUDA/CMakeCUDACompilerId.cu 2025-09-07T08:25:46.6837288Z 
inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCUDA/a.out 2025-09-07T08:25:46.6837975Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CMakeCUDACompiler.cmake 2025-09-07T08:25:46.6905612Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CMakeDetermineCompilerABI_CUDA.bin 2025-09-07T08:25:46.6906302Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeScratch/ 2025-09-07T08:25:46.6906783Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeTmp/ 2025-09-07T08:25:46.6907279Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/cmake.check_cache 2025-09-07T08:25:46.6907799Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/ 2025-09-07T08:25:46.6908377Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.ts 2025-09-07T08:25:46.6909063Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.make 2025-09-07T08:25:46.6909703Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/depend.make 2025-09-07T08:25:46.6910266Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/link.txt 2025-09-07T08:25:46.6910841Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/cmake_clean.cmake 2025-09-07T08:25:46.6911437Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/build.make 2025-09-07T08:25:46.6912041Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/DependInfo.cmake 2025-09-07T08:25:46.6912622Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/flags.make 2025-09-07T08:25:46.6913212Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/progress.make 2025-09-07T08:25:46.6931472Z inflating: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o.d 2025-09-07T08:25:46.7118575Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o 2025-09-07T08:25:46.7119369Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/ 2025-09-07T08:25:46.7120206Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.ts 2025-09-07T08:25:46.7121150Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.make 2025-09-07T08:25:46.7122224Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/depend.make 2025-09-07T08:25:46.7123002Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/link.txt 2025-09-07T08:25:46.7123876Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/cmake_clean.cmake 2025-09-07T08:25:46.7125215Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/build.make 2025-09-07T08:25:46.7125941Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/DependInfo.cmake 2025-09-07T08:25:46.7126645Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/flags.make 2025-09-07T08:25:46.7127340Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/progress.make 2025-09-07T08:25:46.7143260Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o.d 2025-09-07T08:25:46.7217413Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o 2025-09-07T08:25:46.7218392Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeDirectoryInformation.cmake 2025-09-07T08:25:46.7219224Z inflating: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/TargetDirectories.txt 2025-09-07T08:25:46.7220001Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/progress.marks 2025-09-07T08:25:46.7220703Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile2 2025-09-07T08:25:46.7221384Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile.cmake 2025-09-07T08:25:46.7222111Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/InstallScripts.json 2025-09-07T08:25:46.7222674Z inflating: build/custom_test_artifacts/custom-op-build/detect_cuda_version.cc 2025-09-07T08:25:46.7223646Z inflating: build/custom_test_artifacts/custom-op-build/CMakeCache.txt 2025-09-07T08:25:46.7224721Z inflating: build/custom_test_artifacts/custom-op-build/Makefile 2025-09-07T08:25:46.7225269Z inflating: build/custom_test_artifacts/custom-op-build/cmake_install.cmake 2025-09-07T08:25:46.7382678Z inflating: build/custom_test_artifacts/custom-op-build/libcustom_ops.so 2025-09-07T08:25:46.7433497Z inflating: build/custom_test_artifacts/custom-op-build/test_custom_ops 2025-09-07T08:25:46.7434500Z creating: build/custom_test_artifacts/jit-hook-build/ 2025-09-07T08:25:46.7435167Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/ 2025-09-07T08:25:46.7435968Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/pkgRedirects/ 2025-09-07T08:25:46.7442138Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeConfigureLog.yaml 2025-09-07T08:25:46.7442914Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/ 2025-09-07T08:25:46.7443473Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CMakeSystem.cmake 2025-09-07T08:25:46.7444213Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdC/ 2025-09-07T08:25:46.7444801Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdC/tmp/ 2025-09-07T08:25:46.7446083Z 
inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdC/CMakeCCompilerId.c 2025-09-07T08:25:46.7447377Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdC/a.out 2025-09-07T08:25:46.7448207Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CMakeCCompiler.cmake 2025-09-07T08:25:46.7448899Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCXX/ 2025-09-07T08:25:46.7449519Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCXX/tmp/ 2025-09-07T08:25:46.7451313Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCXX/CMakeCXXCompilerId.cpp 2025-09-07T08:25:46.7452444Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCXX/a.out 2025-09-07T08:25:46.7453261Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CMakeCXXCompiler.cmake 2025-09-07T08:25:46.7455238Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CMakeDetermineCompilerABI_C.bin 2025-09-07T08:25:46.7456139Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CMakeDetermineCompilerABI_CXX.bin 2025-09-07T08:25:46.7456784Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCUDA/ 2025-09-07T08:25:46.7457353Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/ 2025-09-07T08:25:46.7497662Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii 2025-09-07T08:25:46.7539461Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp 2025-09-07T08:25:46.7540944Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id 2025-09-07T08:25:46.7585407Z inflating: 
build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii 2025-09-07T08:25:46.7586876Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c 2025-09-07T08:25:46.7588299Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu 2025-09-07T08:25:46.7589746Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c 2025-09-07T08:25:46.7591171Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx 2025-09-07T08:25:46.7592576Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin 2025-09-07T08:25:46.7593382Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin 2025-09-07T08:25:46.7594336Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c 2025-09-07T08:25:46.7595121Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o 2025-09-07T08:25:46.7595857Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin 2025-09-07T08:25:46.7596573Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/a_dlink.reg.c 2025-09-07T08:25:46.7597318Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/a_dlink.fatbin 2025-09-07T08:25:46.7598021Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/a_dlink.fatbin.c 2025-09-07T08:25:46.7598698Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/a_dlink.o 2025-09-07T08:25:46.7599385Z 
inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCUDA/CMakeCUDACompilerId.cu 2025-09-07T08:25:46.7662738Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCUDA/a.out 2025-09-07T08:25:46.7664116Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CMakeCUDACompiler.cmake 2025-09-07T08:25:46.7730437Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CMakeDetermineCompilerABI_CUDA.bin 2025-09-07T08:25:46.7731103Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeScratch/ 2025-09-07T08:25:46.7731625Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeTmp/ 2025-09-07T08:25:46.7732671Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/cmake.check_cache 2025-09-07T08:25:46.7733933Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/ 2025-09-07T08:25:46.7735010Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.ts 2025-09-07T08:25:46.7736569Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.make 2025-09-07T08:25:46.7737980Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/depend.make 2025-09-07T08:25:46.7739065Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/link.txt 2025-09-07T08:25:46.7740178Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/cmake_clean.cmake 2025-09-07T08:25:46.7741279Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/build.make 2025-09-07T08:25:46.7742475Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/DependInfo.cmake 2025-09-07T08:25:46.7743113Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/flags.make 2025-09-07T08:25:46.7743883Z inflating: 
build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/progress.make 2025-09-07T08:25:46.7756828Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o.d 2025-09-07T08:25:46.7814226Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o 2025-09-07T08:25:46.7814951Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeDirectoryInformation.cmake 2025-09-07T08:25:46.7815579Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/TargetDirectories.txt 2025-09-07T08:25:46.7816145Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/progress.marks 2025-09-07T08:25:46.7816663Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile2 2025-09-07T08:25:46.7817503Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile.cmake 2025-09-07T08:25:46.7818048Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/InstallScripts.json 2025-09-07T08:25:46.7818601Z inflating: build/custom_test_artifacts/jit-hook-build/detect_cuda_version.cc 2025-09-07T08:25:46.7820754Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeCache.txt 2025-09-07T08:25:46.7821352Z inflating: build/custom_test_artifacts/jit-hook-build/Makefile 2025-09-07T08:25:46.7821940Z inflating: build/custom_test_artifacts/jit-hook-build/cmake_install.cmake 2025-09-07T08:25:46.7857113Z inflating: build/custom_test_artifacts/jit-hook-build/test_jit_hooks 2025-09-07T08:25:46.7857857Z creating: build/custom_test_artifacts/custom-backend-build/ 2025-09-07T08:25:46.7858593Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/ 2025-09-07T08:25:46.7859478Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/pkgRedirects/ 2025-09-07T08:25:46.7865266Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeConfigureLog.yaml 2025-09-07T08:25:46.7865860Z creating: 
build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/ 2025-09-07T08:25:46.7866438Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CMakeSystem.cmake 2025-09-07T08:25:46.7867045Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdC/ 2025-09-07T08:25:46.7867629Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdC/tmp/ 2025-09-07T08:25:46.7869377Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdC/CMakeCCompilerId.c 2025-09-07T08:25:46.7870544Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdC/a.out 2025-09-07T08:25:46.7871262Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CMakeCCompiler.cmake 2025-09-07T08:25:46.7871913Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCXX/ 2025-09-07T08:25:46.7872519Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCXX/tmp/ 2025-09-07T08:25:46.7874990Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCXX/CMakeCXXCompilerId.cpp 2025-09-07T08:25:46.7875908Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCXX/a.out 2025-09-07T08:25:46.7876760Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CMakeCXXCompiler.cmake 2025-09-07T08:25:46.7878180Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CMakeDetermineCompilerABI_C.bin 2025-09-07T08:25:46.7879449Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CMakeDetermineCompilerABI_CXX.bin 2025-09-07T08:25:46.7880138Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCUDA/ 2025-09-07T08:25:46.7880750Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/ 
2025-09-07T08:25:46.7920988Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii 2025-09-07T08:25:46.7960972Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp 2025-09-07T08:25:46.7962697Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id 2025-09-07T08:25:46.8009194Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii 2025-09-07T08:25:46.8010141Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c 2025-09-07T08:25:46.8011098Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu 2025-09-07T08:25:46.8012491Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c 2025-09-07T08:25:46.8014390Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx 2025-09-07T08:25:46.8015916Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin 2025-09-07T08:25:46.8017415Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin 2025-09-07T08:25:46.8018897Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c 2025-09-07T08:25:46.8020332Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o 2025-09-07T08:25:46.8021738Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin 
2025-09-07T08:25:46.8022748Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/a_dlink.reg.c 2025-09-07T08:25:46.8023519Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/a_dlink.fatbin 2025-09-07T08:25:46.8024413Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/a_dlink.fatbin.c 2025-09-07T08:25:46.8025142Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCUDA/tmp/a_dlink.o 2025-09-07T08:25:46.8025886Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCUDA/CMakeCUDACompilerId.cu 2025-09-07T08:25:46.8086688Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCUDA/a.out 2025-09-07T08:25:46.8087369Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CMakeCUDACompiler.cmake 2025-09-07T08:25:46.8154387Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CMakeDetermineCompilerABI_CUDA.bin 2025-09-07T08:25:46.8155533Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeScratch/ 2025-09-07T08:25:46.8156132Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeTmp/ 2025-09-07T08:25:46.8156751Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/cmake.check_cache 2025-09-07T08:25:46.8157494Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/ 2025-09-07T08:25:46.8158234Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.ts 2025-09-07T08:25:46.8159049Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.make 2025-09-07T08:25:46.8159825Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/depend.make 2025-09-07T08:25:46.8160547Z 
inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/link.txt 2025-09-07T08:25:46.8161315Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/cmake_clean.cmake 2025-09-07T08:25:46.8162221Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/build.make 2025-09-07T08:25:46.8162971Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/DependInfo.cmake 2025-09-07T08:25:46.8163899Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/flags.make 2025-09-07T08:25:46.8164647Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/progress.make 2025-09-07T08:25:46.8165478Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o.d 2025-09-07T08:25:46.8274223Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o 2025-09-07T08:25:46.8274921Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/ 2025-09-07T08:25:46.8275589Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.ts 2025-09-07T08:25:46.8276309Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.make 2025-09-07T08:25:46.8276993Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/depend.make 2025-09-07T08:25:46.8277742Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/link.txt 2025-09-07T08:25:46.8278398Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/cmake_clean.cmake 2025-09-07T08:25:46.8279057Z inflating: 
build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/build.make 2025-09-07T08:25:46.8340884Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/DependInfo.cmake 2025-09-07T08:25:46.8341602Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/flags.make 2025-09-07T08:25:46.8342271Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/progress.make 2025-09-07T08:25:46.8342979Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o.d 2025-09-07T08:25:46.8348797Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o 2025-09-07T08:25:46.8349511Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeDirectoryInformation.cmake 2025-09-07T08:25:46.8350144Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/TargetDirectories.txt 2025-09-07T08:25:46.8350699Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/progress.marks 2025-09-07T08:25:46.8351593Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile2 2025-09-07T08:25:46.8352228Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile.cmake 2025-09-07T08:25:46.8352763Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/InstallScripts.json 2025-09-07T08:25:46.8353295Z inflating: build/custom_test_artifacts/custom-backend-build/detect_cuda_version.cc 2025-09-07T08:25:46.8355770Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeCache.txt 2025-09-07T08:25:46.8356622Z inflating: build/custom_test_artifacts/custom-backend-build/Makefile 2025-09-07T08:25:46.8357141Z inflating: build/custom_test_artifacts/custom-backend-build/cmake_install.cmake 2025-09-07T08:25:46.8450200Z inflating: 
build/custom_test_artifacts/custom-backend-build/libcustom_backend.so 2025-09-07T08:25:46.8485679Z inflating: build/custom_test_artifacts/custom-backend-build/test_custom_backend 2025-09-07T08:25:46.8486086Z creating: build/lib/ 2025-09-07T08:25:46.8564957Z inflating: build/lib/libprotobuf-lite.a 2025-09-07T08:25:46.8973379Z inflating: build/lib/libprotobuf.a 2025-09-07T08:25:46.8982034Z inflating: build/lib/libpthreadpool.a 2025-09-07T08:25:46.8989181Z inflating: build/lib/libcpuinfo.a 2025-09-07T08:25:46.9438601Z inflating: build/lib/libprotoc.a 2025-09-07T08:25:46.9445156Z inflating: build/lib/libcpuinfo_internals.a 2025-09-07T08:25:46.9445913Z inflating: build/lib/libclog.a 2025-09-07T08:25:46.9447905Z inflating: build/lib/libnnpack_reference_layers.a 2025-09-07T08:25:46.9464776Z inflating: build/lib/libpytorch_qnnpack.a 2025-09-07T08:25:46.9627213Z inflating: build/lib/libmicrokernels-prod.a 2025-09-07T08:25:46.9643105Z inflating: build/lib/libnnpack.a 2025-09-07T08:25:47.0344908Z inflating: build/lib/libmicrokernels-all.a 2025-09-07T08:25:47.0404989Z inflating: build/lib/libgtest.a 2025-09-07T08:25:47.0420317Z inflating: build/lib/libgmock.a 2025-09-07T08:25:47.0420889Z inflating: build/lib/libgmock_main.a 2025-09-07T08:25:47.0421647Z inflating: build/lib/libgtest_main.a 2025-09-07T08:25:47.0490464Z inflating: build/lib/libbenchmark.a 2025-09-07T08:25:47.0490882Z inflating: build/lib/libbenchmark_main.a 2025-09-07T08:25:47.0573432Z inflating: build/lib/libXNNPACK.a 2025-09-07T08:25:47.0574201Z inflating: build/lib/libjitprofiling.a 2025-09-07T08:25:47.0581413Z inflating: build/lib/libittnotify.a 2025-09-07T08:25:47.0640278Z inflating: build/lib/libasmjit.a 2025-09-07T08:25:47.6121460Z inflating: build/lib/libfbgemm.a 2025-09-07T08:25:47.6148706Z inflating: build/lib/libtensorpipe_uv.a 2025-09-07T08:25:47.7374613Z inflating: build/lib/libtensorpipe.a 2025-09-07T08:25:47.7602215Z inflating: build/lib/libtensorpipe_cuda.a 2025-09-07T08:25:47.9678208Z inflating: 
build/lib/libgloo.a 2025-09-07T08:25:47.9722389Z inflating: build/lib/libonnx_proto.a 2025-09-07T08:25:48.3908695Z inflating: build/lib/libonnx.a 2025-09-07T08:25:48.4320773Z inflating: build/lib/libgloo_cuda.a 2025-09-07T08:25:48.4338141Z inflating: build/lib/libfmt.a 2025-09-07T08:25:53.6694610Z inflating: build/lib/libdnnl.a 2025-09-07T08:25:53.7111315Z inflating: build/lib/libkineto.a 2025-09-07T08:25:53.7214798Z inflating: build/lib/libc10.so 2025-09-07T08:25:53.7215863Z inflating: build/lib/libtorch_global_deps.so 2025-09-07T08:25:53.7217648Z inflating: build/lib/libcaffe2_nvrtc.so 2025-09-07T08:25:53.7272963Z inflating: build/lib/libc10_cuda.so 2025-09-07T08:25:58.9965010Z inflating: build/lib/libtorch_cpu.so 2025-09-07T08:25:59.0635522Z inflating: build/lib/libtorch_nvshmem.so 2025-09-07T08:26:01.5875402Z inflating: build/lib/libtorch_cuda.so 2025-09-07T08:26:01.5875985Z inflating: build/lib/libtorch.so 2025-09-07T08:26:01.5922089Z inflating: build/lib/libtorch_cuda_linalg.so 2025-09-07T08:26:01.5985316Z inflating: build/lib/libtorchbind_test.so 2025-09-07T08:26:01.6002371Z inflating: build/lib/libjitbackend_test.so 2025-09-07T08:26:01.6024439Z inflating: build/lib/libbackend_with_compiler.so 2025-09-07T08:26:01.6049046Z inflating: build/lib/libaoti_custom_ops.so 2025-09-07T08:26:01.6050989Z inflating: build/lib/libc10d_cuda_test.so 2025-09-07T08:26:01.6054988Z inflating: build/lib/libshm.so 2025-09-07T08:26:01.9230760Z inflating: build/lib/libtorch_python.so 2025-09-07T08:26:01.9963072Z inflating: build/lib/libnnapi_backend.so 2025-09-07T08:26:01.9964474Z creating: build/bin/ 2025-09-07T08:26:02.0362409Z inflating: build/bin/protoc-3.13.0.0 2025-09-07T08:26:02.0759563Z inflating: build/bin/protoc 2025-09-07T08:26:02.0810032Z inflating: build/bin/c10_AllocatorConfig_test 2025-09-07T08:26:02.0857911Z inflating: build/bin/c10_CompileTimeFunctionPointer_test 2025-09-07T08:26:02.0906924Z inflating: build/bin/c10_Device_test 2025-09-07T08:26:02.0953279Z 
inflating: build/bin/c10_StreamGuard_test 2025-09-07T08:26:02.1007686Z inflating: build/bin/c10_SymInt_test 2025-09-07T08:26:02.1057204Z inflating: build/bin/c10_DeviceGuard_test 2025-09-07T08:26:02.1113625Z inflating: build/bin/c10_DispatchKeySet_test 2025-09-07T08:26:02.1167063Z inflating: build/bin/c10_SizesAndStrides_test 2025-09-07T08:26:02.2900208Z inflating: build/bin/c10_InlineDeviceGuard_test 2025-09-07T08:26:02.4730692Z inflating: build/bin/c10_cow_test 2025-09-07T08:26:02.6324061Z inflating: build/bin/c10_Scalar_test 2025-09-07T08:26:02.8280463Z inflating: build/bin/c10_InlineStreamGuard_test 2025-09-07T08:26:02.8330095Z inflating: build/bin/c10_Bitset_test 2025-09-07T08:26:02.8377294Z inflating: build/bin/c10_ArrayRef_test 2025-09-07T08:26:02.8431456Z inflating: build/bin/c10_Enumerate_test 2025-09-07T08:26:02.8477927Z inflating: build/bin/c10_ConstexprCrc_test 2025-09-07T08:26:02.8525514Z inflating: build/bin/c10_DeadlockDetection_test 2025-09-07T08:26:02.8574299Z inflating: build/bin/c10_Half_test 2025-09-07T08:26:02.8627589Z inflating: build/bin/c10_LeftRight_test 2025-09-07T08:26:02.8678245Z inflating: build/bin/c10_IntrusiveList_test 2025-09-07T08:26:02.8730208Z inflating: build/bin/c10_Metaprogramming_test 2025-09-07T08:26:02.8780304Z inflating: build/bin/c10_NetworkFlow_test 2025-09-07T08:26:02.8827791Z inflating: build/bin/c10_Semaphore_test 2025-09-07T08:26:02.8875920Z inflating: build/bin/c10_Synchronized_test 2025-09-07T08:26:02.8928869Z inflating: build/bin/c10_ThreadLocal_test 2025-09-07T08:26:02.8978143Z inflating: build/bin/c10_TypeIndex_test 2025-09-07T08:26:02.9026925Z inflating: build/bin/c10_TypeList_test 2025-09-07T08:26:02.9075784Z inflating: build/bin/c10_accumulate_test 2025-09-07T08:26:02.9129998Z inflating: build/bin/c10_bfloat16_test 2025-09-07T08:26:02.9176911Z inflating: build/bin/c10_TypeTraits_test 2025-09-07T08:26:02.9224957Z inflating: build/bin/c10_bit_cast_test 2025-09-07T08:26:02.9278668Z inflating: 
build/bin/c10_complex_math_test 2025-09-07T08:26:02.9328423Z inflating: build/bin/c10_exception_test 2025-09-07T08:26:02.9376252Z inflating: build/bin/c10_generic_math_test 2025-09-07T08:26:02.9428125Z inflating: build/bin/c10_complex_test 2025-09-07T08:26:02.9476436Z inflating: build/bin/c10_irange_test 2025-09-07T08:26:02.9530425Z inflating: build/bin/c10_logging_test 2025-09-07T08:26:02.9581113Z inflating: build/bin/c10_lazy_test 2025-09-07T08:26:02.9629090Z inflating: build/bin/c10_flags_test 2025-09-07T08:26:02.9782147Z inflating: build/bin/c10_intrusive_ptr_test 2025-09-07T08:26:02.9937932Z inflating: build/bin/c10_error_test 2025-09-07T08:26:03.1546619Z inflating: build/bin/c10_intrusive_ptr_benchmark 2025-09-07T08:26:03.1596630Z inflating: build/bin/c10_registry_test 2025-09-07T08:26:03.1644470Z inflating: build/bin/c10_tempfile_test 2025-09-07T08:26:03.1697875Z inflating: build/bin/c10_string_util_test 2025-09-07T08:26:03.1840460Z inflating: build/bin/c10_small_vector_test 2025-09-07T08:26:03.1886326Z inflating: build/bin/c10_string_view_test 2025-09-07T08:26:03.1935689Z inflating: build/bin/c10_ssize_test 2025-09-07T08:26:03.1994544Z inflating: build/bin/c10_ordered_preserving_dict_test 2025-09-07T08:26:03.2065010Z inflating: build/bin/c10_optional_test 2025-09-07T08:26:03.2117680Z inflating: build/bin/c10_typeid_test 2025-09-07T08:26:03.2164448Z inflating: build/bin/c10_cuda_CUDATest 2025-09-07T08:26:03.4043136Z inflating: build/bin/vec_test_all_types_DEFAULT 2025-09-07T08:26:03.5583268Z inflating: build/bin/vec_test_all_types_AVX2 2025-09-07T08:26:03.6129703Z inflating: build/bin/vec_test_all_types_AVX512 2025-09-07T08:26:03.6179681Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_catches_stream 2025-09-07T08:26:03.6229956Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_1_var_test 2025-09-07T08:26:03.6280426Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_blocks_and_threads 2025-09-07T08:26:03.6330988Z inflating: 
build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_same_block 2025-09-07T08:26:03.6380141Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_from_2_processes 2025-09-07T08:26:03.6430247Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_catches_thread_and_block_and_device 2025-09-07T08:26:03.8579910Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_multiple_blocks 2025-09-07T08:26:03.9779958Z inflating: build/bin/BackoffTest 2025-09-07T08:26:03.9833029Z inflating: build/bin/TCPStoreTest 2025-09-07T08:26:03.9884380Z inflating: build/bin/HashStoreTest 2025-09-07T08:26:04.1436245Z inflating: build/bin/FileStoreTest 2025-09-07T08:26:04.2533582Z inflating: build/bin/ProcessGroupMPITest 2025-09-07T08:26:04.2535991Z inflating: build/bin/example_allreduce 2025-09-07T08:26:04.2605482Z inflating: build/bin/Dict_test 2025-09-07T08:26:04.2655685Z inflating: build/bin/Dimname_test 2025-09-07T08:26:04.2709898Z inflating: build/bin/NamedTensor_test 2025-09-07T08:26:04.2770559Z inflating: build/bin/MaybeOwned_test 2025-09-07T08:26:04.2826270Z inflating: build/bin/atest 2025-09-07T08:26:04.2886548Z inflating: build/bin/basic 2025-09-07T08:26:04.2942466Z inflating: build/bin/apply_utils_test 2025-09-07T08:26:04.2994597Z inflating: build/bin/broadcast_test 2025-09-07T08:26:04.3043194Z inflating: build/bin/cpu_allocator_test 2025-09-07T08:26:04.3098159Z inflating: build/bin/cpu_generator_test 2025-09-07T08:26:04.3148521Z inflating: build/bin/cpu_profiling_allocator_test 2025-09-07T08:26:04.5472609Z inflating: build/bin/cpu_rng_test 2025-09-07T08:26:04.6513180Z inflating: build/bin/dlconvertor_test 2025-09-07T08:26:04.6568004Z inflating: build/bin/extension_backend_test 2025-09-07T08:26:04.6620156Z inflating: build/bin/half_test 2025-09-07T08:26:04.6667528Z inflating: build/bin/lazy_tensor_test 2025-09-07T08:26:04.6719167Z inflating: build/bin/memory_format_test 2025-09-07T08:26:04.6770235Z inflating: build/bin/math_kernel_test 2025-09-07T08:26:04.6860249Z 
inflating: build/bin/ivalue_test 2025-09-07T08:26:04.6910822Z inflating: build/bin/memory_overlapping_test 2025-09-07T08:26:04.9357475Z inflating: build/bin/mobile_memory_cleanup 2025-09-07T08:26:05.1058562Z inflating: build/bin/native_test 2025-09-07T08:26:05.1107707Z inflating: build/bin/operator_name_test 2025-09-07T08:26:05.3336820Z inflating: build/bin/operators_test 2025-09-07T08:26:05.4712899Z inflating: build/bin/packedtensoraccessor_test 2025-09-07T08:26:05.4775969Z inflating: build/bin/pow_test 2025-09-07T08:26:05.4830126Z inflating: build/bin/quantized_test 2025-09-07T08:26:05.4878132Z inflating: build/bin/reduce_ops_test 2025-09-07T08:26:05.4926890Z inflating: build/bin/reportMemoryUsage_test 2025-09-07T08:26:05.6478861Z inflating: build/bin/scalar_tensor_test 2025-09-07T08:26:05.8096747Z inflating: build/bin/scalar_test 2025-09-07T08:26:05.8145711Z inflating: build/bin/StorageUtils_test 2025-09-07T08:26:05.8195327Z inflating: build/bin/stride_properties_test 2025-09-07T08:26:05.8270420Z inflating: build/bin/tensor_iterator_test 2025-09-07T08:26:05.8322204Z inflating: build/bin/test_parallel 2025-09-07T08:26:05.8370341Z inflating: build/bin/thread_init_test 2025-09-07T08:26:05.8422914Z inflating: build/bin/type_ptr_test 2025-09-07T08:26:05.8479319Z inflating: build/bin/type_test 2025-09-07T08:26:05.8528743Z inflating: build/bin/undefined_tensor_test 2025-09-07T08:26:05.8575891Z inflating: build/bin/verify_api_visibility 2025-09-07T08:26:05.8641484Z inflating: build/bin/legacy_vmap_test 2025-09-07T08:26:05.8690208Z inflating: build/bin/weakref_test 2025-09-07T08:26:05.8739071Z inflating: build/bin/xla_tensor_test 2025-09-07T08:26:05.8787752Z inflating: build/bin/wrapdim_test 2025-09-07T08:26:06.1300669Z inflating: build/bin/IListRef_test 2025-09-07T08:26:06.2488902Z inflating: build/bin/kernel_function_legacy_test 2025-09-07T08:26:06.2588340Z inflating: build/bin/List_test 2025-09-07T08:26:06.2650791Z inflating: build/bin/KernelFunction_test 
2025-09-07T08:26:06.2740843Z inflating: build/bin/kernel_function_test 2025-09-07T08:26:06.2859172Z inflating: build/bin/kernel_lambda_legacy_test 2025-09-07T08:26:06.2955737Z inflating: build/bin/kernel_lambda_test 2025-09-07T08:26:06.3013547Z inflating: build/bin/kernel_stackbased_test 2025-09-07T08:26:06.3103021Z inflating: build/bin/make_boxed_from_unboxed_functor_test 2025-09-07T08:26:06.3151844Z inflating: build/bin/CppSignature_test 2025-09-07T08:26:06.3204636Z inflating: build/bin/backend_fallback_test 2025-09-07T08:26:06.3250995Z inflating: build/bin/op_allowlist_test 2025-09-07T08:26:06.3526195Z inflating: build/bin/op_registration_test 2025-09-07T08:26:06.3588411Z inflating: build/bin/inline_container_test 2025-09-07T08:26:06.3637749Z inflating: build/bin/cuda_allocator_test 2025-09-07T08:26:06.3687644Z inflating: build/bin/cuda_apply_test 2025-09-07T08:26:06.3744023Z inflating: build/bin/cuda_atomic_ops_test 2025-09-07T08:26:06.6009657Z inflating: build/bin/cuda_caching_host_allocator_test 2025-09-07T08:26:06.6076084Z inflating: build/bin/cuda_complex_math_test 2025-09-07T08:26:06.6132030Z inflating: build/bin/cuda_complex_test 2025-09-07T08:26:06.6190833Z inflating: build/bin/cuda_cub_test 2025-09-07T08:26:06.6238941Z inflating: build/bin/cuda_device_test 2025-09-07T08:26:06.6300386Z inflating: build/bin/cuda_distributions_test 2025-09-07T08:26:06.6349815Z inflating: build/bin/cuda_dlconvertor_test 2025-09-07T08:26:06.6397472Z inflating: build/bin/cuda_exchange_device_test 2025-09-07T08:26:06.6445220Z inflating: build/bin/cuda_half_test 2025-09-07T08:26:06.6498514Z inflating: build/bin/cuda_generator_test 2025-09-07T08:26:06.6547567Z inflating: build/bin/cuda_integer_divider_test 2025-09-07T08:26:06.6594801Z inflating: build/bin/cuda_optional_test 2025-09-07T08:26:06.6644133Z inflating: build/bin/cuda_packedtensoraccessor_test 2025-09-07T08:26:06.6693690Z inflating: build/bin/cuda_reportMemoryUsage_test 2025-09-07T08:26:06.6741449Z inflating: 
build/bin/cuda_allocatorTraceTracker_test 2025-09-07T08:26:06.6798327Z inflating: build/bin/cuda_stream_test 2025-09-07T08:26:06.6845473Z inflating: build/bin/cuda_cudnn_test 2025-09-07T08:26:06.6895472Z inflating: build/bin/cuda_vectorized_test 2025-09-07T08:26:06.7239949Z inflating: build/bin/test_nativert 2025-09-07T08:26:06.7292157Z inflating: build/bin/test_dist_autograd 2025-09-07T08:26:06.7357251Z inflating: build/bin/test_cpp_rpc 2025-09-07T08:26:06.8432326Z inflating: build/bin/test_api 2025-09-07T08:26:06.8434573Z inflating: build/bin/parallel_benchmark 2025-09-07T08:26:06.8497210Z inflating: build/bin/ProcessGroupGlooTest 2025-09-07T08:26:06.8558164Z inflating: build/bin/ProcessGroupNCCLTest 2025-09-07T08:26:06.8612980Z inflating: build/bin/ProcessGroupGlooAsyncTest 2025-09-07T08:26:06.8671646Z inflating: build/bin/ProcessGroupNCCLErrorsTest 2025-09-07T08:26:07.1671732Z inflating: build/bin/test_jit 2025-09-07T08:26:07.2394199Z inflating: build/bin/test_lazy 2025-09-07T08:26:07.2398826Z inflating: build/bin/torch_shm_manager 2025-09-07T08:26:07.2400253Z creating: .additional_ci_files/ 2025-09-07T08:26:07.2502241Z inflating: .additional_ci_files/test-times.json 2025-09-07T08:26:07.2841887Z inflating: .additional_ci_files/test-class-times.json 2025-09-07T08:26:07.3069249Z ##[group]Run rm artifacts.zip 2025-09-07T08:26:07.3069502Z rm artifacts.zip 2025-09-07T08:26:07.3085649Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T08:26:07.3085945Z env: 2025-09-07T08:26:07.3086122Z GIT_DEFAULT_BRANCH: main 2025-09-07T08:26:07.3086371Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T08:26:07.3086732Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5229 2025-09-07T08:26:07.3087010Z ##[endgroup] 2025-09-07T08:26:08.1931304Z ##[group]Run df -H 2025-09-07T08:26:08.1931539Z df -H 2025-09-07T08:26:08.1947759Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T08:26:08.1948055Z env: 
2025-09-07T08:26:08.1948227Z GIT_DEFAULT_BRANCH: main 2025-09-07T08:26:08.1948483Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T08:26:08.1948825Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5229 2025-09-07T08:26:08.1949100Z ##[endgroup] 2025-09-07T08:26:08.2164205Z Filesystem Size Used Avail Use% Mounted on 2025-09-07T08:26:08.2164600Z overlay 11T 701G 9.7T 7% / 2025-09-07T08:26:08.2164914Z tmpfs 68M 0 68M 0% /dev 2025-09-07T08:26:08.2165215Z shm 68M 0 68M 0% /dev/shm 2025-09-07T08:26:08.2165541Z /dev/root 11T 701G 9.7T 7% /home/charlie/_work 2025-09-07T08:26:08.2166001Z tmpfs 215G 119k 215G 1% /run/docker.sock 2025-09-07T08:26:08.2166410Z tmpfs 1.1T 13k 1.1T 1% /proc/driver/nvidia 2025-09-07T08:26:08.2166828Z tmpfs 430G 3.0M 430G 1% /run/.ro3822819532/nvidia-persistenced/socket 2025-09-07T08:26:08.2167228Z tmpfs 1.1T 0 1.1T 0% /proc/acpi 2025-09-07T08:26:08.2167528Z tmpfs 1.1T 0 1.1T 0% /proc/scsi 2025-09-07T08:26:08.2167848Z tmpfs 1.1T 0 1.1T 0% /sys/firmware 2025-09-07T08:26:08.5460845Z Prepare all required actions 2025-09-07T08:26:08.5461493Z Getting action download info 2025-09-07T08:26:09.1564883Z ##[group]Run ./.github/actions/download-td-artifacts 2025-09-07T08:26:09.1565146Z with: 2025-09-07T08:26:09.1565297Z env: 2025-09-07T08:26:09.1565452Z GIT_DEFAULT_BRANCH: main 2025-09-07T08:26:09.1565693Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T08:26:09.1566033Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5229 2025-09-07T08:26:09.1566321Z ##[endgroup] 2025-09-07T08:26:09.6196887Z ##[group]Run seemethere/download-artifact-s3@v4 2025-09-07T08:26:09.6197148Z with: 2025-09-07T08:26:09.6197400Z name: td_results 2025-09-07T08:26:09.6197578Z s3-bucket: gha-artifacts 2025-09-07T08:26:09.6197773Z region: us-east-1 2025-09-07T08:26:09.6197928Z env: 2025-09-07T08:26:09.6198081Z GIT_DEFAULT_BRANCH: main 2025-09-07T08:26:09.6198340Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 
2025-09-07T08:26:09.6198830Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5229 2025-09-07T08:26:09.6199123Z ##[endgroup] 2025-09-07T08:26:10.0991368Z (node:9247) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023. 2025-09-07T08:26:10.0992097Z 2025-09-07T08:26:10.0992388Z Please migrate your code to use AWS SDK for JavaScript (v3). 2025-09-07T08:26:10.0993185Z For more information, check the migration guide at https://a.co/7PzMCcy 2025-09-07T08:26:10.0994284Z (Use `node --trace-warnings ...` to show where the warning was created) 2025-09-07T08:26:10.2057081Z Found 0 objects with prefix pytorch/pytorch/17525296438/td_results/ 2025-09-07T08:26:10.2063165Z Artifact download has finished successfully 2025-09-07T08:26:10.5484614Z ##[group]Run mkdir -p .additional_ci_files 2025-09-07T08:26:10.5484912Z mkdir -p .additional_ci_files 2025-09-07T08:26:10.5485237Z mv td_results.json .additional_ci_files/td_results.json || true 2025-09-07T08:26:10.5501295Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T08:26:10.5501575Z env: 2025-09-07T08:26:10.5501738Z GIT_DEFAULT_BRANCH: main 2025-09-07T08:26:10.5501987Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T08:26:10.5502343Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5229 2025-09-07T08:26:10.5502627Z ##[endgroup] 2025-09-07T08:26:11.0057141Z mv: cannot stat 'td_results.json': No such file or directory 2025-09-07T08:26:11.7564690Z ##[group]Run .github/scripts/parse_ref.py 2025-09-07T08:26:11.7565044Z .github/scripts/parse_ref.py 2025-09-07T08:26:11.7580851Z shell: /usr/bin/bash -e {0} 2025-09-07T08:26:11.7581073Z env: 2025-09-07T08:26:11.7581247Z GIT_DEFAULT_BRANCH: main 2025-09-07T08:26:11.7581509Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T08:26:11.7581842Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5229 2025-09-07T08:26:11.7582122Z ##[endgroup] 2025-09-07T08:26:11.9740349Z Setting 
output branch=main 2025-09-07T08:26:12.0645330Z Prepare all required actions 2025-09-07T08:26:12.0645663Z Getting action download info 2025-09-07T08:26:12.9934467Z ##[group]Run ./.github/actions/filter-test-configs 2025-09-07T08:26:12.9934804Z with: 2025-09-07T08:26:12.9935325Z github-token: *** 2025-09-07T08:26:12.9941347Z test-matrix: {"include": [{"config": "inductor_huggingface_perf_cuda_h100", "shard": 1, "num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_huggingface_perf_cuda_h100", "shard": 2, "num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_huggingface_perf_cuda_h100", "shard": 3, "num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_huggingface_perf_cuda_h100", "shard": 4, "num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_huggingface_perf_cuda_h100", "shard": 5, "num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 1, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 2, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 3, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 4, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 5, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 6, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 7, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 1, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 2, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 3, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 4, "num_shards": 9, 
"runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 5, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 6, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 7, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 8, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 9, "num_shards": 9, "runner": "linux.aws.h100"}]} 2025-09-07T08:26:12.9946843Z job-name: test-weekly / test (inductor_timm_perf_cuda_h100, 2, 7, linux.aws.h100) 2025-09-07T08:26:12.9947465Z env: 2025-09-07T08:26:12.9947631Z GIT_DEFAULT_BRANCH: main 2025-09-07T08:26:12.9947889Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T08:26:12.9948233Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5229 2025-09-07T08:26:12.9948523Z ##[endgroup] 2025-09-07T08:26:13.1704182Z ##[group]Run nick-fields/retry@v3.0.0 2025-09-07T08:26:13.1704434Z with: 2025-09-07T08:26:13.1704586Z shell: bash 2025-09-07T08:26:13.1704756Z timeout_minutes: 10 2025-09-07T08:26:13.1704940Z max_attempts: 5 2025-09-07T08:26:13.1705118Z retry_wait_seconds: 30 2025-09-07T08:26:13.1705718Z command: set -eux # PyYAML 6.0 doesn't work with MacOS x86 anymore # This must run on Python-3.7 (AmazonLinux2) so can't use request=3.32.2 python3 -m pip install requests==2.27.1 pyyaml==6.0.2 2025-09-07T08:26:13.1706345Z polling_interval_seconds: 1 2025-09-07T08:26:13.1706558Z warning_on_retry: true 2025-09-07T08:26:13.1706767Z continue_on_error: false 2025-09-07T08:26:13.1706964Z env: 2025-09-07T08:26:13.1707124Z GIT_DEFAULT_BRANCH: main 2025-09-07T08:26:13.1707375Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T08:26:13.1707723Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5229 2025-09-07T08:26:13.1708161Z GITHUB_TOKEN: *** 2025-09-07T08:26:13.1708350Z 
##[endgroup] 2025-09-07T08:26:13.2429178Z + python3 -m pip install requests==2.27.1 pyyaml==6.0.2 2025-09-07T08:26:13.5074811Z Defaulting to user installation because normal site-packages is not writeable 2025-09-07T08:26:14.8139929Z Collecting requests==2.27.1 2025-09-07T08:26:15.0987149Z Downloading requests-2.27.1-py2.py3-none-any.whl (63 kB) 2025-09-07T08:26:15.8197872Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 63.1/63.1 KB 73.7 kB/s eta 0:00:00 2025-09-07T08:26:16.7417018Z Collecting pyyaml==6.0.2 2025-09-07T08:26:16.7532665Z Downloading PyYAML-6.0.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (751 kB) 2025-09-07T08:26:17.8599974Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 751.2/751.2 KB 671.3 kB/s eta 0:00:00 2025-09-07T08:26:17.8720907Z Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/lib/python3/dist-packages (from requests==2.27.1) (1.26.5) 2025-09-07T08:26:17.8727531Z Requirement already satisfied: certifi>=2017.4.17 in /usr/lib/python3/dist-packages (from requests==2.27.1) (2020.6.20) 2025-09-07T08:26:17.8736448Z Requirement already satisfied: idna<4,>=2.5 in /usr/lib/python3/dist-packages (from requests==2.27.1) (3.3) 2025-09-07T08:26:18.7055748Z Collecting charset-normalizer~=2.0.0 2025-09-07T08:26:18.8874043Z Downloading charset_normalizer-2.0.12-py3-none-any.whl (39 kB) 2025-09-07T08:26:19.1464649Z Installing collected packages: pyyaml, charset-normalizer, requests 2025-09-07T08:26:21.3690116Z WARNING: The script normalizer is installed in '/home/charlie/.local/bin' which is not on PATH. 2025-09-07T08:26:21.3690883Z Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location. 2025-09-07T08:26:23.9100155Z Successfully installed charset-normalizer-2.0.12 pyyaml-6.0.2 requests-2.27.1 2025-09-07T08:26:24.2471588Z Command completed after 1 attempt(s). 
2025-09-07T08:26:24.3518411Z ##[group]Run set -x 2025-09-07T08:26:24.3518616Z set -x 2025-09-07T08:26:24.3518784Z  2025-09-07T08:26:24.3519076Z # Use relative path here as this could be checked out anywhere, not necessarily 2025-09-07T08:26:24.3519441Z # in runner workspace 2025-09-07T08:26:24.3519744Z python3 "${GITHUB_ACTION_PATH}/../../scripts/parse_ref.py" 2025-09-07T08:26:24.3535789Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T08:26:24.3536086Z env: 2025-09-07T08:26:24.3536244Z GIT_DEFAULT_BRANCH: main 2025-09-07T08:26:24.3536496Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T08:26:24.3536838Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5229 2025-09-07T08:26:24.3537123Z ##[endgroup] 2025-09-07T08:26:24.6208521Z + python3 /home/charlie/_work/pytorch/pytorch/./.github/actions/filter-test-configs/../../scripts/parse_ref.py 2025-09-07T08:26:24.6355873Z Setting output branch=main 2025-09-07T08:26:24.7603151Z ##[group]Run echo "Workflow: ${GITHUB_WORKFLOW}" 2025-09-07T08:26:24.7603482Z echo "Workflow: ${GITHUB_WORKFLOW}" 2025-09-07T08:26:24.7603898Z echo "Job name: ${JOB_NAME}" 2025-09-07T08:26:24.7604143Z  2025-09-07T08:26:24.7604422Z # Use relative path here as this could be checked out anywhere, not necessarily 2025-09-07T08:26:24.7604782Z # in runner workspace 2025-09-07T08:26:24.7605112Z python3 "${GITHUB_ACTION_PATH}/../../scripts/filter_test_configs.py" \ 2025-09-07T08:26:24.7605484Z  --workflow "${GITHUB_WORKFLOW}" \ 2025-09-07T08:26:24.7605741Z  --job-name "${JOB_NAME}" \ 2025-09-07T08:26:24.7611118Z  --test-matrix "{"include": [{"config": "inductor_huggingface_perf_cuda_h100", "shard": 1, "num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_huggingface_perf_cuda_h100", "shard": 2, "num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_huggingface_perf_cuda_h100", "shard": 3, "num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_huggingface_perf_cuda_h100", 
"shard": 4, "num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_huggingface_perf_cuda_h100", "shard": 5, "num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 1, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 2, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 3, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 4, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 5, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 6, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 7, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 1, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 2, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 3, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 4, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 5, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 6, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 7, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 8, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 9, "num_shards": 9, "runner": "linux.aws.h100"}]}" \ 2025-09-07T08:26:24.7616629Z  --selected-test-configs "" \ 2025-09-07T08:26:24.7616882Z  --pr-number "${PR_NUMBER}" \ 2025-09-07T08:26:24.7617115Z  --tag 
"${TAG}" \ 2025-09-07T08:26:24.7617333Z  --event-name "${EVENT_NAME}" \ 2025-09-07T08:26:24.7617567Z  --schedule "${SCHEDULE}" \ 2025-09-07T08:26:24.7617788Z  --branch "${HEAD_BRANCH}" 2025-09-07T08:26:24.7634041Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T08:26:24.7634347Z env: 2025-09-07T08:26:24.7634520Z GIT_DEFAULT_BRANCH: main 2025-09-07T08:26:24.7634778Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T08:26:24.7635124Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5229 2025-09-07T08:26:24.7635610Z GITHUB_TOKEN: *** 2025-09-07T08:26:24.7635908Z JOB_NAME: test-weekly / test (inductor_timm_perf_cuda_h100, 2, 7, linux.aws.h100) 2025-09-07T08:26:24.7636501Z PR_NUMBER: 2025-09-07T08:26:24.7636662Z TAG: 2025-09-07T08:26:24.7636810Z EVENT_NAME: schedule 2025-09-07T08:26:24.7636992Z SCHEDULE: 0 7 * * 0 2025-09-07T08:26:24.7637250Z HEAD_BRANCH: main 2025-09-07T08:26:24.7637441Z ##[endgroup] 2025-09-07T08:26:24.8004135Z Workflow: inductor-perf-nightly-h100 2025-09-07T08:26:24.8004639Z Job name: test-weekly / test (inductor_timm_perf_cuda_h100, 2, 7, linux.aws.h100) 2025-09-07T08:26:25.0540703Z Setting output keep-going=True 2025-09-07T08:26:25.0541229Z Setting output ci-verbose-test-logs=False 2025-09-07T08:26:25.0541762Z Setting output ci-test-showlocals=False 2025-09-07T08:26:25.0542252Z Setting output ci-no-test-timeout=False 2025-09-07T08:26:25.0542699Z Setting output ci-no-td=False 2025-09-07T08:26:25.0543158Z Setting output ci-td-distributed=False 2025-09-07T08:26:25.0543616Z Setting output is-unstable=False 2025-09-07T08:26:25.0544661Z Setting output reenabled-issues= 2025-09-07T08:26:25.2768190Z Setting output test-matrix={"include": [{"config": "inductor_huggingface_perf_cuda_h100", "shard": 1, "num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_huggingface_perf_cuda_h100", "shard": 2, "num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_huggingface_perf_cuda_h100", "shard": 3, 
"num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_huggingface_perf_cuda_h100", "shard": 4, "num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_huggingface_perf_cuda_h100", "shard": 5, "num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 1, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 2, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 3, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 4, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 5, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 6, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 7, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 1, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 2, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 3, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 4, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 5, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 6, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 7, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 8, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 9, "num_shards": 9, "runner": "linux.aws.h100"}]} 2025-09-07T08:26:25.2774220Z Setting output 
is-test-matrix-empty=False 2025-09-07T08:26:25.3230999Z ##[group]Run echo "Filtered matrix:" 2025-09-07T08:26:25.3231257Z echo "Filtered matrix:" 2025-09-07T08:26:25.3236472Z echo "{"include": [{"config": "inductor_huggingface_perf_cuda_h100", "shard": 1, "num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_huggingface_perf_cuda_h100", "shard": 2, "num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_huggingface_perf_cuda_h100", "shard": 3, "num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_huggingface_perf_cuda_h100", "shard": 4, "num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_huggingface_perf_cuda_h100", "shard": 5, "num_shards": 5, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 1, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 2, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 3, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 4, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 5, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 6, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_timm_perf_cuda_h100", "shard": 7, "num_shards": 7, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 1, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 2, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 3, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 4, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 5, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": 
"inductor_torchbench_perf_cuda_h100", "shard": 6, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 7, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 8, "num_shards": 9, "runner": "linux.aws.h100"}, {"config": "inductor_torchbench_perf_cuda_h100", "shard": 9, "num_shards": 9, "runner": "linux.aws.h100"}]}" 2025-09-07T08:26:25.3241523Z  2025-09-07T08:26:25.3241678Z echo 2025-09-07T08:26:25.3241889Z echo "Is the current job unstable? False" 2025-09-07T08:26:25.3242125Z  2025-09-07T08:26:25.3242269Z echo 2025-09-07T08:26:25.3242451Z echo "Is keep-going label set? True" 2025-09-07T08:26:25.3242676Z  2025-09-07T08:26:25.3242817Z echo 2025-09-07T08:26:25.3242981Z echo "Reenabled issues? " 2025-09-07T08:26:25.3258064Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T08:26:25.3258350Z env: 2025-09-07T08:26:25.3258505Z GIT_DEFAULT_BRANCH: main 2025-09-07T08:26:25.3258760Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T08:26:25.3259100Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5229 2025-09-07T08:26:25.3259384Z ##[endgroup] 2025-09-07T08:26:25.3661400Z Filtered matrix: 2025-09-07T08:26:25.3667930Z {include: [{config: inductor_huggingface_perf_cuda_h100, shard: 1, num_shards: 5, runner: linux.aws.h100}, {config: inductor_huggingface_perf_cuda_h100, shard: 2, num_shards: 5, runner: linux.aws.h100}, {config: inductor_huggingface_perf_cuda_h100, shard: 3, num_shards: 5, runner: linux.aws.h100}, {config: inductor_huggingface_perf_cuda_h100, shard: 4, num_shards: 5, runner: linux.aws.h100}, {config: inductor_huggingface_perf_cuda_h100, shard: 5, num_shards: 5, runner: linux.aws.h100}, {config: inductor_timm_perf_cuda_h100, shard: 1, num_shards: 7, runner: linux.aws.h100}, {config: inductor_timm_perf_cuda_h100, shard: 2, num_shards: 7, runner: linux.aws.h100}, {config: inductor_timm_perf_cuda_h100, shard: 3, num_shards: 
7, runner: linux.aws.h100}, {config: inductor_timm_perf_cuda_h100, shard: 4, num_shards: 7, runner: linux.aws.h100}, {config: inductor_timm_perf_cuda_h100, shard: 5, num_shards: 7, runner: linux.aws.h100}, {config: inductor_timm_perf_cuda_h100, shard: 6, num_shards: 7, runner: linux.aws.h100}, {config: inductor_timm_perf_cuda_h100, shard: 7, num_shards: 7, runner: linux.aws.h100}, {config: inductor_torchbench_perf_cuda_h100, shard: 1, num_shards: 9, runner: linux.aws.h100}, {config: inductor_torchbench_perf_cuda_h100, shard: 2, num_shards: 9, runner: linux.aws.h100}, {config: inductor_torchbench_perf_cuda_h100, shard: 3, num_shards: 9, runner: linux.aws.h100}, {config: inductor_torchbench_perf_cuda_h100, shard: 4, num_shards: 9, runner: linux.aws.h100}, {config: inductor_torchbench_perf_cuda_h100, shard: 5, num_shards: 9, runner: linux.aws.h100}, {config: inductor_torchbench_perf_cuda_h100, shard: 6, num_shards: 9, runner: linux.aws.h100}, {config: inductor_torchbench_perf_cuda_h100, shard: 7, num_shards: 9, runner: linux.aws.h100}, {config: inductor_torchbench_perf_cuda_h100, shard: 8, num_shards: 9, runner: linux.aws.h100}, {config: inductor_torchbench_perf_cuda_h100, shard: 9, num_shards: 9, runner: linux.aws.h100}]} 2025-09-07T08:26:25.3673348Z 2025-09-07T08:26:25.3673449Z Is the current job unstable? False 2025-09-07T08:26:25.3673606Z 2025-09-07T08:26:25.3673690Z Is keep-going label set? True 2025-09-07T08:26:25.3674024Z 2025-09-07T08:26:25.3674097Z Reenabled issues? 
2025-09-07T08:26:25.7407263Z ##[group]Run echo "timeout=$((JOB_TIMEOUT-30))" >> "${GITHUB_OUTPUT}" 2025-09-07T08:26:25.7407725Z echo "timeout=$((JOB_TIMEOUT-30))" >> "${GITHUB_OUTPUT}" 2025-09-07T08:26:25.7423564Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T08:26:25.7424074Z env: 2025-09-07T08:26:25.7424256Z GIT_DEFAULT_BRANCH: main 2025-09-07T08:26:25.7424518Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T08:26:25.7424866Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5229 2025-09-07T08:26:25.7425160Z JOB_TIMEOUT: 1440 2025-09-07T08:26:25.7425350Z ##[endgroup] 2025-09-07T08:26:25.7961581Z ##[group]Run env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-09-07T08:26:25.7962012Z env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-09-07T08:26:25.7962362Z env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-09-07T08:26:25.7977441Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T08:26:25.7977733Z env: 2025-09-07T08:26:25.7977898Z GIT_DEFAULT_BRANCH: main 2025-09-07T08:26:25.7978148Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T08:26:25.7978484Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5229 2025-09-07T08:26:25.7978766Z ##[endgroup] 2025-09-07T08:26:26.1727177Z ##[group]Run set -x 2025-09-07T08:26:26.1727482Z set -x 2025-09-07T08:26:26.1727676Z  2025-09-07T08:26:26.1727896Z if [[ $TEST_CONFIG == 'multigpu' ]]; then 2025-09-07T08:26:26.1728233Z  TEST_COMMAND=.ci/pytorch/multigpu-test.sh 2025-09-07T08:26:26.1728556Z elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then 2025-09-07T08:26:26.1728856Z  TEST_COMMAND=.ci/onnx/test.sh 2025-09-07T08:26:26.1729108Z else 2025-09-07T08:26:26.1729316Z  TEST_COMMAND=.ci/pytorch/test.sh 2025-09-07T08:26:26.1729563Z fi 2025-09-07T08:26:26.1729757Z  2025-09-07T08:26:26.1729978Z # Leaving 1GB for the runner and other things 2025-09-07T08:26:26.1730453Z TOTAL_AVAILABLE_MEMORY_IN_GB=$(awk '/MemTotal/ { 
printf "%.3f \n", $2/1024/1024 - 1 }' /proc/meminfo) 2025-09-07T08:26:26.1731172Z # https://docs.docker.com/engine/containers/resource_constraints/#--memory-swap-details, the 3GB swap 2025-09-07T08:26:26.1731752Z # comes from https://github.com/pytorch/test-infra/pull/6058 2025-09-07T08:26:26.1732181Z TOTAL_MEMORY_WITH_SWAP=$(("${TOTAL_AVAILABLE_MEMORY_IN_GB%.*}" + 3)) 2025-09-07T08:26:26.1732524Z  2025-09-07T08:26:26.1732740Z if [[ ${BUILD_ENVIRONMENT} == *"s390x"* ]]; then 2025-09-07T08:26:26.1733025Z  SHM_OPTS= 2025-09-07T08:26:26.1733239Z  JENKINS_USER= 2025-09-07T08:26:26.1733521Z  # ensure that docker container cleanly exits in 12 hours 2025-09-07T08:26:26.1734109Z  # if for some reason cleanup action doesn't stop container 2025-09-07T08:26:26.1734433Z  # when job is cancelled 2025-09-07T08:26:26.1734689Z  DOCKER_SHELL_CMD="sleep 12h" 2025-09-07T08:26:26.1734928Z else 2025-09-07T08:26:26.1735134Z  SHM_OPTS="--shm-size=${SHM_SIZE}" 2025-09-07T08:26:26.1735418Z  JENKINS_USER="--user jenkins" 2025-09-07T08:26:26.1735930Z  DOCKER_SHELL_CMD= 2025-09-07T08:26:26.1736144Z fi 2025-09-07T08:26:26.1736316Z  2025-09-07T08:26:26.1736589Z # detached container should get cleaned up by teardown_ec2_linux 2025-09-07T08:26:26.1736982Z # TODO: Stop building test binaries as part of the build phase 2025-09-07T08:26:26.1737424Z # Used for GPU_FLAG, SHM_OPTS, JENKINS_USER and DOCKER_SHELL_CMD since that doesn't play nice 2025-09-07T08:26:26.1737799Z # shellcheck disable=SC2086,SC2090 2025-09-07T08:26:26.1738043Z container_name=$(docker run \ 2025-09-07T08:26:26.1738278Z  ${GPU_FLAG:-} \ 2025-09-07T08:26:26.1738506Z  ${SCCACHE_SERVER_PORT_DOCKER_FLAG:-} \ 2025-09-07T08:26:26.1738755Z  -e BUILD_ENVIRONMENT \ 2025-09-07T08:26:26.1738976Z  -e PR_NUMBER \ 2025-09-07T08:26:26.1739192Z  -e GITHUB_ACTIONS \ 2025-09-07T08:26:26.1739416Z  -e GITHUB_REPOSITORY \ 2025-09-07T08:26:26.1739640Z  -e GITHUB_WORKFLOW \ 2025-09-07T08:26:26.1739858Z  -e GITHUB_JOB \ 2025-09-07T08:26:26.1740063Z  -e 
GITHUB_RUN_ID \ 2025-09-07T08:26:26.1740273Z  -e GITHUB_RUN_NUMBER \ 2025-09-07T08:26:26.1740489Z  -e GITHUB_RUN_ATTEMPT \ 2025-09-07T08:26:26.1740721Z  -e JOB_ID \ 2025-09-07T08:26:26.1740919Z  -e JOB_NAME \ 2025-09-07T08:26:26.1741118Z  -e BASE_SHA \ 2025-09-07T08:26:26.1741308Z  -e BRANCH \ 2025-09-07T08:26:26.1741496Z  -e SHA1 \ 2025-09-07T08:26:26.1741687Z  -e AWS_DEFAULT_REGION \ 2025-09-07T08:26:26.1741913Z  -e IN_WHEEL_TEST \ 2025-09-07T08:26:26.1742113Z  -e SHARD_NUMBER \ 2025-09-07T08:26:26.1742318Z  -e TEST_CONFIG \ 2025-09-07T08:26:26.1742520Z  -e NUM_TEST_SHARDS \ 2025-09-07T08:26:26.1742735Z  -e REENABLED_ISSUES \ 2025-09-07T08:26:26.1742959Z  -e CONTINUE_THROUGH_ERROR \ 2025-09-07T08:26:26.1743398Z  -e VERBOSE_TEST_LOGS \ 2025-09-07T08:26:26.1743627Z  -e TEST_SHOWLOCALS \ 2025-09-07T08:26:26.1744009Z  -e NO_TEST_TIMEOUT \ 2025-09-07T08:26:26.1744218Z  -e NO_TD \ 2025-09-07T08:26:26.1744406Z  -e TD_DISTRIBUTED \ 2025-09-07T08:26:26.1744615Z  -e PR_LABELS \ 2025-09-07T08:26:26.1744875Z  -e MAX_JOBS="$(nproc --ignore=2)" \ 2025-09-07T08:26:26.1745123Z  -e SCCACHE_BUCKET \ 2025-09-07T08:26:26.1745332Z  -e SCCACHE_REGION \ 2025-09-07T08:26:26.1745538Z  -e XLA_CUDA \ 2025-09-07T08:26:26.1745755Z  -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ 2025-09-07T08:26:26.1746021Z  -e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK \ 2025-09-07T08:26:26.1746299Z  -e PYTORCH_TEST_RERUN_DISABLED_TESTS \ 2025-09-07T08:26:26.1746580Z  -e SKIP_SCCACHE_INITIALIZATION=1 \ 2025-09-07T08:26:26.1746841Z  -e HUGGING_FACE_HUB_TOKEN \ 2025-09-07T08:26:26.1747087Z  -e VLLM_TEST_HUGGING_FACE_TOKEN \ 2025-09-07T08:26:26.1747340Z  -e SCRIBE_GRAPHQL_ACCESS_TOKEN \ 2025-09-07T08:26:26.1747580Z  -e DASHBOARD_TAG \ 2025-09-07T08:26:26.1747787Z  -e ARTIFACTS_FILE_SUFFIX \ 2025-09-07T08:26:26.1748057Z  --memory="${TOTAL_AVAILABLE_MEMORY_IN_GB%.*}g" \ 2025-09-07T08:26:26.1748363Z  --memory-swap="${TOTAL_MEMORY_WITH_SWAP}g" \ 2025-09-07T08:26:26.1748670Z  --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ 
2025-09-07T08:26:26.1748957Z  --security-opt seccomp=unconfined \ 2025-09-07T08:26:26.1749207Z  --cap-add=SYS_PTRACE \ 2025-09-07T08:26:26.1749418Z  --ipc=host \ 2025-09-07T08:26:26.1749611Z  ${SHM_OPTS} \ 2025-09-07T08:26:26.1749800Z  --tty \ 2025-09-07T08:26:26.1749975Z  --detach \ 2025-09-07T08:26:26.1750167Z  --name="${container_name}" \ 2025-09-07T08:26:26.1750403Z  ${JENKINS_USER} \ 2025-09-07T08:26:26.1750831Z  -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ 2025-09-07T08:26:26.1751127Z  -w /var/lib/jenkins/workspace \ 2025-09-07T08:26:26.1751355Z  "${DOCKER_IMAGE}" \ 2025-09-07T08:26:26.1751559Z  ${DOCKER_SHELL_CMD} 2025-09-07T08:26:26.1751763Z ) 2025-09-07T08:26:26.1751984Z # Propagate download.pytorch.org IP to container 2025-09-07T08:26:26.1752477Z grep download.pytorch.org /etc/hosts | docker exec -i "${container_name}" sudo bash -c "/bin/cat >> /etc/hosts" 2025-09-07T08:26:26.1753054Z echo "DOCKER_CONTAINER_ID=${container_name}" >> "${GITHUB_ENV}" 2025-09-07T08:26:26.1753351Z  2025-09-07T08:26:26.1753547Z if [[ ${BUILD_ENVIRONMENT} == *"s390x"* ]]; then 2025-09-07T08:26:26.1754108Z  docker exec -t "${container_name}" sh -c "python3 -m pip install -r .ci/docker/requirements-ci.txt" 2025-09-07T08:26:26.1754478Z fi 2025-09-07T08:26:26.1754634Z  2025-09-07T08:26:26.1754999Z docker exec -t "${container_name}" sh -c "python3 -m pip install $(echo dist/*.whl)[opt-einsum] && ${TEST_COMMAND}" 2025-09-07T08:26:26.1770568Z shell: /usr/bin/bash -e {0} 2025-09-07T08:26:26.1770784Z env: 2025-09-07T08:26:26.1770950Z GIT_DEFAULT_BRANCH: main 2025-09-07T08:26:26.1771211Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T08:26:26.1771562Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5229 2025-09-07T08:26:26.1771915Z BUILD_ENVIRONMENT: linux-jammy-cuda12.8-py3.10-gcc9-sm90 2025-09-07T08:26:26.1772193Z PR_NUMBER: 2025-09-07T08:26:26.1772379Z GITHUB_REPOSITORY: pytorch/pytorch 2025-09-07T08:26:26.1772639Z GITHUB_WORKFLOW: 
inductor-perf-nightly-h100 2025-09-07T08:26:26.1772876Z GITHUB_JOB: test 2025-09-07T08:26:26.1773055Z GITHUB_RUN_ID: 17525296438 2025-09-07T08:26:26.1773252Z GITHUB_RUN_NUMBER: 662 2025-09-07T08:26:26.1773444Z GITHUB_RUN_ATTEMPT: 1 2025-09-07T08:26:26.1773626Z JOB_ID: 49775781833 2025-09-07T08:26:26.1774253Z JOB_NAME: test-weekly / test (inductor_timm_perf_cuda_h100, 2, 7, linux.aws.h100) 2025-09-07T08:26:26.1774604Z BRANCH: main 2025-09-07T08:26:26.1774803Z SHA1: 93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T08:26:26.1775076Z BASE_SHA: 93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T08:26:26.1775342Z TEST_CONFIG: inductor_timm_perf_cuda_h100 2025-09-07T08:26:26.1775573Z SHARD_NUMBER: 2 2025-09-07T08:26:26.1775745Z NUM_TEST_SHARDS: 7 2025-09-07T08:26:26.1775916Z REENABLED_ISSUES: 2025-09-07T08:26:26.1776100Z CONTINUE_THROUGH_ERROR: True 2025-09-07T08:26:26.1776320Z VERBOSE_TEST_LOGS: False 2025-09-07T08:26:26.1776522Z TEST_SHOWLOCALS: False 2025-09-07T08:26:26.1776708Z NO_TEST_TIMEOUT: False 2025-09-07T08:26:26.1776888Z NO_TD: False 2025-09-07T08:26:26.1777056Z TD_DISTRIBUTED: False 2025-09-07T08:26:26.1777287Z SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 2025-09-07T08:26:26.1777549Z SCCACHE_REGION: us-east-1 2025-09-07T08:26:26.1777742Z SHM_SIZE: 2g 2025-09-07T08:26:26.1778386Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T08:26:26.1779058Z XLA_CUDA: 2025-09-07T08:26:26.1779318Z XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla 2025-09-07T08:26:26.1779640Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK: 0 2025-09-07T08:26:26.1779883Z PYTORCH_TEST_RERUN_DISABLED_TESTS: 0 2025-09-07T08:26:26.1780780Z DASHBOARD_TAG: 
training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true 2025-09-07T08:26:26.1781825Z VLLM_TEST_HUGGING_FACE_TOKEN: *** 2025-09-07T08:26:26.1782139Z HUGGING_FACE_HUB_TOKEN: *** 2025-09-07T08:26:26.1782441Z SCRIBE_GRAPHQL_ACCESS_TOKEN: *** 2025-09-07T08:26:26.1782799Z ARTIFACTS_FILE_SUFFIX: test-inductor_timm_perf_cuda_h100-2-7-linux.aws.h100_49775781833 2025-09-07T08:26:26.1783333Z ##[endgroup] 2025-09-07T08:26:26.2193132Z + [[ inductor_timm_perf_cuda_h100 == \m\u\l\t\i\g\p\u ]] 2025-09-07T08:26:26.2193505Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 == *onnx* ]] 2025-09-07T08:26:26.2194039Z + TEST_COMMAND=.ci/pytorch/test.sh 2025-09-07T08:26:26.2197358Z ++ awk '/MemTotal/ { printf "%.3f \n", $2/1024/1024 - 1 }' /proc/meminfo 2025-09-07T08:26:26.2209102Z + TOTAL_AVAILABLE_MEMORY_IN_GB='1998.946 ' 2025-09-07T08:26:26.2209391Z + TOTAL_MEMORY_WITH_SWAP=2001 2025-09-07T08:26:26.2209701Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 == *\s\3\9\0\x* ]] 2025-09-07T08:26:26.2210026Z + SHM_OPTS=--shm-size=2g 2025-09-07T08:26:26.2210261Z + JENKINS_USER='--user jenkins' 2025-09-07T08:26:26.2210492Z + DOCKER_SHELL_CMD= 2025-09-07T08:26:26.2219038Z +++ nproc --ignore=2 2025-09-07T08:26:26.2233242Z ++ docker run --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all -e SCCACHE_SERVER_PORT=5229 -e BUILD_ENVIRONMENT -e PR_NUMBER -e GITHUB_ACTIONS -e GITHUB_REPOSITORY -e GITHUB_WORKFLOW -e GITHUB_JOB -e GITHUB_RUN_ID -e GITHUB_RUN_NUMBER -e GITHUB_RUN_ATTEMPT -e JOB_ID -e JOB_NAME -e BASE_SHA -e BRANCH -e SHA1 -e AWS_DEFAULT_REGION -e IN_WHEEL_TEST -e SHARD_NUMBER -e TEST_CONFIG -e NUM_TEST_SHARDS -e REENABLED_ISSUES -e CONTINUE_THROUGH_ERROR -e VERBOSE_TEST_LOGS -e TEST_SHOWLOCALS -e NO_TEST_TIMEOUT -e NO_TD -e TD_DISTRIBUTED -e PR_LABELS -e MAX_JOBS=22 -e SCCACHE_BUCKET -e SCCACHE_REGION -e XLA_CUDA -e XLA_CLANG_CACHE_S3_BUCKET_NAME -e 
PYTORCH_TEST_CUDA_MEM_LEAK_CHECK -e PYTORCH_TEST_RERUN_DISABLED_TESTS -e SKIP_SCCACHE_INITIALIZATION=1 -e HUGGING_FACE_HUB_TOKEN -e VLLM_TEST_HUGGING_FACE_TOKEN -e SCRIBE_GRAPHQL_ACCESS_TOKEN -e DASHBOARD_TAG -e ARTIFACTS_FILE_SUFFIX --memory=1998g --memory-swap=2001g --env-file=/tmp/github_env_17525296438 --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --ipc=host --shm-size=2g --tty --detach --name= --user jenkins -v /home/charlie/_work/pytorch/pytorch:/var/lib/jenkins/workspace -w /var/lib/jenkins/workspace 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T08:26:55.1778271Z + container_name=041e022c010b277563b6b009de7f317cbe81470c090a84737c08d082827a2881 2025-09-07T08:26:55.1781757Z + grep download.pytorch.org /etc/hosts 2025-09-07T08:26:55.1783614Z + docker exec -i 041e022c010b277563b6b009de7f317cbe81470c090a84737c08d082827a2881 sudo bash -c '/bin/cat >> /etc/hosts' 2025-09-07T08:26:55.2398847Z + echo DOCKER_CONTAINER_ID=041e022c010b277563b6b009de7f317cbe81470c090a84737c08d082827a2881 2025-09-07T08:26:55.2399405Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 == *\s\3\9\0\x* ]] 2025-09-07T08:26:55.2402702Z ++ echo dist/torch-2.9.0a0+git93fb23d-cp310-cp310-linux_x86_64.whl 2025-09-07T08:26:55.2405143Z + docker exec -t 041e022c010b277563b6b009de7f317cbe81470c090a84737c08d082827a2881 sh -c 'python3 -m pip install dist/torch-2.9.0a0+git93fb23d-cp310-cp310-linux_x86_64.whl[opt-einsum] && .ci/pytorch/test.sh' 2025-09-07T08:26:55.6554716Z Processing ./dist/torch-2.9.0a0+git93fb23d-cp310-cp310-linux_x86_64.whl (from torch==2.9.0a0+git93fb23d) 2025-09-07T08:26:55.9571580Z Requirement already satisfied: filelock in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.9.0a0+git93fb23d->torch==2.9.0a0+git93fb23d) (3.19.1) 2025-09-07T08:26:55.9575974Z Requirement already satisfied: typing-extensions>=4.10.0 in 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.9.0a0+git93fb23d->torch==2.9.0a0+git93fb23d) (4.15.0) 2025-09-07T08:26:55.9578855Z Requirement already satisfied: sympy>=1.13.3 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.9.0a0+git93fb23d->torch==2.9.0a0+git93fb23d) (1.13.3) 2025-09-07T08:26:55.9582802Z Requirement already satisfied: networkx>=2.5.1 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.9.0a0+git93fb23d->torch==2.9.0a0+git93fb23d) (2.8.8) 2025-09-07T08:26:55.9586571Z Requirement already satisfied: jinja2 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.9.0a0+git93fb23d->torch==2.9.0a0+git93fb23d) (3.1.6) 2025-09-07T08:26:55.9590332Z Requirement already satisfied: fsspec>=0.8.5 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.9.0a0+git93fb23d->torch==2.9.0a0+git93fb23d) (2025.3.0) 2025-09-07T08:26:55.9602775Z Requirement already satisfied: opt-einsum>=3.3 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.9.0a0+git93fb23d->torch==2.9.0a0+git93fb23d) (3.3.0) 2025-09-07T08:26:55.9923416Z Requirement already satisfied: numpy>=1.7 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from opt-einsum>=3.3->torch==2.9.0a0+git93fb23d->torch==2.9.0a0+git93fb23d) (1.22.4) 2025-09-07T08:26:55.9940539Z Requirement already satisfied: mpmath<1.4,>=1.1.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from sympy>=1.13.3->torch==2.9.0a0+git93fb23d->torch==2.9.0a0+git93fb23d) (1.3.0) 2025-09-07T08:26:55.9972305Z Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from jinja2->torch==2.9.0a0+git93fb23d->torch==2.9.0a0+git93fb23d) (3.0.2) 2025-09-07T08:26:56.7843453Z Installing collected packages: torch 2025-09-07T08:27:07.1467731Z ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. 
This behaviour is the source of the following dependency conflicts. 2025-09-07T08:27:07.1468633Z dall-e 0.1 requires torchvision, which is not installed. 2025-09-07T08:27:07.1469046Z effdet 0.4.1 requires torchvision, which is not installed. 2025-09-07T08:27:07.1469533Z python-doctr 1.0.0 requires torchvision>=0.15.0, which is not installed. 2025-09-07T08:27:07.1470085Z pytorch-labs-segment-anything-fast 0.2 requires torchao, which is not installed. 2025-09-07T08:27:07.1470750Z pytorch-labs-segment-anything-fast 0.2 requires torchvision>=0.17.0.dev20231026, which is not installed. 2025-09-07T08:27:07.1471414Z timm 1.0.14 requires torchvision, which is not installed. 2025-09-07T08:27:07.1472335Z Successfully installed torch-2.9.0a0+git93fb23d 2025-09-07T08:27:07.2142875Z + export TERM=vt100 2025-09-07T08:27:07.2143135Z + TERM=vt100 2025-09-07T08:27:07.2147581Z ++ dirname .ci/pytorch/test.sh 2025-09-07T08:27:07.2161828Z + source .ci/pytorch/common.sh 2025-09-07T08:27:07.2165414Z +++ dirname .ci/pytorch/common.sh 2025-09-07T08:27:07.2175249Z ++ source .ci/pytorch/common_utils.sh 2025-09-07T08:27:07.2176122Z +++ declare -f -t trap_add 2025-09-07T08:27:07.2180048Z ++ set -ex -o pipefail 2025-09-07T08:27:07.2180365Z ++ [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 == *rocm* ]] 2025-09-07T08:27:07.2180711Z ++ BUILD_TEST_LIBTORCH=0 2025-09-07T08:27:07.2184125Z ++ dirname .ci/pytorch/test.sh 2025-09-07T08:27:07.2193153Z + source .ci/pytorch/common-build.sh 2025-09-07T08:27:07.2194623Z ++ [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 != *win-* ]] 2025-09-07T08:27:07.2201674Z ++++ dirname .ci/pytorch/common-build.sh 2025-09-07T08:27:07.2214116Z +++ cd .ci/pytorch 2025-09-07T08:27:07.2214435Z +++ pwd -P 2025-09-07T08:27:07.2217006Z ++ script_dir=/var/lib/jenkins/workspace/.ci/pytorch 2025-09-07T08:27:07.2217516Z ++ [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 == *-pch* ]] 2025-09-07T08:27:07.2217830Z ++ which sccache 2025-09-07T08:27:07.2233094Z ++ [[ -z ossci-compiler-cache-circleci-v2 ]] 
2025-09-07T08:27:07.2233402Z ++ sccache --stop-server 2025-09-07T08:27:07.2261456Z ++ true 2025-09-07T08:27:07.2261697Z ++ rm -f /var/lib/jenkins/sccache_error.log 2025-09-07T08:27:07.2276279Z ++ trap_add sccache_epilogue EXIT 2025-09-07T08:27:07.2276547Z ++ trap_add_cmd=sccache_epilogue 2025-09-07T08:27:07.2276769Z ++ shift 2025-09-07T08:27:07.2276942Z ++ for trap_add_name in "$@" 2025-09-07T08:27:07.2283142Z ++++ trap -p EXIT 2025-09-07T08:27:07.2286125Z +++ eval 'extract_trap_cmd ' 2025-09-07T08:27:07.2286402Z ++++ extract_trap_cmd 2025-09-07T08:27:07.2286644Z ++++ printf '%s\n' '' 2025-09-07T08:27:07.2286885Z +++ printf '%s\n' sccache_epilogue 2025-09-07T08:27:07.2288541Z ++ trap -- ' 2025-09-07T08:27:07.2289221Z sccache_epilogue' EXIT 2025-09-07T08:27:07.2289486Z ++ [[ -n 1 ]] 2025-09-07T08:27:07.2289902Z ++ echo 'Skipping sccache server initialization, setting environment variables' 2025-09-07T08:27:07.2290439Z Skipping sccache server initialization, setting environment variables 2025-09-07T08:27:07.2290823Z ++ export SCCACHE_IDLE_TIMEOUT=0 2025-09-07T08:27:07.2291076Z ++ SCCACHE_IDLE_TIMEOUT=0 2025-09-07T08:27:07.2291384Z ++ export SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 2025-09-07T08:27:07.2291771Z ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 2025-09-07T08:27:07.2292149Z ++ export RUST_LOG=sccache::server=error 2025-09-07T08:27:07.2292438Z ++ RUST_LOG=sccache::server=error 2025-09-07T08:27:07.2292693Z ++ sccache --zero-stats 2025-09-07T08:27:07.4137617Z Statistics zeroed. 
2025-09-07T08:27:07.4146637Z ++ which ccache 2025-09-07T08:27:07.4161979Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 != *rocm* ]] 2025-09-07T08:27:07.4162326Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 != *s390x* ]] 2025-09-07T08:27:07.4162659Z + [[ -d /var/lib/jenkins/workspace ]] 2025-09-07T08:27:07.4166196Z ++ stat -c %u /var/lib/jenkins/workspace 2025-09-07T08:27:07.4181632Z + WORKSPACE_ORIGINAL_OWNER_ID=1000 2025-09-07T08:27:07.4181895Z + trap_add cleanup_workspace EXIT 2025-09-07T08:27:07.4182134Z + trap_add_cmd=cleanup_workspace 2025-09-07T08:27:07.4182356Z + shift 2025-09-07T08:27:07.4182531Z + for trap_add_name in "$@" 2025-09-07T08:27:07.4191591Z +++ trap -p EXIT 2025-09-07T08:27:07.4192906Z ++ eval 'extract_trap_cmd trap -- '\'' 2025-09-07T08:27:07.4193177Z sccache_epilogue'\'' EXIT' 2025-09-07T08:27:07.4193397Z +++ extract_trap_cmd trap -- ' 2025-09-07T08:27:07.4193620Z sccache_epilogue' EXIT 2025-09-07T08:27:07.4193976Z +++ printf '%s\n' ' 2025-09-07T08:27:07.4194164Z sccache_epilogue' 2025-09-07T08:27:07.4194383Z ++ printf '%s\n' cleanup_workspace 2025-09-07T08:27:07.4195789Z + trap -- ' 2025-09-07T08:27:07.4195970Z sccache_epilogue 2025-09-07T08:27:07.4196151Z cleanup_workspace' EXIT 2025-09-07T08:27:07.4196691Z + sudo chown -R jenkins /var/lib/jenkins/workspace 2025-09-07T08:27:10.6057172Z + git config --global --add safe.directory /var/lib/jenkins/workspace 2025-09-07T08:27:10.6082295Z + echo 'Environment variables:' 2025-09-07T08:27:10.6082549Z Environment variables: 2025-09-07T08:27:10.6082745Z + env 2025-09-07T08:27:10.6096463Z GITHUB_WORKSPACE=/home/charlie/_work/pytorch/pytorch 2025-09-07T08:27:10.6096818Z CONTINUE_THROUGH_ERROR=True 2025-09-07T08:27:10.6097120Z BUILD_ENVIRONMENT=linux-jammy-cuda12.8-py3.10-gcc9-sm90 2025-09-07T08:27:10.6100310Z VLLM_TEST_HUGGING_FACE_TOKEN=*** 2025-09-07T08:27:10.6100582Z HOSTNAME=041e022c010b 2025-09-07T08:27:10.6101011Z 
GITHUB_PATH=/home/charlie/_work/_temp/_runner_file_commands/add_path_57c093f1-306a-4e62-aab8-17cd28a16377 2025-09-07T08:27:10.6101449Z GITHUB_ACTION=__run_2 2025-09-07T08:27:10.6101653Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=0 2025-09-07T08:27:10.6101886Z GITHUB_RUN_NUMBER=662 2025-09-07T08:27:10.6102098Z TEST_CONFIG=inductor_timm_perf_cuda_h100 2025-09-07T08:27:10.6102405Z GITHUB_REPOSITORY_OWNER_ID=21003710 2025-09-07T08:27:10.6102659Z TORCH_NVCC_FLAGS=-Xfatbin -compress-all 2025-09-07T08:27:10.6102904Z SCCACHE_IDLE_TIMEOUT=0 2025-09-07T08:27:10.6103219Z SCRIBE_GRAPHQL_ACCESS_TOKEN=*** 2025-09-07T08:27:10.6103455Z GITHUB_TRIGGERING_ACTOR=pytorchmergebot 2025-09-07T08:27:10.6103694Z GITHUB_REF_TYPE=branch 2025-09-07T08:27:10.6104149Z BASE_SHA=93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T08:27:10.6104399Z XLA_CUDA= 2025-09-07T08:27:10.6104572Z NCCL_LIB_DIR=/usr/local/cuda/lib64/ 2025-09-07T08:27:10.6104897Z HUGGING_FACE_HUB_TOKEN=*** 2025-09-07T08:27:10.6105251Z *** 2025-09-07T08:27:10.6105424Z GITHUB_REPOSITORY_ID=65600975 2025-09-07T08:27:10.6105636Z GITHUB_ACTIONS=true 2025-09-07T08:27:10.6105825Z NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T08:27:10.6106089Z SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 2025-09-07T08:27:10.6106373Z SHA1=93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T08:27:10.6106656Z GITHUB_SHA=93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T08:27:10.6107578Z GITHUB_WORKFLOW_REF=pytorch/pytorch/.github/workflows/inductor-perf-test-nightly-h100.yml@refs/heads/main 2025-09-07T08:27:10.6108030Z UCC_HOME=/usr 2025-09-07T08:27:10.6108193Z VERBOSE_TEST_LOGS=False 2025-09-07T08:27:10.6108388Z GITHUB_REF=refs/heads/main 2025-09-07T08:27:10.6108584Z SHARD_NUMBER=2 2025-09-07T08:27:10.6108757Z GITHUB_REF_PROTECTED=true 2025-09-07T08:27:10.6108945Z HOME=/var/lib/jenkins 2025-09-07T08:27:10.6109130Z SCCACHE_SERVER_PORT=5229 2025-09-07T08:27:10.6109345Z GITHUB_API_URL=https://api.github.com 2025-09-07T08:27:10.6109594Z 
PYTORCH_TEST_RERUN_DISABLED_TESTS=0 2025-09-07T08:27:10.6109890Z UCX_COMMIT=7836b165abdbe468a2f607e7254011c07d788152 2025-09-07T08:27:10.6110148Z USE_SYSTEM_NCCL=1 2025-09-07T08:27:10.6110320Z NUM_TEST_SHARDS=7 2025-09-07T08:27:10.6110485Z UCX_HOME=/usr 2025-09-07T08:27:10.6110852Z GITHUB_STATE=/home/charlie/_work/_temp/_runner_file_commands/save_state_57c093f1-306a-4e62-aab8-17cd28a16377 2025-09-07T08:27:10.6111413Z JOB_NAME=test-weekly / test (inductor_timm_perf_cuda_h100, 2, 7, linux.aws.h100) 2025-09-07T08:27:10.6111916Z GITHUB_ENV=/home/charlie/_work/_temp/_runner_file_commands/set_env_57c093f1-306a-4e62-aab8-17cd28a16377 2025-09-07T08:27:10.6112386Z GITHUB_EVENT_PATH=/home/charlie/_work/_temp/_github_workflow/event.json 2025-09-07T08:27:10.6112692Z GITHUB_EVENT_NAME=schedule 2025-09-07T08:27:10.6113556Z DASHBOARD_TAG=training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true 2025-09-07T08:27:10.6114599Z GITHUB_RUN_ID=17525296438 2025-09-07T08:27:10.6114788Z INSTALLED_OPENBLAS= 2025-09-07T08:27:10.6115184Z GITHUB_STEP_SUMMARY=/home/charlie/_work/_temp/_runner_file_commands/step_summary_57c093f1-306a-4e62-aab8-17cd28a16377 2025-09-07T08:27:10.6115613Z GITHUB_ACTOR=pytorchmergebot 2025-09-07T08:27:10.6115803Z PR_NUMBER= 2025-09-07T08:27:10.6115956Z DESIRED_CUDA=12.8.1 2025-09-07T08:27:10.6116330Z GITHUB_RUN_ATTEMPT=1 2025-09-07T08:27:10.6116535Z ANACONDA_PYTHON_VERSION=3.10 2025-09-07T08:27:10.6116780Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql 2025-09-07T08:27:10.6117025Z TERM=vt100 2025-09-07T08:27:10.6117297Z INSTALLED_VISION=yes 2025-09-07T08:27:10.6117474Z BRANCH=main 2025-09-07T08:27:10.6117632Z SCCACHE_REGION=us-east-1 2025-09-07T08:27:10.6117834Z OPENSSL_ROOT_DIR=/opt/openssl 2025-09-07T08:27:10.6118040Z CUDA_PATH=/usr/local/cuda 2025-09-07T08:27:10.6118353Z 
GITHUB_ACTION_PATH=/home/charlie/_work/pytorch/pytorch/./.github/actions/setup-linux 2025-09-07T08:27:10.6118719Z GITHUB_SERVER_URL=https://github.com 2025-09-07T08:27:10.6118973Z UCC_COMMIT=430e241bf5d38cbc73fc7a6b89155397232e3f96 2025-09-07T08:27:10.6119212Z REENABLED_ISSUES= 2025-09-07T08:27:10.6119384Z DOCS= 2025-09-07T08:27:10.6119525Z SHLVL=1 2025-09-07T08:27:10.6119666Z MAX_JOBS=22 2025-09-07T08:27:10.6119821Z GITHUB_ACTOR_ID=97764156 2025-09-07T08:27:10.6120066Z GITHUB_WORKFLOW_SHA=93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T08:27:10.6120353Z GITHUB_REF_NAME=main 2025-09-07T08:27:10.6120627Z XLA_CLANG_CACHE_S3_BUCKET_NAME=ossci-compiler-clang-cache-circleci-xla 2025-09-07T08:27:10.6120937Z GITHUB_JOB=test 2025-09-07T08:27:10.6121105Z NO_TEST_TIMEOUT=False 2025-09-07T08:27:10.6121274Z TD_DISTRIBUTED=False 2025-09-07T08:27:10.6121457Z GITHUB_REPOSITORY=pytorch/pytorch 2025-09-07T08:27:10.6121670Z GITHUB_RETENTION_DAYS=90 2025-09-07T08:27:10.6121855Z OPENSSL_DIR=/opt/openssl 2025-09-07T08:27:10.6122036Z GITHUB_ACTION_REPOSITORY= 2025-09-07T08:27:10.6122589Z PATH=/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-09-07T08:27:10.6123159Z GITHUB_BASE_REF= 2025-09-07T08:27:10.6123323Z INSTALLED_ACL= 2025-09-07T08:27:10.6123612Z ARTIFACTS_FILE_SUFFIX=test-inductor_timm_perf_cuda_h100-2-7-linux.aws.h100_49775781833 2025-09-07T08:27:10.6124091Z CI=true 2025-09-07T08:27:10.6124262Z GITHUB_REPOSITORY_OWNER=pytorch 2025-09-07T08:27:10.6124702Z RUST_LOG=sccache::server=error 2025-09-07T08:27:10.6124898Z JOB_ID=49775781833 2025-09-07T08:27:10.6125060Z GITHUB_HEAD_REF= 2025-09-07T08:27:10.6125222Z GITHUB_ACTION_REF= 2025-09-07T08:27:10.6125428Z SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2 2025-09-07T08:27:10.6125673Z TEST_SHOWLOCALS=False 2025-09-07T08:27:10.6125879Z GITHUB_WORKFLOW=inductor-perf-nightly-h100 2025-09-07T08:27:10.6126122Z 
DEBIAN_FRONTEND=noninteractive 2025-09-07T08:27:10.6126522Z GITHUB_OUTPUT=/home/charlie/_work/_temp/_runner_file_commands/set_output_57c093f1-306a-4e62-aab8-17cd28a16377 2025-09-07T08:27:10.6126912Z NO_TD=False 2025-09-07T08:27:10.6127082Z SKIP_SCCACHE_INITIALIZATION=1 2025-09-07T08:27:10.6127308Z NCCL_INCLUDE_DIR=/usr/local/cuda/include/ 2025-09-07T08:27:10.6127526Z _=/usr/bin/env 2025-09-07T08:27:10.6127750Z ++ python -c 'import site; print(site.getsitepackages()[0])' 2025-09-07T08:27:10.6376248Z + TORCH_INSTALL_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch 2025-09-07T08:27:10.6376828Z + TORCH_BIN_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/bin 2025-09-07T08:27:10.6377362Z + TORCH_LIB_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib 2025-09-07T08:27:10.6377879Z + TORCH_TEST_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/test 2025-09-07T08:27:10.6378279Z + BUILD_DIR=build 2025-09-07T08:27:10.6378500Z + BUILD_RENAMED_DIR=build_renamed 2025-09-07T08:27:10.6378763Z + BUILD_BIN_DIR=build/bin 2025-09-07T08:27:10.6378989Z + SHARD_NUMBER=2 2025-09-07T08:27:10.6379188Z + NUM_TEST_SHARDS=7 2025-09-07T08:27:10.6379415Z + export TORCH_SERIALIZATION_DEBUG=1 2025-09-07T08:27:10.6379694Z + TORCH_SERIALIZATION_DEBUG=1 2025-09-07T08:27:10.6379955Z + export VALGRIND=ON 2025-09-07T08:27:10.6380160Z + VALGRIND=ON 2025-09-07T08:27:10.6380431Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 == *clang9* ]] 2025-09-07T08:27:10.6380841Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 == *xpu* ]] 2025-09-07T08:27:10.6381113Z + detect_cuda_arch 2025-09-07T08:27:10.6381523Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 == *cuda* ]] 2025-09-07T08:27:10.6381831Z + command -v nvidia-smi 2025-09-07T08:27:10.6382040Z /usr/bin/nvidia-smi 2025-09-07T08:27:10.6387460Z ++ nvidia-smi --query-gpu=compute_cap --format=csv 2025-09-07T08:27:10.6388797Z ++ tail -n 1 2025-09-07T08:27:10.6605944Z + TORCH_CUDA_ARCH_LIST=9.0 2025-09-07T08:27:10.6606182Z 
+ export TORCH_CUDA_ARCH_LIST 2025-09-07T08:27:10.6606529Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 == *s390x* ]] 2025-09-07T08:27:10.6606817Z + [[ 0 == \1 ]] 2025-09-07T08:27:10.6606988Z + [[ True == \1 ]] 2025-09-07T08:27:10.6607218Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 != *bazel* ]] 2025-09-07T08:27:10.6611598Z ++ realpath build/custom_test_artifacts 2025-09-07T08:27:10.6622022Z + CUSTOM_TEST_ARTIFACT_BUILD_DIR=/var/lib/jenkins/workspace/build/custom_test_artifacts 2025-09-07T08:27:10.6622438Z + [[ -n '' ]] 2025-09-07T08:27:10.6622646Z + echo 'Environment variables' 2025-09-07T08:27:10.6622876Z Environment variables 2025-09-07T08:27:10.6623077Z + env 2025-09-07T08:27:10.6631617Z GITHUB_WORKSPACE=/home/charlie/_work/pytorch/pytorch 2025-09-07T08:27:10.6631951Z CONTINUE_THROUGH_ERROR=True 2025-09-07T08:27:10.6632244Z BUILD_ENVIRONMENT=linux-jammy-cuda12.8-py3.10-gcc9-sm90 2025-09-07T08:27:10.6632714Z VLLM_TEST_HUGGING_FACE_TOKEN=*** 2025-09-07T08:27:10.6632959Z HOSTNAME=041e022c010b 2025-09-07T08:27:10.6633375Z GITHUB_PATH=/home/charlie/_work/_temp/_runner_file_commands/add_path_57c093f1-306a-4e62-aab8-17cd28a16377 2025-09-07T08:27:10.6634032Z GITHUB_ACTION=__run_2 2025-09-07T08:27:10.6634257Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=0 2025-09-07T08:27:10.6634495Z GITHUB_RUN_NUMBER=662 2025-09-07T08:27:10.6634721Z TEST_CONFIG=inductor_timm_perf_cuda_h100 2025-09-07T08:27:10.6634981Z GITHUB_REPOSITORY_OWNER_ID=21003710 2025-09-07T08:27:10.6635240Z TORCH_NVCC_FLAGS=-Xfatbin -compress-all 2025-09-07T08:27:10.6635490Z SCCACHE_IDLE_TIMEOUT=0 2025-09-07T08:27:10.6635811Z SCRIBE_GRAPHQL_ACCESS_TOKEN=*** 2025-09-07T08:27:10.6636330Z GITHUB_TRIGGERING_ACTOR=pytorchmergebot 2025-09-07T08:27:10.6636585Z GITHUB_REF_TYPE=branch 2025-09-07T08:27:10.6636785Z TORCH_CUDA_ARCH_LIST=9.0 2025-09-07T08:27:10.6637028Z BASE_SHA=93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T08:27:10.6637392Z XLA_CUDA= 2025-09-07T08:27:10.6637577Z NCCL_LIB_DIR=/usr/local/cuda/lib64/ 
2025-09-07T08:27:10.6637904Z HUGGING_FACE_HUB_TOKEN=*** 2025-09-07T08:27:10.6638258Z *** 2025-09-07T08:27:10.6638442Z GITHUB_REPOSITORY_ID=65600975 2025-09-07T08:27:10.6638664Z GITHUB_ACTIONS=true 2025-09-07T08:27:10.6638869Z NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T08:27:10.6639136Z SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 2025-09-07T08:27:10.6639462Z SHA1=93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T08:27:10.6639753Z GITHUB_SHA=93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T08:27:10.6640279Z GITHUB_WORKFLOW_REF=pytorch/pytorch/.github/workflows/inductor-perf-test-nightly-h100.yml@refs/heads/main 2025-09-07T08:27:10.6640756Z UCC_HOME=/usr 2025-09-07T08:27:10.6640956Z TORCH_SERIALIZATION_DEBUG=1 2025-09-07T08:27:10.6641170Z VERBOSE_TEST_LOGS=False 2025-09-07T08:27:10.6641377Z GITHUB_REF=refs/heads/main 2025-09-07T08:27:10.6641594Z SHARD_NUMBER=2 2025-09-07T08:27:10.6641768Z GITHUB_REF_PROTECTED=true 2025-09-07T08:27:10.6641957Z HOME=/var/lib/jenkins 2025-09-07T08:27:10.6642153Z SCCACHE_SERVER_PORT=5229 2025-09-07T08:27:10.6642371Z GITHUB_API_URL=https://api.github.com 2025-09-07T08:27:10.6642618Z PYTORCH_TEST_RERUN_DISABLED_TESTS=0 2025-09-07T08:27:10.6642868Z UCX_COMMIT=7836b165abdbe468a2f607e7254011c07d788152 2025-09-07T08:27:10.6643143Z USE_SYSTEM_NCCL=1 2025-09-07T08:27:10.6643317Z NUM_TEST_SHARDS=7 2025-09-07T08:27:10.6643481Z UCX_HOME=/usr 2025-09-07T08:27:10.6644038Z GITHUB_STATE=/home/charlie/_work/_temp/_runner_file_commands/save_state_57c093f1-306a-4e62-aab8-17cd28a16377 2025-09-07T08:27:10.6644587Z JOB_NAME=test-weekly / test (inductor_timm_perf_cuda_h100, 2, 7, linux.aws.h100) 2025-09-07T08:27:10.6645274Z GITHUB_ENV=/home/charlie/_work/_temp/_runner_file_commands/set_env_57c093f1-306a-4e62-aab8-17cd28a16377 2025-09-07T08:27:10.6645787Z GITHUB_EVENT_PATH=/home/charlie/_work/_temp/_github_workflow/event.json 2025-09-07T08:27:10.6646121Z GITHUB_EVENT_NAME=schedule 2025-09-07T08:27:10.6647057Z 
DASHBOARD_TAG=training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true 2025-09-07T08:27:10.6648013Z GITHUB_RUN_ID=17525296438 2025-09-07T08:27:10.6648206Z INSTALLED_OPENBLAS= 2025-09-07T08:27:10.6648615Z GITHUB_STEP_SUMMARY=/home/charlie/_work/_temp/_runner_file_commands/step_summary_57c093f1-306a-4e62-aab8-17cd28a16377 2025-09-07T08:27:10.6649089Z GITHUB_ACTOR=pytorchmergebot 2025-09-07T08:27:10.6649295Z PR_NUMBER= 2025-09-07T08:27:10.6649457Z DESIRED_CUDA=12.8.1 2025-09-07T08:27:10.6649632Z GITHUB_RUN_ATTEMPT=1 2025-09-07T08:27:10.6649808Z VALGRIND=ON 2025-09-07T08:27:10.6649970Z ANACONDA_PYTHON_VERSION=3.10 2025-09-07T08:27:10.6650234Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql 2025-09-07T08:27:10.6650492Z TERM=vt100 2025-09-07T08:27:10.6650663Z INSTALLED_VISION=yes 2025-09-07T08:27:10.6650830Z BRANCH=main 2025-09-07T08:27:10.6650997Z SCCACHE_REGION=us-east-1 2025-09-07T08:27:10.6651198Z OPENSSL_ROOT_DIR=/opt/openssl 2025-09-07T08:27:10.6651411Z CUDA_PATH=/usr/local/cuda 2025-09-07T08:27:10.6651722Z GITHUB_ACTION_PATH=/home/charlie/_work/pytorch/pytorch/./.github/actions/setup-linux 2025-09-07T08:27:10.6652095Z GITHUB_SERVER_URL=https://github.com 2025-09-07T08:27:10.6652351Z UCC_COMMIT=430e241bf5d38cbc73fc7a6b89155397232e3f96 2025-09-07T08:27:10.6652591Z REENABLED_ISSUES= 2025-09-07T08:27:10.6652748Z DOCS= 2025-09-07T08:27:10.6652907Z SHLVL=1 2025-09-07T08:27:10.6653051Z MAX_JOBS=22 2025-09-07T08:27:10.6653204Z GITHUB_ACTOR_ID=97764156 2025-09-07T08:27:10.6653439Z GITHUB_WORKFLOW_SHA=93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T08:27:10.6653844Z GITHUB_REF_NAME=main 2025-09-07T08:27:10.6654125Z XLA_CLANG_CACHE_S3_BUCKET_NAME=ossci-compiler-clang-cache-circleci-xla 2025-09-07T08:27:10.6654590Z GITHUB_JOB=test 2025-09-07T08:27:10.6654758Z NO_TEST_TIMEOUT=False 2025-09-07T08:27:10.6654943Z 
TD_DISTRIBUTED=False 2025-09-07T08:27:10.6655131Z GITHUB_REPOSITORY=pytorch/pytorch 2025-09-07T08:27:10.6655345Z GITHUB_RETENTION_DAYS=90 2025-09-07T08:27:10.6655527Z OPENSSL_DIR=/opt/openssl 2025-09-07T08:27:10.6655715Z GITHUB_ACTION_REPOSITORY= 2025-09-07T08:27:10.6656265Z PATH=/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-09-07T08:27:10.6656829Z GITHUB_BASE_REF= 2025-09-07T08:27:10.6656984Z INSTALLED_ACL= 2025-09-07T08:27:10.6657279Z ARTIFACTS_FILE_SUFFIX=test-inductor_timm_perf_cuda_h100-2-7-linux.aws.h100_49775781833 2025-09-07T08:27:10.6657612Z CI=true 2025-09-07T08:27:10.6657774Z GITHUB_REPOSITORY_OWNER=pytorch 2025-09-07T08:27:10.6658012Z RUST_LOG=sccache::server=error 2025-09-07T08:27:10.6658212Z JOB_ID=49775781833 2025-09-07T08:27:10.6658381Z GITHUB_HEAD_REF= 2025-09-07T08:27:10.6658541Z GITHUB_ACTION_REF= 2025-09-07T08:27:10.6658742Z SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2 2025-09-07T08:27:10.6658997Z TEST_SHOWLOCALS=False 2025-09-07T08:27:10.6659207Z GITHUB_WORKFLOW=inductor-perf-nightly-h100 2025-09-07T08:27:10.6659456Z DEBIAN_FRONTEND=noninteractive 2025-09-07T08:27:10.6659838Z GITHUB_OUTPUT=/home/charlie/_work/_temp/_runner_file_commands/set_output_57c093f1-306a-4e62-aab8-17cd28a16377 2025-09-07T08:27:10.6660235Z NO_TD=False 2025-09-07T08:27:10.6660413Z SKIP_SCCACHE_INITIALIZATION=1 2025-09-07T08:27:10.6660644Z NCCL_INCLUDE_DIR=/usr/local/cuda/include/ 2025-09-07T08:27:10.6660856Z _=/usr/bin/env 2025-09-07T08:27:10.6661021Z + echo 'Testing pytorch' 2025-09-07T08:27:10.6661211Z Testing pytorch 2025-09-07T08:27:10.6661385Z + export LANG=C.UTF-8 2025-09-07T08:27:10.6661550Z + LANG=C.UTF-8 2025-09-07T08:27:10.6661706Z + PR_NUMBER= 2025-09-07T08:27:10.6661912Z + [[ inductor_timm_perf_cuda_h100 == \d\e\f\a\u\l\t ]] 2025-09-07T08:27:10.6662365Z + [[ inductor_timm_perf_cuda_h100 == \d\i\s\t\r\i\b\u\t\e\d ]] 
2025-09-07T08:27:10.6662649Z + [[ inductor_timm_perf_cuda_h100 == \s\l\o\w ]] 2025-09-07T08:27:10.6662974Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 == *slow-gradcheck* ]] 2025-09-07T08:27:10.6663306Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 == *cuda* ]] 2025-09-07T08:27:10.6663588Z + export PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda 2025-09-07T08:27:10.6664049Z + PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda 2025-09-07T08:27:10.6664302Z + [[ inductor_timm_perf_cuda_h100 == *crossref* ]] 2025-09-07T08:27:10.6664586Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 == *rocm* ]] 2025-09-07T08:27:10.6664878Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 == *xpu* ]] 2025-09-07T08:27:10.6665169Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 != *-bazel-* ]] 2025-09-07T08:27:10.6665451Z + pip_install ninja==1.10.2 2025-09-07T08:27:10.6665709Z + pip_install_pkg='python3 -m pip install --progress-bar off' 2025-09-07T08:27:10.6666053Z + python3 -m pip install --progress-bar off ninja==1.10.2 2025-09-07T08:27:13.5365486Z Collecting ninja==1.10.2 2025-09-07T08:27:13.6989055Z Downloading ninja-1.10.2-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl.metadata (5.0 kB) 2025-09-07T08:27:15.8201219Z Downloading ninja-1.10.2-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl (108 kB) 2025-09-07T08:27:17.9098913Z Installing collected packages: ninja 2025-09-07T08:27:17.9099328Z Attempting uninstall: ninja 2025-09-07T08:27:17.9106980Z Found existing installation: ninja 1.11.1.3 2025-09-07T08:27:17.9128348Z Uninstalling ninja-1.11.1.3: 2025-09-07T08:27:19.4747321Z Successfully uninstalled ninja-1.11.1.3 2025-09-07T08:27:20.4024255Z Successfully installed ninja-1.10.2 2025-09-07T08:27:20.4874879Z + export PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-09-07T08:27:20.4877438Z + 
PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-09-07T08:27:20.4878530Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 == *aarch64* ]] 2025-09-07T08:27:20.4879006Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 == *asan* ]] 2025-09-07T08:27:20.4879458Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 == *-debug* ]] 2025-09-07T08:27:20.4879910Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 != *-bazel-* ]] 2025-09-07T08:27:20.4880545Z + echo 'We are not in debug mode: linux-jammy-cuda12.8-py3.10-gcc9-sm90. Expect the assertion to pass' 2025-09-07T08:27:20.4881333Z We are not in debug mode: linux-jammy-cuda12.8-py3.10-gcc9-sm90. Expect the assertion to pass 2025-09-07T08:27:20.4881878Z + cd test 2025-09-07T08:27:20.4882240Z + python -c 'import torch; torch._C._crash_if_debug_asserts_fail(424242)' 2025-09-07T08:27:21.0300775Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T08:27:21.0302135Z import pynvml # type: ignore[import] 2025-09-07T08:27:22.0413392Z + [[ inductor_timm_perf_cuda_h100 == \n\o\g\p\u\_\N\O\_\A\V\X\2 ]] 2025-09-07T08:27:22.0414282Z + [[ inductor_timm_perf_cuda_h100 == \n\o\g\p\u\_\A\V\X\5\1\2 ]] 2025-09-07T08:27:22.0414720Z + [[ inductor_timm_perf_cuda_h100 == \l\e\g\a\c\y\_\n\v\i\d\i\a\_\d\r\i\v\e\r ]] 2025-09-07T08:27:22.0416211Z + DYNAMO_BENCHMARK_FLAGS=() 2025-09-07T08:27:22.0416924Z + [[ inductor_timm_perf_cuda_h100 == *pr_time_benchmarks* ]] 2025-09-07T08:27:22.0417315Z + [[ inductor_timm_perf_cuda_h100 == *dynamo_eager* ]] 2025-09-07T08:27:22.0417643Z + [[ inductor_timm_perf_cuda_h100 == *aot_eager* ]] 2025-09-07T08:27:22.0417947Z + [[ inductor_timm_perf_cuda_h100 == *aot_inductor* ]] 2025-09-07T08:27:22.0418897Z + [[ inductor_timm_perf_cuda_h100 == *max_autotune_inductor* ]] 2025-09-07T08:27:22.0419260Z + [[ inductor_timm_perf_cuda_h100 == *inductor* ]] 2025-09-07T08:27:22.0419553Z + [[ inductor_timm_perf_cuda_h100 != *perf* ]] 2025-09-07T08:27:22.0419851Z + [[ inductor_timm_perf_cuda_h100 == *dynamic* ]] 2025-09-07T08:27:22.0420134Z + [[ inductor_timm_perf_cuda_h100 == *cpu* ]] 2025-09-07T08:27:22.0420424Z + DYNAMO_BENCHMARK_FLAGS+=(--device cuda) 2025-09-07T08:27:22.0435011Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 == *libtorch* ]] 2025-09-07T08:27:22.0435377Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 == *-bazel-* ]] 2025-09-07T08:27:22.0438929Z + cd test 2025-09-07T08:27:22.0439641Z + python -c 'import torch; print(torch.__config__.show())' 2025-09-07T08:27:22.5577608Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T08:27:22.5578672Z import pynvml # type: ignore[import] 2025-09-07T08:27:23.7621621Z PyTorch built with: 2025-09-07T08:27:23.7622032Z - GCC 9.5 2025-09-07T08:27:23.7622372Z - C++ Version: 201703 2025-09-07T08:27:23.7623220Z - Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications 2025-09-07T08:27:23.7624549Z - Intel(R) MKL-DNN v3.7.1 (Git Hash 8d263e693366ef8db40acc569cc7d8edf644556d) 2025-09-07T08:27:23.7625220Z - OpenMP 201511 (a.k.a. OpenMP 4.5) 2025-09-07T08:27:23.7625781Z - LAPACK is enabled (usually provided by MKL) 2025-09-07T08:27:23.7626113Z - NNPACK is enabled 2025-09-07T08:27:23.7626331Z - CPU capability usage: AVX2 2025-09-07T08:27:23.7626559Z - CUDA Runtime 12.8 2025-09-07T08:27:23.7626842Z - NVCC architecture flags: -gencode;arch=compute_90,code=sm_90 2025-09-07T08:27:23.7627162Z - CuDNN 90.8 2025-09-07T08:27:23.7630936Z - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, COMMIT_SHA=93fb23d6fae7c4e82c4239a1033e522088742634, CUDA_VERSION=12.8, CUDNN_VERSION=9.8.0, CXX_COMPILER=/opt/cache/bin/c++, CXX_FLAGS= -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -DC10_NODEPRECATED -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -faligned-new -Werror -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, FORCE_FALLBACK_CUDA_MPI=1, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, TORCH_VERSION=2.9.0, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=ON, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=ON, USE_NCCL=ON, 
USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, USE_XCCL=OFF, USE_XPU=OFF, 2025-09-07T08:27:23.7635397Z 2025-09-07T08:27:24.2609963Z + cd test 2025-09-07T08:27:24.2610271Z + python -c 'import torch; print(torch.__config__.parallel_info())' 2025-09-07T08:27:24.7941940Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T08:27:24.7943392Z import pynvml # type: ignore[import] 2025-09-07T08:27:25.5227104Z ATen/Parallel: 2025-09-07T08:27:25.5227414Z at::get_num_threads() : 24 2025-09-07T08:27:25.5227688Z at::get_num_interop_threads() : 96 2025-09-07T08:27:25.5227998Z OpenMP 201511 (a.k.a. OpenMP 4.5) 2025-09-07T08:27:25.5228275Z omp_get_max_threads() : 24 2025-09-07T08:27:25.5229104Z Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications 2025-09-07T08:27:25.5229662Z mkl_get_max_threads() : 24 2025-09-07T08:27:25.5230018Z Intel(R) MKL-DNN v3.7.1 (Git Hash 8d263e693366ef8db40acc569cc7d8edf644556d) 2025-09-07T08:27:25.5230423Z std::thread::hardware_concurrency() : 192 2025-09-07T08:27:25.5230710Z Environment variables: 2025-09-07T08:27:25.5230944Z OMP_NUM_THREADS : [not set] 2025-09-07T08:27:25.5231191Z MKL_NUM_THREADS : [not set] 2025-09-07T08:27:25.5231456Z ATen parallel backend: OpenMP 2025-09-07T08:27:25.5231621Z 2025-09-07T08:27:25.7705489Z + [[ inductor_timm_perf_cuda_h100 == *numpy_2* ]] 2025-09-07T08:27:25.7706084Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 == *aarch64* ]] 2025-09-07T08:27:25.7706589Z + [[ inductor_timm_perf_cuda_h100 == *backward* ]] 2025-09-07T08:27:25.7707017Z + [[ inductor_timm_perf_cuda_h100 == *xla* ]] 2025-09-07T08:27:25.7707412Z + [[ inductor_timm_perf_cuda_h100 == *vllm* ]] 2025-09-07T08:27:25.7707844Z + [[ 
inductor_timm_perf_cuda_h100 == *executorch* ]] 2025-09-07T08:27:25.7708362Z + [[ inductor_timm_perf_cuda_h100 == \j\i\t\_\l\e\g\a\c\y ]] 2025-09-07T08:27:25.7708879Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 == *libtorch* ]] 2025-09-07T08:27:25.7709352Z + [[ inductor_timm_perf_cuda_h100 == distributed ]] 2025-09-07T08:27:25.7709833Z + [[ inductor_timm_perf_cuda_h100 == *operator_benchmark* ]] 2025-09-07T08:27:25.7710361Z + [[ inductor_timm_perf_cuda_h100 == *inductor_distributed* ]] 2025-09-07T08:27:25.7710871Z + [[ inductor_timm_perf_cuda_h100 == *inductor-halide* ]] 2025-09-07T08:27:25.7711378Z + [[ inductor_timm_perf_cuda_h100 == *inductor-triton-cpu* ]] 2025-09-07T08:27:25.7711922Z + [[ inductor_timm_perf_cuda_h100 == *inductor-micro-benchmark* ]] 2025-09-07T08:27:25.7712444Z + [[ inductor_timm_perf_cuda_h100 == *huggingface* ]] 2025-09-07T08:27:25.7712874Z + [[ inductor_timm_perf_cuda_h100 == *timm* ]] 2025-09-07T08:27:25.7713256Z + install_torchvision 2025-09-07T08:27:25.7713544Z + local orig_preload 2025-09-07T08:27:25.7714620Z + local commit 2025-09-07T08:27:25.7714913Z ++ get_pinned_commit vision 2025-09-07T08:27:25.7715263Z ++ cat .github/ci_commit_pins/vision.txt 2025-09-07T08:27:25.7728908Z + commit=966da7e46f65d6d49df3e31214470a4fe5cc8e66 2025-09-07T08:27:25.7729344Z + orig_preload= 2025-09-07T08:27:25.7729624Z + '[' -n '' ']' 2025-09-07T08:27:25.7729973Z + [[ linux-jammy-cuda12.8-py3.10-gcc9-sm90 == *cuda* ]] 2025-09-07T08:27:25.7730402Z + export FORCE_CUDA=1 2025-09-07T08:27:25.7730690Z + FORCE_CUDA=1 2025-09-07T08:27:25.7730950Z + export WITH_CUDA=1 2025-09-07T08:27:25.7731225Z + WITH_CUDA=1 2025-09-07T08:27:25.7731885Z + pip_build_and_install git+https://github.com/pytorch/vision.git@966da7e46f65d6d49df3e31214470a4fe5cc8e66 dist/vision 2025-09-07T08:27:25.7732957Z + local build_target=git+https://github.com/pytorch/vision.git@966da7e46f65d6d49df3e31214470a4fe5cc8e66 2025-09-07T08:27:25.7733636Z + local wheel_dir=dist/vision 
2025-09-07T08:27:25.7734144Z + local found_whl=0 2025-09-07T08:27:25.7734454Z + for file in "${wheel_dir}"/*.whl 2025-09-07T08:27:25.7735006Z + [[ -f dist/vision/torchvision-0.22.0a0+966da7e-cp310-cp310-linux_x86_64.whl ]] 2025-09-07T08:27:25.7735550Z + found_whl=1 2025-09-07T08:27:25.7735806Z + break 2025-09-07T08:27:25.7736045Z + '[' 1 == 0 ']' 2025-09-07T08:27:25.7736323Z + for file in "${wheel_dir}"/*.whl 2025-09-07T08:27:25.7736908Z + pip_install_whl dist/vision/torchvision-0.22.0a0+966da7e-cp310-cp310-linux_x86_64.whl 2025-09-07T08:27:25.7737697Z + args=('dist/vision/torchvision-0.22.0a0+966da7e-cp310-cp310-linux_x86_64.whl') 2025-09-07T08:27:25.7738253Z + local args 2025-09-07T08:27:25.7738721Z + [[ dist/vision/torchvision-0.22.0a0+966da7e-cp310-cp310-linux_x86_64.whl == *\ * ]] 2025-09-07T08:27:25.7739297Z + for path in "${args[@]}" 2025-09-07T08:27:25.7739855Z + echo 'Installing dist/vision/torchvision-0.22.0a0+966da7e-cp310-cp310-linux_x86_64.whl' 2025-09-07T08:27:25.7740671Z Installing dist/vision/torchvision-0.22.0a0+966da7e-cp310-cp310-linux_x86_64.whl 2025-09-07T08:27:25.7741958Z + python3 -mpip install --no-index --no-deps dist/vision/torchvision-0.22.0a0+966da7e-cp310-cp310-linux_x86_64.whl 2025-09-07T08:27:26.1209647Z Processing ./dist/vision/torchvision-0.22.0a0+966da7e-cp310-cp310-linux_x86_64.whl 2025-09-07T08:27:26.1293205Z Installing collected packages: torchvision 2025-09-07T08:27:27.3361107Z Successfully installed torchvision-0.22.0a0+966da7e 2025-09-07T08:27:27.3705297Z + '[' -n '' ']' 2025-09-07T08:27:27.3705540Z + id=1 2025-09-07T08:27:27.3705759Z + test_dynamo_benchmark timm_models 1 2025-09-07T08:27:27.3710685Z ++ pwd 2025-09-07T08:27:27.3714828Z + TEST_REPORTS_DIR=/var/lib/jenkins/workspace/test/test-reports 2025-09-07T08:27:27.3715218Z + local suite=timm_models 2025-09-07T08:27:27.3715462Z + shift 2025-09-07T08:27:27.3715643Z + local shard_id=1 2025-09-07T08:27:27.3715853Z + shift 2025-09-07T08:27:27.3716094Z + [[ 
inductor_timm_perf_cuda_h100 == *perf_compare* ]] 2025-09-07T08:27:27.3716469Z + [[ inductor_timm_perf_cuda_h100 == *perf* ]] 2025-09-07T08:27:27.3716841Z + [[ inductor_timm_perf_cuda_h100 == *b200* ]] 2025-09-07T08:27:27.3717271Z + test_single_dynamo_benchmark dashboard timm_models 1 2025-09-07T08:27:27.3719622Z ++ pwd 2025-09-07T08:27:27.3721927Z + TEST_REPORTS_DIR=/var/lib/jenkins/workspace/test/test-reports 2025-09-07T08:27:27.3722327Z + mkdir -p /var/lib/jenkins/workspace/test/test-reports 2025-09-07T08:27:27.3744979Z + local name=dashboard 2025-09-07T08:27:27.3745211Z + shift 2025-09-07T08:27:27.3745385Z + local suite=timm_models 2025-09-07T08:27:27.3745589Z + shift 2025-09-07T08:27:27.3745750Z + local shard_id=1 2025-09-07T08:27:27.3745935Z + shift 2025-09-07T08:27:27.3746103Z + partition_flags=() 2025-09-07T08:27:27.3746305Z + local partition_flags 2025-09-07T08:27:27.3746499Z + [[ -n 7 ]] 2025-09-07T08:27:27.3746673Z + [[ -n 1 ]] 2025-09-07T08:27:27.3746998Z + partition_flags=(--total-partitions "$NUM_TEST_SHARDS" --partition-id "$shard_id") 2025-09-07T08:27:27.3747439Z + [[ inductor_timm_perf_cuda_h100 == *perf_compare* ]] 2025-09-07T08:27:27.3748130Z + [[ inductor_timm_perf_cuda_h100 == *perf* ]] 2025-09-07T08:27:27.3748515Z + test_perf_for_dashboard timm_models --device cuda --total-partitions 7 --partition-id 1 2025-09-07T08:27:27.3750192Z ++ pwd 2025-09-07T08:27:27.3752729Z + TEST_REPORTS_DIR=/var/lib/jenkins/workspace/test/test-reports 2025-09-07T08:27:27.3753125Z + mkdir -p /var/lib/jenkins/workspace/test/test-reports 2025-09-07T08:27:27.3770424Z + local suite=timm_models 2025-09-07T08:27:27.3770805Z + shift 2025-09-07T08:27:27.3771013Z + local backend=inductor 2025-09-07T08:27:27.3771238Z + modes=() 2025-09-07T08:27:27.3771414Z + local modes 2025-09-07T08:27:27.3772511Z + [[ 
training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *training-true* ]] 2025-09-07T08:27:27.3773857Z + modes+=(training) 2025-09-07T08:27:27.3774964Z + [[ training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *inference-true* ]] 2025-09-07T08:27:27.3776128Z + modes+=(inference) 2025-09-07T08:27:27.3776360Z + targets=('accuracy' 'performance') 2025-09-07T08:27:27.3776628Z + local targets 2025-09-07T08:27:27.3776819Z + local device=cuda 2025-09-07T08:27:27.3777041Z + [[ inductor_timm_perf_cuda_h100 == *cpu* ]] 2025-09-07T08:27:27.3777344Z + [[ inductor_timm_perf_cuda_h100 == *cuda_a10g* ]] 2025-09-07T08:27:27.3777649Z + [[ inductor_timm_perf_cuda_h100 == *h100* ]] 2025-09-07T08:27:27.3777906Z + device=cuda_h100 2025-09-07T08:27:27.3778097Z + for mode in "${modes[@]}" 2025-09-07T08:27:27.3778316Z + [[ training == \i\n\f\e\r\e\n\c\e ]] 2025-09-07T08:27:27.3778563Z + [[ training == \t\r\a\i\n\i\n\g ]] 2025-09-07T08:27:27.3778787Z + dtype=amp 2025-09-07T08:27:27.3778967Z + for target in "${targets[@]}" 2025-09-07T08:27:27.3779196Z + target_flag=('--accuracy') 2025-09-07T08:27:27.3779407Z + local target_flag 2025-09-07T08:27:27.3779888Z + [[ accuracy == \p\e\r\f\o\r\m\a\n\c\e ]] 2025-09-07T08:27:27.3780165Z + [[ accuracy == \a\c\c\u\r\a\c\y ]] 2025-09-07T08:27:27.3780424Z + target_flag+=(--no-translation-validation) 2025-09-07T08:27:27.3781500Z + [[ training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *freezing-true* ]] 2025-09-07T08:27:27.3783373Z + [[ 
training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *default-true* ]] 2025-09-07T08:27:27.3785539Z + python benchmarks/dynamo/timm_models.py --accuracy --no-translation-validation --training --amp --backend inductor --disable-cudagraphs --device cuda --total-partitions 7 --partition-id 1 --output /var/lib/jenkins/workspace/test/test-reports/inductor_no_cudagraphs_timm_models_amp_training_cuda_h100_accuracy.csv 2025-09-07T08:27:28.3931471Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T08:27:28.3932636Z import pynvml # type: ignore[import] 2025-09-07T08:27:32.7595263Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T08:27:32.7596510Z import pynvml # type: ignore[import] 2025-09-07T08:27:35.7048515Z 2025-09-07T08:27:36.0428648Z loading model: 0it [00:00, ?it/s] 2025-09-07T08:27:36.0429052Z 2025-09-07T08:27:36.1515934Z model.safetensors: 0% 0.00/34.2M [00:00 will be ignored 2025-09-07T08:28:55.5848685Z pass 2025-09-07T08:29:01.8305135Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T08:29:01.8306462Z import pynvml # type: ignore[import] 2025-09-07T08:29:04.7843981Z 2025-09-07T08:29:05.3612753Z loading model: 0it [00:00, ?it/s] 2025-09-07T08:29:05.3612991Z 2025-09-07T08:29:05.4720558Z model.safetensors: 0% 0.00/111M [00:00 will be ignored 2025-09-07T08:30:20.4651060Z pass 2025-09-07T08:30:25.8610506Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T08:30:25.8611725Z import pynvml # type: ignore[import] 2025-09-07T08:30:28.8350896Z 2025-09-07T08:30:29.8475967Z loading model: 0it [00:00, ?it/s] 2025-09-07T08:30:29.8476208Z 2025-09-07T08:30:29.9627806Z model.safetensors: 0% 0.00/349M [00:00 will be ignored 2025-09-07T08:31:14.4465286Z pass 2025-09-07T08:31:19.2571011Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T08:31:19.2572206Z import pynvml # type: ignore[import] 2025-09-07T08:31:22.2115690Z 2025-09-07T08:31:23.0607967Z loading model: 0it [00:00, ?it/s] 2025-09-07T08:31:23.0608474Z 2025-09-07T08:31:23.1746990Z model.safetensors: 0% 0.00/133M [00:00 will be ignored 2025-09-07T08:33:14.8395224Z pass 2025-09-07T08:33:21.8837136Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T08:33:21.8838413Z import pynvml # type: ignore[import] 2025-09-07T08:33:24.8330724Z 2025-09-07T08:33:25.7342457Z loading model: 0it [00:00, ?it/s] 2025-09-07T08:33:25.7342814Z 2025-09-07T08:33:25.8476309Z model.safetensors: 0% 0.00/286M [00:00 will be ignored 2025-09-07T08:34:40.1071161Z pass 2025-09-07T08:34:45.9490309Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T08:34:45.9491626Z import pynvml # type: ignore[import] 2025-09-07T08:34:48.9068009Z 2025-09-07T08:34:49.6939059Z loading model: 0it [00:00, ?it/s] 2025-09-07T08:34:49.6939389Z 2025-09-07T08:34:49.8087004Z model.safetensors: 0% 0.00/349M [00:00 will be ignored 2025-09-07T08:37:03.4212694Z pass 2025-09-07T08:37:08.1389467Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T08:37:08.1390599Z import pynvml # type: ignore[import] 2025-09-07T08:37:11.2129553Z 2025-09-07T08:37:11.4726447Z loading model: 0it [00:00, ?it/s] 2025-09-07T08:37:11.4726781Z 2025-09-07T08:37:11.6006365Z model.safetensors: 0% 0.00/43.2M [00:00 will be ignored 2025-09-07T08:38:12.2715215Z pass 2025-09-07T08:38:16.2758831Z accuracy pass_rate=100.00% 2025-09-07T08:38:16.2763656Z calls_captured gmean=1214.80x mean=1314.000x 2025-09-07T08:38:16.2767420Z unique_graphs gmean=2.85x mean=2.875x 2025-09-07T08:38:16.2770784Z graph_breaks gmean=6.87x mean=6.875x 2025-09-07T08:38:16.2774535Z unique_graph_breaks gmean=5.00x mean=5.000x 2025-09-07T08:38:16.2777546Z autograd_captures gmean=0.00x mean=0.000x 2025-09-07T08:38:16.2780661Z autograd_compiles gmean=0.00x mean=0.000x 2025-09-07T08:38:16.2783979Z cudagraph_skips gmean=0.00x mean=0.000x 2025-09-07T08:38:16.2785356Z compilation_latency mean=68.250 seconds 2025-09-07T08:38:17.2646242Z + [[ training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *cudagraphs-true* ]] 2025-09-07T08:38:17.2648461Z + python benchmarks/dynamo/timm_models.py --accuracy --no-translation-validation --training --amp --backend inductor --device cuda --total-partitions 7 --partition-id 1 --output /var/lib/jenkins/workspace/test/test-reports/inductor_with_cudagraphs_timm_models_amp_training_cuda_h100_accuracy.csv 2025-09-07T08:38:18.3016443Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T08:38:18.3018971Z import pynvml # type: ignore[import] 2025-09-07T08:38:22.6515735Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T08:38:22.6516985Z import pynvml # type: ignore[import] 2025-09-07T08:38:25.6392392Z 2025-09-07T08:38:27.1840705Z loading model: 0it [00:00, ?it/s] 2025-09-07T08:38:27.1841263Z loading model: 0it [00:01, ?it/s] 2025-09-07T08:38:27.1841741Z cuda train crossvit_9_240 2025-09-07T08:39:06.2933428Z W0907 08:39:06.292000 19865 site-packages/torch/_logging/_internal.py:1199] [6/0] Profiler function will be ignored 2025-09-07T08:39:44.8384466Z pass 2025-09-07T08:39:51.0112971Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T08:39:51.0115516Z import pynvml # type: ignore[import] 2025-09-07T08:39:53.9771587Z 2025-09-07T08:39:55.6989169Z loading model: 0it [00:00, ?it/s] 2025-09-07T08:39:55.6989673Z loading model: 0it [00:01, ?it/s] 2025-09-07T08:39:55.6990091Z cuda train cspdarknet53 2025-09-07T08:40:40.3205152Z W0907 08:40:40.319000 20126 site-packages/torch/_logging/_internal.py:1199] [6/0] Profiler function will be ignored 2025-09-07T08:41:09.8896142Z pass 2025-09-07T08:41:15.3445845Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T08:41:15.3447610Z import pynvml # type: ignore[import] 2025-09-07T08:41:18.3241618Z 2025-09-07T08:41:20.1241484Z loading model: 0it [00:00, ?it/s] 2025-09-07T08:41:20.1241865Z loading model: 0it [00:01, ?it/s] 2025-09-07T08:41:20.1242191Z cuda train deit_base_distilled_patch16_224 2025-09-07T08:41:40.0627026Z W0907 08:41:40.061000 20385 site-packages/torch/_logging/_internal.py:1199] [6/0] Profiler function will be ignored 2025-09-07T08:42:01.2366389Z pass 2025-09-07T08:42:06.2085098Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T08:42:06.2086376Z import pynvml # type: ignore[import] 2025-09-07T08:42:09.1935502Z 2025-09-07T08:42:10.8123488Z loading model: 0it [00:00, ?it/s] 2025-09-07T08:42:10.8124121Z loading model: 0it [00:01, ?it/s] 2025-09-07T08:42:10.8124409Z cuda train dla102 2025-09-07T08:43:12.9374175Z W0907 08:43:12.936000 20644 site-packages/torch/_logging/_internal.py:1199] [6/0] Profiler function will be ignored 2025-09-07T08:44:02.6629992Z pass 2025-09-07T08:44:09.8313594Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T08:44:09.8314981Z import pynvml # type: ignore[import] 2025-09-07T08:44:12.8021325Z 2025-09-07T08:44:14.7943442Z loading model: 0it [00:00, ?it/s] 2025-09-07T08:44:14.7944253Z loading model: 0it [00:01, ?it/s] 2025-09-07T08:44:14.7944578Z cuda train dm_nfnet_f0 2025-09-07T08:44:51.4806094Z W0907 08:44:51.479000 20903 site-packages/torch/_logging/_internal.py:1199] [6/0] Profiler function will be ignored 2025-09-07T08:45:25.8462820Z pass 2025-09-07T08:45:31.8784283Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T08:45:31.8786064Z import pynvml # type: ignore[import] 2025-09-07T08:45:34.8815964Z 2025-09-07T08:45:36.7286950Z loading model: 0it [00:00, ?it/s] 2025-09-07T08:45:36.7287277Z loading model: 0it [00:01, ?it/s] 2025-09-07T08:45:36.7287537Z cuda train dpn107 2025-09-07T08:46:42.5438092Z pass 2025-09-07T08:46:49.1676697Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T08:46:49.1678106Z import pynvml # type: ignore[import] 2025-09-07T08:46:52.1417240Z 2025-09-07T08:46:53.3428602Z loading model: 0it [00:00, ?it/s] 2025-09-07T08:46:53.3429126Z loading model: 0it [00:01, ?it/s] 2025-09-07T08:46:53.3429551Z cuda train eca_botnext26ts_256 2025-09-07T08:47:29.7402899Z W0907 08:47:29.738000 21421 site-packages/torch/_logging/_internal.py:1199] [6/0] Profiler function will be ignored 2025-09-07T08:47:44.9844938Z pass 2025-09-07T08:47:49.9407589Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T08:47:49.9409552Z import pynvml # type: ignore[import] 2025-09-07T08:47:52.9437893Z 2025-09-07T08:47:54.3741865Z loading model: 0it [00:00, ?it/s] 2025-09-07T08:47:54.3742236Z loading model: 0it [00:01, ?it/s] 2025-09-07T08:47:54.3742551Z cuda train eca_halonext26ts 2025-09-07T08:48:35.9658270Z skipping cudagraphs due to disabling cudagraphs due to incompatible op aten.index_put_.default Found from File "/var/lib/jenkins/workspace/benchmarks/dynamo/timm_models.py", line 442, in torch_dynamo_resume_in_forward_and_backward_pass_at_440 2025-09-07T08:48:35.9659418Z pred = mod(*cloned_inputs) 2025-09-07T08:48:35.9659977Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/byobnet.py", line 1433, in forward 2025-09-07T08:48:35.9660521Z x = self.forward_features(x) 2025-09-07T08:48:35.9661068Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/byobnet.py", line 1425, in forward_features 2025-09-07T08:48:35.9661625Z x = self.stages(x) 2025-09-07T08:48:35.9662135Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/byobnet.py", line 894, in forward 2025-09-07T08:48:35.9662638Z x = self.self_attn(x) 2025-09-07T08:48:35.9663107Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/layers/halo_attn.py", line 191, in forward 2025-09-07T08:48:35.9664250Z kv = kv.unfold(2, self.win_size, self.block_size).unfold(3, self.win_size, self.block_size).reshape( 2025-09-07T08:48:35.9664631Z 2025-09-07T08:48:35.9664636Z 2025-09-07T08:48:36.3567526Z W0907 08:48:36.355000 21682 site-packages/torch/_logging/_internal.py:1199] [6/0] Profiler function will be ignored 2025-09-07T08:48:50.3375976Z pass 2025-09-07T08:48:54.2901333Z accuracy pass_rate=100.00% 2025-09-07T08:48:54.2906020Z calls_captured gmean=1214.80x mean=1314.000x 2025-09-07T08:48:54.2909512Z unique_graphs gmean=2.85x mean=2.875x 2025-09-07T08:48:54.2913283Z graph_breaks gmean=6.87x mean=6.875x 2025-09-07T08:48:54.2917974Z unique_graph_breaks gmean=5.00x mean=5.000x 2025-09-07T08:48:54.2920593Z autograd_captures gmean=0.00x mean=0.000x 2025-09-07T08:48:54.2924447Z autograd_compiles gmean=0.00x mean=0.000x 2025-09-07T08:48:54.2929035Z cudagraph_skips gmean=0.00x mean=0.125x 2025-09-07T08:48:54.2930046Z compilation_latency mean=66.412 seconds 2025-09-07T08:48:55.3014313Z + [[ training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *dynamic-true* ]] 2025-09-07T08:48:55.3016757Z + python benchmarks/dynamo/timm_models.py --accuracy --no-translation-validation --training --amp --backend inductor --dynamic-shapes --dynamic-batch-only --device cuda --total-partitions 7 --partition-id 1 --output /var/lib/jenkins/workspace/test/test-reports/inductor_dynamic_timm_models_amp_training_cuda_h100_accuracy.csv 2025-09-07T08:48:56.2864609Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. 
If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T08:48:56.2865881Z import pynvml # type: ignore[import] 2025-09-07T08:49:00.5608532Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T08:49:00.5609773Z import pynvml # type: ignore[import] 2025-09-07T08:49:03.5507904Z 2025-09-07T08:49:07.3571688Z loading model: 0it [00:00, ?it/s] 2025-09-07T08:49:07.3572206Z loading model: 0it [00:03, ?it/s] 2025-09-07T08:49:07.3572589Z cuda train crossvit_9_240 2025-09-07T08:49:18.2557281Z W0907 08:49:18.254000 21994 site-packages/torch/_logging/_internal.py:1199] [6/0] Profiler function will be ignored 2025-09-07T08:49:26.8818101Z pass 2025-09-07T08:49:30.6481402Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T08:49:30.6482661Z import pynvml # type: ignore[import] 2025-09-07T08:49:33.6137960Z 2025-09-07T08:49:35.5149961Z loading model: 0it [00:00, ?it/s] 2025-09-07T08:49:35.5150335Z loading model: 0it [00:01, ?it/s] 2025-09-07T08:49:35.5150653Z cuda train cspdarknet53 2025-09-07T08:49:45.7700747Z W0907 08:49:45.769000 22331 site-packages/torch/_logging/_internal.py:1199] [6/0] Profiler function will be ignored 2025-09-07T08:49:52.5903468Z pass 2025-09-07T08:49:56.3728018Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. 
If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T08:49:56.3729360Z import pynvml # type: ignore[import] 2025-09-07T08:49:59.3437450Z 2025-09-07T08:50:01.5673621Z loading model: 0it [00:00, ?it/s] 2025-09-07T08:50:01.5674495Z loading model: 0it [00:02, ?it/s] 2025-09-07T08:50:01.5674832Z cuda train deit_base_distilled_patch16_224 2025-09-07T08:50:08.4914345Z W0907 08:50:08.490000 22601 site-packages/torch/_logging/_internal.py:1199] [6/0] Profiler function will be ignored 2025-09-07T08:50:13.1748117Z pass 2025-09-07T08:50:16.7437425Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T08:50:16.7438748Z import pynvml # type: ignore[import] 2025-09-07T08:50:19.6972918Z 2025-09-07T08:50:21.3716742Z loading model: 0it [00:00, ?it/s] 2025-09-07T08:50:21.3717442Z loading model: 0it [00:01, ?it/s] 2025-09-07T08:50:21.3717940Z cuda train dla102 2025-09-07T08:50:33.9769119Z W0907 08:50:33.976000 22871 site-packages/torch/_logging/_internal.py:1199] [6/0] Profiler function will be ignored 2025-09-07T08:50:45.1821382Z pass 2025-09-07T08:50:49.1605119Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T08:50:49.1606992Z import pynvml # type: ignore[import] 2025-09-07T08:50:52.1407899Z 2025-09-07T08:50:54.5957158Z loading model: 0it [00:00, ?it/s] 2025-09-07T08:50:54.5957558Z loading model: 0it [00:02, ?it/s] 2025-09-07T08:50:54.5957865Z cuda train dm_nfnet_f0 2025-09-07T08:51:03.5579611Z W0907 08:51:03.557000 23141 site-packages/torch/_logging/_internal.py:1199] [6/0] Profiler function will be ignored 2025-09-07T08:51:11.4957216Z pass 2025-09-07T08:51:15.3264461Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T08:51:15.3266448Z import pynvml # type: ignore[import] 2025-09-07T08:51:18.3189593Z 2025-09-07T08:51:20.7123564Z loading model: 0it [00:00, ?it/s] 2025-09-07T08:51:20.7124778Z loading model: 0it [00:02, ?it/s] 2025-09-07T08:51:20.7125143Z cuda train dpn107 2025-09-07T08:51:35.3932169Z pass 2025-09-07T08:51:39.2194887Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T08:51:39.2196120Z import pynvml # type: ignore[import] 2025-09-07T08:51:42.2133403Z 2025-09-07T08:51:43.5146965Z loading model: 0it [00:00, ?it/s] 2025-09-07T08:51:43.5147327Z loading model: 0it [00:01, ?it/s] 2025-09-07T08:51:43.5147642Z cuda train eca_botnext26ts_256 2025-09-07T08:51:51.5572477Z W0907 08:51:51.556000 23681 site-packages/torch/_logging/_internal.py:1199] [6/0] Profiler function will be ignored 2025-09-07T08:51:54.9528037Z pass 2025-09-07T08:51:58.6814471Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T08:51:58.6815697Z import pynvml # type: ignore[import] 2025-09-07T08:52:01.6726487Z 2025-09-07T08:52:02.9548952Z loading model: 0it [00:00, ?it/s] 2025-09-07T08:52:02.9549315Z loading model: 0it [00:01, ?it/s] 2025-09-07T08:52:02.9549635Z cuda train eca_halonext26ts 2025-09-07T08:52:10.3878746Z skipping cudagraphs due to disabling cudagraphs due to incompatible op aten.index_put_.default Found from File "/var/lib/jenkins/workspace/benchmarks/dynamo/timm_models.py", line 442, in torch_dynamo_resume_in_forward_and_backward_pass_at_440 2025-09-07T08:52:10.3879908Z pred = mod(*cloned_inputs) 2025-09-07T08:52:10.3881071Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/byobnet.py", line 1433, in forward 2025-09-07T08:52:10.3881651Z x = self.forward_features(x) 2025-09-07T08:52:10.3882209Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/byobnet.py", line 1425, in forward_features 2025-09-07T08:52:10.3882761Z x = self.stages(x) 2025-09-07T08:52:10.3883222Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/byobnet.py", line 894, in forward 2025-09-07T08:52:10.3883950Z x = self.self_attn(x) 2025-09-07T08:52:10.3884452Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/layers/halo_attn.py", line 191, in forward 2025-09-07T08:52:10.3885153Z kv = kv.unfold(2, self.win_size, self.block_size).unfold(3, self.win_size, self.block_size).reshape( 2025-09-07T08:52:10.3885526Z 2025-09-07T08:52:10.3885538Z 2025-09-07T08:52:11.3899331Z W0907 08:52:11.389000 23951 site-packages/torch/_logging/_internal.py:1199] [6/0] Profiler function will be ignored 2025-09-07T08:52:14.7032113Z pass 2025-09-07T08:52:17.4627811Z accuracy pass_rate=100.00% 2025-09-07T08:52:17.4633569Z calls_captured gmean=1214.80x mean=1314.000x 2025-09-07T08:52:17.4637870Z unique_graphs gmean=2.85x mean=2.875x 2025-09-07T08:52:17.4641469Z graph_breaks gmean=6.87x mean=6.875x 2025-09-07T08:52:17.4645067Z unique_graph_breaks gmean=5.00x mean=5.000x 2025-09-07T08:52:17.4648275Z autograd_captures gmean=0.00x mean=0.000x 2025-09-07T08:52:17.4651562Z autograd_compiles gmean=0.00x mean=0.000x 2025-09-07T08:52:17.4655333Z cudagraph_skips gmean=0.00x mean=0.125x 2025-09-07T08:52:17.4656529Z compilation_latency mean=13.650 seconds 2025-09-07T08:52:18.5541908Z + [[ training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *cppwrapper-true* ]] 2025-09-07T08:52:18.5543137Z + TORCHINDUCTOR_CPP_WRAPPER=1 2025-09-07T08:52:18.5545282Z + python benchmarks/dynamo/timm_models.py --accuracy --no-translation-validation --training --amp --backend inductor --disable-cudagraphs --device cuda --total-partitions 7 --partition-id 1 --output /var/lib/jenkins/workspace/test/test-reports/inductor_cpp_wrapper_timm_models_amp_training_cuda_h100_accuracy.csv 2025-09-07T08:52:19.5123100Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. 
If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T08:52:19.5124584Z import pynvml # type: ignore[import] 2025-09-07T08:52:23.9035772Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T08:52:23.9037838Z import pynvml # type: ignore[import] 2025-09-07T08:52:26.8961276Z 2025-09-07T08:52:27.7942663Z loading model: 0it [00:00, ?it/s] 2025-09-07T08:52:27.7943240Z loading model: 0it [00:00, ?it/s] 2025-09-07T08:52:27.7943696Z cuda train crossvit_9_240 2025-09-07T08:53:58.8293153Z W0907 08:53:58.828000 24272 site-packages/torch/_logging/_internal.py:1199] [6/0] Profiler function will be ignored 2025-09-07T08:54:58.4231987Z pass 2025-09-07T08:55:05.2800204Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T08:55:05.2802097Z import pynvml # type: ignore[import] 2025-09-07T08:55:08.2570727Z 2025-09-07T08:55:09.9529822Z loading model: 0it [00:00, ?it/s] 2025-09-07T08:55:09.9530950Z loading model: 0it [00:01, ?it/s] 2025-09-07T08:55:09.9531332Z cuda train cspdarknet53 2025-09-07T08:56:36.0624825Z W0907 08:56:36.061000 25756 site-packages/torch/_logging/_internal.py:1199] [6/0] Profiler function will be ignored 2025-09-07T08:57:18.8340828Z pass 2025-09-07T08:57:25.6757059Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. 
If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T08:57:25.6759073Z import pynvml # type: ignore[import] 2025-09-07T08:57:28.8404165Z 2025-09-07T08:57:31.2971325Z loading model: 0it [00:00, ?it/s] 2025-09-07T08:57:31.2971662Z loading model: 0it [00:02, ?it/s] 2025-09-07T08:57:31.2971958Z cuda train deit_base_distilled_patch16_224 2025-09-07T08:58:10.1782666Z W0907 08:58:10.177000 26537 site-packages/torch/_logging/_internal.py:1199] [6/0] Profiler function will be ignored 2025-09-07T08:58:41.5396030Z pass 2025-09-07T08:58:47.0229154Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T08:58:47.0231123Z import pynvml # type: ignore[import] 2025-09-07T08:58:50.0224835Z 2025-09-07T08:58:51.5821226Z loading model: 0it [00:00, ?it/s] 2025-09-07T08:58:51.5821787Z loading model: 0it [00:01, ?it/s] 2025-09-07T08:58:51.5822278Z cuda train dla102 2025-09-07T09:00:49.2233081Z W0907 09:00:49.222000 27158 site-packages/torch/_logging/_internal.py:1199] [6/0] Profiler function will be ignored 2025-09-07T09:02:04.7273177Z pass 2025-09-07T09:02:13.6203560Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T09:02:13.6205157Z import pynvml # type: ignore[import] 2025-09-07T09:02:16.5709483Z 2025-09-07T09:02:18.8968520Z loading model: 0it [00:00, ?it/s] 2025-09-07T09:02:18.8968873Z loading model: 0it [00:02, ?it/s] 2025-09-07T09:02:18.8969159Z cuda train dm_nfnet_f0 2025-09-07T09:03:29.4983305Z W0907 09:03:29.497000 27747 site-packages/torch/_logging/_internal.py:1199] [6/0] Profiler function will be ignored 2025-09-07T09:04:21.4348249Z pass 2025-09-07T09:04:27.9792412Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T09:04:27.9794833Z import pynvml # type: ignore[import] 2025-09-07T09:04:31.0469004Z 2025-09-07T09:04:33.6846572Z loading model: 0it [00:00, ?it/s] 2025-09-07T09:04:33.6846905Z loading model: 0it [00:02, ?it/s] 2025-09-07T09:04:33.6847188Z cuda train dpn107 2025-09-07T09:06:51.5487210Z pass 2025-09-07T09:06:58.2602987Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T09:06:58.2604606Z import pynvml # type: ignore[import] 2025-09-07T09:07:01.2507535Z 2025-09-07T09:07:02.8130150Z loading model: 0it [00:00, ?it/s] 2025-09-07T09:07:02.8130477Z loading model: 0it [00:01, ?it/s] 2025-09-07T09:07:02.8131442Z cuda train eca_botnext26ts_256 2025-09-07T09:07:58.5071500Z W0907 09:07:58.506000 29565 site-packages/torch/_logging/_internal.py:1199] [6/0] Profiler function will be ignored 2025-09-07T09:08:18.3496494Z pass 2025-09-07T09:08:23.7757990Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T09:08:23.7759240Z import pynvml # type: ignore[import] 2025-09-07T09:08:26.7356321Z 2025-09-07T09:08:28.1437276Z loading model: 0it [00:00, ?it/s] 2025-09-07T09:08:28.1437680Z loading model: 0it [00:01, ?it/s] 2025-09-07T09:08:28.1437993Z cuda train eca_halonext26ts 2025-09-07T09:09:33.3651376Z W0907 09:09:33.364000 30284 site-packages/torch/_logging/_internal.py:1199] [6/0] Profiler function will be ignored 2025-09-07T09:09:54.2248049Z pass 2025-09-07T09:09:58.7789233Z accuracy pass_rate=100.00% 2025-09-07T09:09:58.7796428Z calls_captured gmean=1214.80x mean=1314.000x 2025-09-07T09:09:58.7800292Z unique_graphs gmean=2.85x mean=2.875x 2025-09-07T09:09:58.7804130Z graph_breaks gmean=6.87x mean=6.875x 2025-09-07T09:09:58.7807726Z unique_graph_breaks gmean=5.00x mean=5.000x 2025-09-07T09:09:58.7811069Z autograd_captures gmean=0.00x mean=0.000x 2025-09-07T09:09:58.7814750Z autograd_compiles gmean=0.00x mean=0.000x 2025-09-07T09:09:58.7817976Z cudagraph_skips gmean=0.00x mean=0.000x 2025-09-07T09:09:58.7819437Z compilation_latency mean=119.167 seconds 2025-09-07T09:09:59.7977856Z + [[ 
training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *freezing_cudagraphs-true* ]] 2025-09-07T09:09:59.7979894Z + [[ training == \i\n\f\e\r\e\n\c\e ]] 2025-09-07T09:09:59.7981137Z + [[ training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *freeze_autotune_cudagraphs-true* ]] 2025-09-07T09:09:59.7982344Z + [[ training == \i\n\f\e\r\e\n\c\e ]] 2025-09-07T09:09:59.7983498Z + [[ training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *aotinductor-true* ]] 2025-09-07T09:09:59.7985205Z + [[ training == \i\n\f\e\r\e\n\c\e ]] 2025-09-07T09:09:59.7986274Z + [[ training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *maxautotune-true* ]] 2025-09-07T09:09:59.7987359Z + TORCHINDUCTOR_MAX_AUTOTUNE=1 2025-09-07T09:09:59.7988475Z + python benchmarks/dynamo/timm_models.py --accuracy --no-translation-validation --training --amp --backend inductor --device cuda --total-partitions 7 --partition-id 1 --output /var/lib/jenkins/workspace/test/test-reports/inductor_max_autotune_timm_models_amp_training_cuda_h100_accuracy.csv 2025-09-07T09:10:00.8322435Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T09:10:00.8324750Z import pynvml # type: ignore[import] 2025-09-07T09:10:05.2231781Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T09:10:05.2233557Z import pynvml # type: ignore[import] 2025-09-07T09:10:08.1886905Z 2025-09-07T09:10:09.4310432Z loading model: 0it [00:00, ?it/s] 2025-09-07T09:10:09.4310899Z loading model: 0it [00:01, ?it/s] 2025-09-07T09:10:09.4311331Z cuda train crossvit_9_240 2025-09-07T09:10:36.0377743Z Autotune Choices Stats: 2025-09-07T09:10:36.0378943Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_60", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.008608000352978706, "best_triton_pos": 0} 2025-09-07T09:10:36.0385508Z AUTOTUNE addmm(3208x384, 3208x128, 128x384) 2025-09-07T09:10:36.0385869Z strides: [0, 1], [128, 1], [1, 128] 2025-09-07T09:10:36.0386196Z dtypes: torch.float16, torch.float16, torch.float16 2025-09-07T09:10:36.0386981Z triton_mm_60 0.0086 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:10:36.0387676Z bias_addmm 0.0090 ms 96.1% 2025-09-07T09:10:36.0388205Z triton_mm_59 0.0091 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:10:36.0389050Z triton_mm_64 0.0091 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:10:36.0389893Z triton_mm_61 0.0092 
ms 94.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:10:36.0390725Z triton_mm_63 0.0092 ms 94.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:10:36.0391987Z triton_mm_69 0.0092 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:10:36.0392824Z triton_mm_70 0.0093 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:10:36.0393657Z triton_mm_65 0.0093 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:10:36.0394929Z triton_mm_66 0.0094 ms 91.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:10:36.0395657Z SingleProcess AUTOTUNE benchmarking takes 0.2662 seconds and 0.0003 seconds precompiling for 21 choices 2025-09-07T09:10:36.6181177Z Autotune Choices Stats: 2025-09-07T09:10:36.6182487Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.009727999567985535, "best_triton_pos": 1, "best_triton_time": 0.00979200005531311, "best_triton_kernel": "triton_mm_135", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8"} 2025-09-07T09:10:36.6187923Z AUTOTUNE addmm(1576x768, 1576x256, 256x768) 2025-09-07T09:10:36.6188183Z strides: [0, 1], [256, 1], [1, 256] 2025-09-07T09:10:36.6188477Z dtypes: torch.float16, 
torch.float16, torch.float16 2025-09-07T09:10:36.6188771Z bias_addmm 0.0097 ms 100.0% 2025-09-07T09:10:36.6189337Z triton_mm_135 0.0098 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:10:36.6190536Z triton_mm_139 0.0098 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:10:36.6191491Z triton_mm_146 0.0100 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:10:36.6192399Z triton_mm_145 0.0102 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:10:36.6193302Z triton_mm_138 0.0102 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:10:36.6194369Z triton_mm_142 0.0104 ms 93.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:10:36.6195301Z triton_mm_137 0.0105 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:10:36.6196222Z triton_mm_141 0.0107 ms 90.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:10:36.6197237Z triton_mm_144 0.0108 ms 90.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:10:36.6198040Z 
SingleProcess AUTOTUNE benchmarking takes 0.2570 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T09:10:37.2428681Z Autotune Choices Stats: 2025-09-07T09:10:37.2429866Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_79", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.008736000396311283, "best_triton_pos": 0} 2025-09-07T09:10:37.2435498Z AUTOTUNE mm(3208x384, 384x128) 2025-09-07T09:10:37.2435894Z strides: [384, 1], [1, 384] 2025-09-07T09:10:37.2436171Z dtypes: torch.float16, torch.float16 2025-09-07T09:10:37.2436871Z triton_mm_79 0.0087 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:10:37.2438156Z triton_mm_78 0.0093 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:10:37.2438824Z mm 0.0094 ms 92.9% 2025-09-07T09:10:37.2439441Z triton_mm_83 0.0096 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:10:37.2440518Z triton_mm_82 0.0098 ms 89.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:10:37.2441554Z triton_mm_74 0.0099 ms 88.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:10:37.2442586Z triton_mm_81 0.0102 ms 85.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 
2025-09-07T09:10:37.2443628Z triton_mm_85 0.0102 ms 85.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:10:37.2445084Z triton_mm_72 0.0103 ms 84.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:10:37.2446412Z triton_mm_88 0.0104 ms 83.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:10:37.2447354Z SingleProcess AUTOTUNE benchmarking takes 0.2339 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:10:38.0168984Z Autotune Choices Stats: 2025-09-07T09:10:38.0170029Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_307", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.009088000282645226, "best_triton_pos": 0} 2025-09-07T09:10:38.0177643Z AUTOTUNE mm(1576x768, 768x256) 2025-09-07T09:10:38.0177975Z strides: [768, 1], [1, 768] 2025-09-07T09:10:38.0178236Z dtypes: torch.float16, torch.float16 2025-09-07T09:10:38.0178928Z triton_mm_307 0.0091 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:10:38.0179551Z mm 0.0097 ms 93.7% 2025-09-07T09:10:38.0180083Z triton_mm_311 0.0104 ms 87.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:10:38.0180989Z triton_mm_306 0.0112 ms 81.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, 
num_warps=8 2025-09-07T09:10:38.0181873Z triton_mm_303 0.0113 ms 80.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:10:38.0182759Z triton_mm_310 0.0114 ms 80.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:10:38.0184687Z triton_mm_317 0.0115 ms 78.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:10:38.0185599Z triton_mm_300 0.0124 ms 73.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:10:38.0186494Z triton_mm_302 0.0124 ms 73.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:10:38.0187379Z triton_mm_309 0.0126 ms 72.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:10:38.0188160Z SingleProcess AUTOTUNE benchmarking takes 0.2346 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:10:39.3456701Z Autotune Choices Stats: 2025-09-07T09:10:39.3457887Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_4", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=12, KERNEL_W=12, PADDING_H=0, PADDING_W=0, STRIDE_H=12, STRIDE_W=12, UNROLL=False, num_stages=2, num_warps=4", "best_time": 0.045184001326560974, "best_triton_pos": 0} 2025-09-07T09:10:39.3463171Z AUTOTUNE convolution(8x3x240x240, 128x3x12x12) 2025-09-07T09:10:39.3463508Z strides: [172800, 57600, 240, 1], [432, 144, 12, 1] 
2025-09-07T09:10:39.3464252Z dtypes: torch.float16, torch.float16 2025-09-07T09:10:39.3465045Z triton_convolution2d_4 0.0452 ms 100.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=12, KERNEL_W=12, PADDING_H=0, PADDING_W=0, STRIDE_H=12, STRIDE_W=12, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:10:39.3466715Z triton_convolution2d_0 0.0580 ms 77.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=12, KERNEL_W=12, PADDING_H=0, PADDING_W=0, STRIDE_H=12, STRIDE_W=12, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:10:39.3467551Z convolution 0.0636 ms 71.1% 2025-09-07T09:10:39.3468332Z triton_convolution2d_6 0.0639 ms 70.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=12, KERNEL_W=12, PADDING_H=0, PADDING_W=0, STRIDE_H=12, STRIDE_W=12, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:10:39.3469529Z triton_convolution2d_1 0.0684 ms 66.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=12, KERNEL_W=12, PADDING_H=0, PADDING_W=0, STRIDE_H=12, STRIDE_W=12, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:10:39.3470606Z triton_convolution2d_3 0.0686 ms 65.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=12, KERNEL_W=12, PADDING_H=0, PADDING_W=0, STRIDE_H=12, STRIDE_W=12, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:10:39.3471681Z triton_convolution2d_5 0.0777 ms 58.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=12, KERNEL_W=12, PADDING_H=0, PADDING_W=0, STRIDE_H=12, STRIDE_W=12, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:10:39.3472756Z triton_convolution2d_2 0.1965 ms 23.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=12, KERNEL_W=12, PADDING_H=0, PADDING_W=0, STRIDE_H=12, STRIDE_W=12, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T09:10:39.3473843Z SingleProcess AUTOTUNE benchmarking takes 0.1331 seconds and 0.0002 seconds precompiling 
for 8 choices 2025-09-07T09:10:39.5410042Z Autotune Choices Stats: 2025-09-07T09:10:39.5411397Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.06534399837255478, "best_triton_pos": 1, "best_triton_time": 0.08188799768686295, "best_triton_kernel": "triton_convolution2d_11", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T09:10:39.5417398Z AUTOTUNE convolution(8x3x224x224, 256x3x16x16) 2025-09-07T09:10:39.5417729Z strides: [150528, 50176, 224, 1], [768, 256, 16, 1] 2025-09-07T09:10:39.5418031Z dtypes: torch.float16, torch.float16 2025-09-07T09:10:39.5418320Z convolution 0.0653 ms 100.0% 2025-09-07T09:10:39.5426778Z triton_convolution2d_11 0.0819 ms 79.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:10:39.5427898Z triton_convolution2d_13 0.1301 ms 50.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:10:39.5429014Z triton_convolution2d_10 0.1395 ms 46.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:10:39.5430105Z triton_convolution2d_8 0.1444 ms 45.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:10:39.5431150Z triton_convolution2d_12 0.2036 ms 32.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, 
PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:10:39.5432176Z triton_convolution2d_7 0.2278 ms 28.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:10:39.5433214Z triton_convolution2d_9 0.4198 ms 15.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T09:10:39.5434428Z SingleProcess AUTOTUNE benchmarking takes 0.1949 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T09:10:39.7662055Z Autotune Choices Stats: 2025-09-07T09:10:39.7663034Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_40", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.007007999811321497, "best_triton_pos": 0} 2025-09-07T09:10:39.7670474Z AUTOTUNE mm(3208x128, 128x128) 2025-09-07T09:10:39.7670970Z strides: [128, 1], [1, 128] 2025-09-07T09:10:39.7671215Z dtypes: torch.float16, torch.float16 2025-09-07T09:10:39.7671960Z triton_mm_40 0.0070 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:10:39.7672832Z triton_mm_43 0.0075 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:10:39.7673687Z triton_mm_44 0.0075 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:10:39.7674647Z triton_mm_45 0.0076 ms 92.0% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:10:39.7675468Z triton_mm_47 0.0077 ms 91.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:10:39.7676280Z triton_mm_34 0.0077 ms 90.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:10:39.7677219Z triton_mm_36 0.0077 ms 90.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:10:39.7678343Z triton_mm_46 0.0077 ms 90.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:10:39.7679206Z triton_mm_42 0.0077 ms 90.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:10:39.7680013Z triton_mm_41 0.0078 ms 90.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:10:39.7680718Z SingleProcess AUTOTUNE benchmarking takes 0.2238 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:10:39.9895639Z Autotune Choices Stats: 2025-09-07T09:10:39.9896640Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_117", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007712000049650669, "best_triton_pos": 0} 2025-09-07T09:10:39.9904213Z AUTOTUNE mm(1576x256, 256x256) 2025-09-07T09:10:39.9904496Z strides: 
[256, 1], [1, 256] 2025-09-07T09:10:39.9904750Z dtypes: torch.float16, torch.float16 2025-09-07T09:10:39.9905369Z triton_mm_117 0.0077 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:10:39.9906284Z triton_mm_116 0.0079 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:10:39.9906850Z mm 0.0084 ms 92.3% 2025-09-07T09:10:39.9907679Z triton_mm_120 0.0084 ms 92.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:10:39.9908606Z triton_mm_121 0.0084 ms 92.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:10:39.9909512Z triton_mm_123 0.0085 ms 90.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:10:39.9910342Z triton_mm_119 0.0087 ms 88.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:10:39.9911171Z triton_mm_111 0.0088 ms 88.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:10:39.9912000Z triton_mm_112 0.0088 ms 88.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:10:39.9912837Z triton_mm_110 0.0090 ms 85.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=2, num_warps=4 2025-09-07T09:10:39.9913573Z SingleProcess AUTOTUNE benchmarking takes 0.2219 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:10:40.2235810Z Autotune Choices Stats: 2025-09-07T09:10:40.2236789Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_320", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.006591999903321266, "best_triton_pos": 0} 2025-09-07T09:10:40.2244407Z AUTOTUNE addmm(8x256, 8x128, 128x256) 2025-09-07T09:10:40.2244649Z strides: [0, 1], [128, 1], [1, 128] 2025-09-07T09:10:40.2244931Z dtypes: torch.float16, torch.float16, torch.float16 2025-09-07T09:10:40.2245865Z triton_mm_320 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:10:40.2246739Z triton_mm_321 0.0067 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:10:40.2247580Z triton_mm_332 0.0068 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:10:40.2248406Z triton_mm_325 0.0068 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:10:40.2249338Z triton_mm_331 0.0070 ms 93.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:10:40.2250304Z triton_mm_326 0.0071 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, 
num_warps=4 2025-09-07T09:10:40.2251268Z triton_mm_329 0.0073 ms 90.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:10:40.2252223Z triton_mm_334 0.0073 ms 90.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:10:40.2253187Z triton_mm_327 0.0075 ms 88.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:10:40.2254488Z triton_mm_328 0.0075 ms 88.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:10:40.2255357Z SingleProcess AUTOTUNE benchmarking takes 0.2305 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T09:10:40.4560786Z Autotune Choices Stats: 2025-09-07T09:10:40.4561777Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_339", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.007135999854654074, "best_triton_pos": 0} 2025-09-07T09:10:40.4569673Z AUTOTUNE addmm(8x128, 8x256, 256x128) 2025-09-07T09:10:40.4569956Z strides: [0, 1], [256, 1], [1, 256] 2025-09-07T09:10:40.4570267Z dtypes: torch.float16, torch.float16, torch.float16 2025-09-07T09:10:40.4570955Z triton_mm_339 0.0071 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:10:40.4571934Z triton_mm_338 0.0075 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, 
num_warps=2 2025-09-07T09:10:40.4572888Z triton_mm_343 0.0075 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:10:40.4574157Z triton_mm_336 0.0076 ms 94.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T09:10:40.4575147Z triton_mm_351 0.0076 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:10:40.4576142Z triton_mm_342 0.0077 ms 92.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:10:40.4577334Z triton_mm_337 0.0078 ms 91.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:10:40.4577941Z bias_addmm 0.0079 ms 90.7% 2025-09-07T09:10:40.4578556Z triton_mm_347 0.0080 ms 88.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:10:40.4579537Z triton_mm_346 0.0081 ms 87.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:10:40.4580369Z SingleProcess AUTOTUNE benchmarking takes 0.2319 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T09:10:40.6844367Z Autotune Choices Stats: 2025-09-07T09:10:40.6845354Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_360", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", 
"best_time": 0.0071680000983178616, "best_triton_pos": 0} 2025-09-07T09:10:40.6853938Z AUTOTUNE addmm(8x256, 8x256, 256x256) 2025-09-07T09:10:40.6854247Z strides: [0, 1], [256, 1], [1, 256] 2025-09-07T09:10:40.6854559Z dtypes: torch.float16, torch.float16, torch.float16 2025-09-07T09:10:40.6855264Z triton_mm_360 0.0072 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:10:40.6856275Z triton_mm_356 0.0072 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:10:40.6857505Z triton_mm_355 0.0074 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:10:40.6858471Z triton_mm_359 0.0076 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:10:40.6859418Z triton_mm_354 0.0076 ms 94.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:10:40.6860027Z bias_addmm 0.0076 ms 93.7% 2025-09-07T09:10:40.6860620Z triton_mm_368 0.0078 ms 92.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:10:40.6861511Z triton_mm_353 0.0078 ms 91.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T09:10:40.6862420Z triton_mm_363 0.0079 ms 91.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 
2025-09-07T09:10:40.6863312Z triton_mm_366 0.0079 ms 90.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:10:40.6864234Z SingleProcess AUTOTUNE benchmarking takes 0.2279 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T09:10:40.8853157Z Autotune Choices Stats: 2025-09-07T09:10:40.8854303Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_bmm_420", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.007968000136315823, "best_triton_pos": 0} 2025-09-07T09:10:40.8864100Z AUTOTUNE bmm(32x1x64, 32x64x197) 2025-09-07T09:10:40.8864395Z strides: [64, 64, 1], [12608, 197, 1] 2025-09-07T09:10:40.8864921Z dtypes: torch.float16, torch.float16 2025-09-07T09:10:40.8865523Z triton_bmm_420 0.0080 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:10:40.8866116Z bmm 0.0080 ms 99.2% 2025-09-07T09:10:40.8866648Z triton_bmm_421 0.0080 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:10:40.8867535Z triton_bmm_409 0.0081 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:10:40.8868429Z triton_bmm_410 0.0082 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:10:40.8869318Z triton_bmm_413 0.0083 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=2, num_warps=4 2025-09-07T09:10:40.8870191Z triton_bmm_415 0.0083 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:10:40.8871020Z triton_bmm_417 0.0084 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:10:40.8871846Z triton_bmm_414 0.0086 ms 92.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:10:40.8872672Z triton_bmm_418 0.0086 ms 92.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:10:40.8873576Z SingleProcess AUTOTUNE benchmarking takes 0.1996 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T09:10:41.0492574Z Autotune Choices Stats: 2025-09-07T09:10:41.0493618Z {"num_choices": 14, "num_triton_choices": 13, "best_kernel": "triton_bmm_428", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.007872000336647034, "best_triton_pos": 0} 2025-09-07T09:10:41.0502751Z AUTOTUNE bmm(32x1x197, 32x197x64) 2025-09-07T09:10:41.0503368Z strides: [197, 0, 1], [12608, 64, 1] 2025-09-07T09:10:41.0503664Z dtypes: torch.float16, torch.float16 2025-09-07T09:10:41.0504468Z triton_bmm_428 0.0079 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:10:41.0505414Z triton_bmm_435 0.0079 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, 
num_warps=4 2025-09-07T09:10:41.0506349Z triton_bmm_432 0.0079 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:10:41.0507297Z triton_bmm_425 0.0081 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T09:10:41.0508228Z triton_bmm_436 0.0083 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:10:41.0509132Z triton_bmm_431 0.0087 ms 90.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:10:41.0510042Z triton_bmm_426 0.0096 ms 81.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:10:41.0511198Z triton_bmm_434 0.0098 ms 80.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:10:41.0512089Z triton_bmm_427 0.0099 ms 79.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:10:41.0512995Z triton_bmm_433 0.0099 ms 79.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:10:41.0513938Z SingleProcess AUTOTUNE benchmarking takes 0.1634 seconds and 0.0002 seconds precompiling for 14 choices 2025-09-07T09:10:41.2809560Z Autotune Choices Stats: 2025-09-07T09:10:41.2810624Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_474", 
"best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.006591999903321266, "best_triton_pos": 0} 2025-09-07T09:10:41.2820299Z AUTOTUNE addmm(8x128, 8x128, 128x128) 2025-09-07T09:10:41.2820626Z strides: [0, 1], [128, 1], [1, 128] 2025-09-07T09:10:41.2820946Z dtypes: torch.float16, torch.float16, torch.float16 2025-09-07T09:10:41.2821665Z triton_mm_474 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:10:41.2822691Z triton_mm_473 0.0067 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:10:41.2824315Z triton_mm_478 0.0067 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:10:41.2825377Z triton_mm_485 0.0069 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:10:41.2826396Z triton_mm_479 0.0070 ms 94.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:10:41.2827417Z triton_mm_487 0.0070 ms 94.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:10:41.2828428Z triton_mm_482 0.0070 ms 93.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:10:41.2829437Z triton_mm_484 0.0072 ms 91.2% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:10:41.2831706Z triton_mm_481 0.0073 ms 90.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:10:41.2832724Z triton_mm_480 0.0074 ms 89.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:10:41.2833620Z SingleProcess AUTOTUNE benchmarking takes 0.2304 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T09:10:41.4527588Z Autotune Choices Stats: 2025-09-07T09:10:41.4528605Z {"num_choices": 14, "num_triton_choices": 13, "best_kernel": "triton_bmm_532", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.007679999805986881, "best_triton_pos": 0} 2025-09-07T09:10:41.4537834Z AUTOTUNE bmm(32x1x32, 32x32x401) 2025-09-07T09:10:41.4538194Z strides: [32, 32, 1], [12864, 401, 1] 2025-09-07T09:10:41.4538458Z dtypes: torch.float16, torch.float16 2025-09-07T09:10:41.4539092Z triton_bmm_532 0.0077 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:10:41.4540085Z triton_bmm_536 0.0077 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:10:41.4540985Z triton_bmm_531 0.0077 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:10:41.4541884Z triton_bmm_528 0.0078 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, 
BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:10:41.4542779Z triton_bmm_535 0.0080 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:10:41.4543670Z triton_bmm_530 0.0081 ms 95.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:10:41.4544852Z triton_bmm_533 0.0083 ms 93.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:10:41.4545754Z triton_bmm_537 0.0083 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T09:10:41.4546861Z triton_bmm_538 0.0083 ms 92.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:10:41.4547757Z triton_bmm_529 0.0085 ms 90.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:10:41.4548531Z SingleProcess AUTOTUNE benchmarking takes 0.1702 seconds and 0.0002 seconds precompiling for 14 choices 2025-09-07T09:10:41.5910260Z Autotune Choices Stats: 2025-09-07T09:10:41.5911462Z {"num_choices": 12, "num_triton_choices": 11, "best_kernel": "bmm", "best_time": 0.00848000030964613, "best_triton_pos": 1, "best_triton_time": 0.009344000369310379, "best_triton_kernel": "triton_bmm_542", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2"} 2025-09-07T09:10:41.5920727Z 
AUTOTUNE bmm(32x1x401, 32x401x32) 2025-09-07T09:10:41.5921018Z strides: [401, 12864, 1], [12864, 32, 1] 2025-09-07T09:10:41.5921307Z dtypes: torch.float16, torch.float16 2025-09-07T09:10:41.5921548Z bmm 0.0085 ms 100.0% 2025-09-07T09:10:41.5922078Z triton_bmm_542 0.0093 ms 90.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:10:41.5922922Z triton_bmm_548 0.0096 ms 88.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=2 2025-09-07T09:10:41.5924037Z triton_bmm_540 0.0104 ms 81.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T09:10:41.5924937Z triton_bmm_549 0.0109 ms 77.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:10:41.5925801Z triton_bmm_545 0.0113 ms 75.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=2 2025-09-07T09:10:41.5926973Z triton_bmm_541 0.0135 ms 62.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:10:41.5927834Z triton_bmm_546 0.0138 ms 61.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=2 2025-09-07T09:10:41.5928696Z triton_bmm_547 0.0138 ms 61.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=2 2025-09-07T09:10:41.5929530Z triton_bmm_544 0.0139 ms 60.9% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T09:10:41.5930286Z SingleProcess AUTOTUNE benchmarking takes 0.1378 seconds and 0.0002 seconds precompiling for 12 choices 2025-09-07T09:10:41.8487945Z Autotune Choices Stats: 2025-09-07T09:10:41.8488957Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_610", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.007391999941319227, "best_triton_pos": 0} 2025-09-07T09:10:41.8499156Z AUTOTUNE addmm(3208x128, 3208x128, 128x128) 2025-09-07T09:10:41.8499469Z strides: [0, 1], [128, 1], [1, 128] 2025-09-07T09:10:41.8499770Z dtypes: torch.float16, torch.float16, torch.float16 2025-09-07T09:10:41.8500441Z triton_mm_610 0.0074 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:10:41.8501646Z triton_mm_604 0.0079 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:10:41.8502558Z triton_mm_611 0.0079 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:10:41.8503444Z triton_mm_614 0.0080 ms 92.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:10:41.8504625Z triton_mm_615 0.0081 ms 91.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:10:41.8505518Z triton_mm_606 0.0081 ms 91.3% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:10:41.8506415Z triton_mm_612 0.0082 ms 89.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:10:41.8507296Z triton_mm_605 0.0083 ms 89.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:10:41.8508180Z triton_mm_613 0.0084 ms 88.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:10:41.8509078Z triton_mm_616 0.0084 ms 88.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:10:41.8509887Z SingleProcess AUTOTUNE benchmarking takes 0.2554 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T09:10:42.1033976Z Autotune Choices Stats: 2025-09-07T09:10:42.1034995Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_687", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.008031999692320824, "best_triton_pos": 0} 2025-09-07T09:10:42.1045566Z AUTOTUNE addmm(1576x256, 1576x256, 256x256) 2025-09-07T09:10:42.1045992Z strides: [0, 1], [256, 1], [1, 256] 2025-09-07T09:10:42.1046286Z dtypes: torch.float16, torch.float16, torch.float16 2025-09-07T09:10:42.1046896Z triton_mm_687 0.0080 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:10:42.1047772Z triton_mm_686 0.0083 ms 96.5% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:10:42.1048312Z bias_addmm 0.0085 ms 94.0% 2025-09-07T09:10:42.1048848Z triton_mm_690 0.0085 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:10:42.1049782Z triton_mm_691 0.0088 ms 91.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:10:42.1050756Z triton_mm_682 0.0090 ms 89.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:10:42.1051711Z triton_mm_689 0.0091 ms 88.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:10:42.1052673Z triton_mm_693 0.0092 ms 87.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:10:42.1054090Z triton_mm_680 0.0092 ms 87.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:10:42.1055093Z triton_mm_681 0.0093 ms 86.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:10:42.1055940Z SingleProcess AUTOTUNE benchmarking takes 0.2531 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T09:10:42.3582408Z Autotune Choices Stats: 2025-09-07T09:10:42.3583412Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_1726", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, 
BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.006943999789655209, "best_triton_pos": 0} 2025-09-07T09:10:42.3594271Z AUTOTUNE addmm(8x1000, 8x128, 128x1000) 2025-09-07T09:10:42.3594564Z strides: [0, 1], [128, 1], [1, 128] 2025-09-07T09:10:42.3594843Z dtypes: torch.float16, torch.float16, torch.float16 2025-09-07T09:10:42.3595452Z triton_mm_1726 0.0069 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:10:42.3596319Z triton_mm_1727 0.0070 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:10:42.3597272Z triton_mm_1731 0.0070 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:10:42.3598110Z triton_mm_1738 0.0072 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:10:42.3598945Z triton_mm_1735 0.0072 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:10:42.3600147Z triton_mm_1740 0.0073 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:10:42.3600989Z triton_mm_1732 0.0074 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:10:42.3601822Z triton_mm_1734 0.0074 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:10:42.3602653Z triton_mm_1737 0.0074 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:10:42.3603493Z triton_mm_1733 0.0075 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:10:42.3604412Z SingleProcess AUTOTUNE benchmarking takes 0.2305 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T09:10:42.5899057Z Autotune Choices Stats: 2025-09-07T09:10:42.5900159Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_1745", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.007552000228315592, "best_triton_pos": 0} 2025-09-07T09:10:42.5911665Z AUTOTUNE addmm(8x1000, 8x256, 256x1000) 2025-09-07T09:10:42.5912072Z strides: [0, 1], [256, 1], [1, 256] 2025-09-07T09:10:42.5912356Z dtypes: torch.float16, torch.float16, torch.float16 2025-09-07T09:10:42.5913326Z triton_mm_1745 0.0076 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:10:42.5914485Z triton_mm_1749 0.0076 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:10:42.5915345Z triton_mm_1742 0.0078 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T09:10:42.5916183Z triton_mm_1743 0.0078 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:10:42.5917126Z triton_mm_1753 0.0079 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:10:42.5917986Z triton_mm_1748 0.0079 ms 95.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:10:42.5918819Z triton_mm_1744 0.0080 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:10:42.5919645Z triton_mm_1757 0.0080 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:10:42.5920462Z triton_mm_1755 0.0081 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:10:42.5921262Z triton_mm_1752 0.0083 ms 91.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:10:42.5922166Z SingleProcess AUTOTUNE benchmarking takes 0.2312 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T09:11:07.0187533Z Autotune Choices Stats: 2025-09-07T09:11:07.0188642Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_2819", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.008352000266313553, "best_triton_pos": 0} 2025-09-07T09:11:07.0200338Z AUTOTUNE mm(3208x128, 128x384) 2025-09-07T09:11:07.0200622Z strides: [128, 1], [384, 1] 2025-09-07T09:11:07.0200850Z dtypes: 
torch.float16, torch.float16 2025-09-07T09:11:07.0203685Z triton_mm_2819 0.0084 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:07.0204881Z triton_mm_2825 0.0084 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:07.0205765Z triton_mm_2816 0.0084 ms 98.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:07.0206608Z triton_mm_2817 0.0084 ms 98.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:07.0207440Z triton_mm_2821 0.0085 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:07.0208334Z triton_mm_2824 0.0085 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:07.0209688Z triton_mm_2826 0.0085 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:07.0210698Z triton_mm_2815 0.0085 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:11:07.0211680Z triton_mm_2820 0.0085 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:11:07.0212651Z triton_mm_2822 0.0085 ms 97.8% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:11:07.0213499Z SingleProcess AUTOTUNE benchmarking takes 0.1840 seconds and 0.0004 seconds precompiling for 20 choices 2025-09-07T09:11:07.5518732Z Autotune Choices Stats: 2025-09-07T09:11:07.5520046Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.008511999621987343, "best_triton_pos": 1, "best_triton_time": 0.008991999551653862, "best_triton_kernel": "triton_mm_2363", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T09:11:07.5531803Z AUTOTUNE mm(1576x256, 256x768) 2025-09-07T09:11:07.5532058Z strides: [256, 1], [768, 1] 2025-09-07T09:11:07.5532346Z dtypes: torch.float16, torch.float16 2025-09-07T09:11:07.5532611Z mm 0.0085 ms 100.0% 2025-09-07T09:11:07.5533192Z triton_mm_2363 0.0090 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:07.5534514Z triton_mm_2369 0.0091 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:07.5535927Z triton_mm_2370 0.0092 ms 93.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:07.5536897Z triton_mm_2359 0.0092 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:11:07.5537858Z triton_mm_2362 0.0092 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:11:07.5538821Z triton_mm_2366 0.0094 ms 90.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:11:07.5539780Z triton_mm_2361 0.0095 ms 89.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:07.5540687Z triton_mm_2368 0.0097 ms 87.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:07.5541587Z triton_mm_2365 0.0099 ms 86.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:07.5542364Z SingleProcess AUTOTUNE benchmarking takes 0.1851 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:11:08.1546818Z Autotune Choices Stats: 2025-09-07T09:11:08.1547874Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_1777", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8", "best_time": 0.006144000217318535, "best_triton_pos": 0} 2025-09-07T09:11:08.1559678Z AUTOTUNE mm(1000x8, 8x256) 2025-09-07T09:11:08.1560541Z strides: [1, 1000], [256, 1] 2025-09-07T09:11:08.1560802Z dtypes: torch.float16, torch.float16 2025-09-07T09:11:08.1561412Z triton_mm_1777 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:08.1562340Z triton_mm_1776 0.0062 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=2, num_warps=4 2025-09-07T09:11:08.1563267Z triton_mm_1779 0.0062 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:08.1564595Z triton_mm_1778 0.0062 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:08.1565503Z triton_mm_1782 0.0062 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:08.1566407Z triton_mm_1781 0.0063 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:11:08.1567309Z triton_mm_1775 0.0063 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T09:11:08.1568208Z triton_mm_1780 0.0063 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:11:08.1569170Z triton_mm_1783 0.0065 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:08.1570379Z triton_mm_1786 0.0065 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:08.1571239Z SingleProcess AUTOTUNE benchmarking takes 0.1507 seconds and 0.0002 seconds precompiling for 17 choices 2025-09-07T09:11:08.5624693Z Autotune Choices Stats: 2025-09-07T09:11:08.5625992Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", 
"best_time": 0.009920000098645687, "best_triton_pos": 1, "best_triton_time": 0.010879999957978725, "best_triton_kernel": "triton_mm_2375", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T09:11:08.5638957Z AUTOTUNE mm(256x1576, 1576x768) 2025-09-07T09:11:08.5639220Z strides: [1, 256], [768, 1] 2025-09-07T09:11:08.5639453Z dtypes: torch.float16, torch.float16 2025-09-07T09:11:08.5639680Z mm 0.0099 ms 100.0% 2025-09-07T09:11:08.5640241Z triton_mm_2375 0.0109 ms 91.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:08.5641133Z triton_mm_2379 0.0114 ms 87.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:08.5641965Z triton_mm_2383 0.0134 ms 74.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:11:08.5642785Z triton_mm_2374 0.0154 ms 64.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:08.5644205Z triton_mm_2373 0.0159 ms 62.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:08.5645090Z triton_mm_2378 0.0162 ms 61.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:11:08.5645906Z triton_mm_2382 0.0164 ms 60.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:08.5646722Z triton_mm_2389 0.0171 ms 58.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:08.5647554Z triton_mm_2381 0.0184 ms 53.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:11:08.5648270Z SingleProcess AUTOTUNE benchmarking takes 0.2129 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:11:08.9643126Z Autotune Choices Stats: 2025-09-07T09:11:08.9644911Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.009952000342309475, "best_triton_pos": 1, "best_triton_time": 0.010784000158309937, "best_triton_kernel": "triton_mm_2413", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T09:11:08.9658275Z AUTOTUNE mm(768x1576, 1576x256) 2025-09-07T09:11:08.9658533Z strides: [1, 768], [256, 1] 2025-09-07T09:11:08.9658782Z dtypes: torch.float16, torch.float16 2025-09-07T09:11:08.9659065Z mm 0.0100 ms 100.0% 2025-09-07T09:11:08.9659665Z triton_mm_2413 0.0108 ms 92.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:08.9660643Z triton_mm_2417 0.0112 ms 88.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:08.9662052Z triton_mm_2421 0.0136 ms 73.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:11:08.9662963Z triton_mm_2412 0.0151 ms 65.8% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:08.9664050Z triton_mm_2411 0.0156 ms 63.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:08.9664956Z triton_mm_2416 0.0160 ms 62.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:11:08.9665877Z triton_mm_2420 0.0164 ms 60.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:08.9666793Z triton_mm_2427 0.0171 ms 58.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:08.9667701Z triton_mm_2419 0.0182 ms 54.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:11:08.9668489Z SingleProcess AUTOTUNE benchmarking takes 0.2122 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:11:09.4034615Z Autotune Choices Stats: 2025-09-07T09:11:09.4035996Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_1809", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.006144000217318535, "best_triton_pos": 0} 2025-09-07T09:11:09.4048486Z AUTOTUNE mm(1000x8, 8x128) 2025-09-07T09:11:09.4048724Z strides: [1, 1000], [128, 1] 2025-09-07T09:11:09.4048972Z dtypes: torch.float16, torch.float16 2025-09-07T09:11:09.4049644Z triton_mm_1809 0.0061 ms 100.0% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:11:09.4050657Z triton_mm_1810 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:09.4051638Z triton_mm_1811 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:09.4052624Z triton_mm_1812 0.0062 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:09.4053617Z triton_mm_1808 0.0062 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T09:11:09.4054764Z triton_mm_1813 0.0063 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:11:09.4055732Z triton_mm_1814 0.0063 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:11:09.4056697Z triton_mm_1815 0.0063 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:09.4057680Z triton_mm_1816 0.0064 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:09.4058918Z triton_mm_1819 0.0065 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=3, num_warps=4 2025-09-07T09:11:09.4059785Z SingleProcess AUTOTUNE benchmarking takes 0.1508 seconds and 0.0002 seconds precompiling for 17 choices 2025-09-07T09:11:09.7651013Z Autotune Choices Stats: 2025-09-07T09:11:09.7652025Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_2104", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8", "best_time": 0.006111999973654747, "best_triton_pos": 0} 2025-09-07T09:11:09.7667022Z AUTOTUNE mm(256x8, 8x256) 2025-09-07T09:11:09.7667269Z strides: [1, 256], [256, 1] 2025-09-07T09:11:09.7667513Z dtypes: torch.float16, torch.float16 2025-09-07T09:11:09.7668172Z triton_mm_2104 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:09.7669110Z triton_mm_2106 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:11:09.7670021Z triton_mm_2102 0.0061 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:11:09.7670912Z triton_mm_2103 0.0061 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:09.7672236Z triton_mm_2105 0.0061 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:09.7673184Z triton_mm_2107 0.0061 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 
2025-09-07T09:11:09.7674286Z triton_mm_2108 0.0061 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:09.7675184Z triton_mm_2101 0.0062 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T09:11:09.7676083Z triton_mm_2110 0.0063 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:11:09.7677113Z triton_mm_2109 0.0064 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:09.7677908Z SingleProcess AUTOTUNE benchmarking takes 0.1497 seconds and 0.0002 seconds precompiling for 17 choices 2025-09-07T09:11:10.2234735Z Autotune Choices Stats: 2025-09-07T09:11:10.2236008Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.009568000212311745, "best_triton_pos": 1, "best_triton_time": 0.01065600011497736, "best_triton_kernel": "triton_mm_2204", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T09:11:10.2251372Z AUTOTUNE mm(256x1576, 1576x256) 2025-09-07T09:11:10.2251659Z strides: [1, 256], [256, 1] 2025-09-07T09:11:10.2251907Z dtypes: torch.float16, torch.float16 2025-09-07T09:11:10.2252159Z mm 0.0096 ms 100.0% 2025-09-07T09:11:10.2252778Z triton_mm_2204 0.0107 ms 89.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:10.2254346Z triton_mm_2200 0.0108 ms 88.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, 
BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:10.2255347Z triton_mm_2208 0.0129 ms 74.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:11:10.2256322Z triton_mm_2199 0.0147 ms 65.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:10.2257301Z triton_mm_2198 0.0153 ms 62.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:10.2258265Z triton_mm_2203 0.0156 ms 61.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:11:10.2259254Z triton_mm_2207 0.0161 ms 59.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:10.2260208Z triton_mm_2214 0.0166 ms 57.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:10.2261116Z triton_mm_2206 0.0179 ms 53.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:11:10.2261904Z SingleProcess AUTOTUNE benchmarking takes 0.2127 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:11:10.6279208Z Autotune Choices Stats: 2025-09-07T09:11:10.6281132Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.013439999893307686, "best_triton_pos": 1, "best_triton_time": 0.014240000396966934, "best_triton_kernel": 
"triton_mm_2831", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T09:11:10.6294823Z AUTOTUNE mm(128x3208, 3208x384) 2025-09-07T09:11:10.6295098Z strides: [1, 128], [384, 1] 2025-09-07T09:11:10.6295358Z dtypes: torch.float16, torch.float16 2025-09-07T09:11:10.6295620Z mm 0.0134 ms 100.0% 2025-09-07T09:11:10.6296215Z triton_mm_2831 0.0142 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:10.6297216Z triton_mm_2835 0.0152 ms 88.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:10.6298233Z triton_mm_2839 0.0193 ms 69.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:11:10.6299211Z triton_mm_2830 0.0241 ms 55.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:10.6300172Z triton_mm_2829 0.0248 ms 54.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:10.6301086Z triton_mm_2834 0.0257 ms 52.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:11:10.6301986Z triton_mm_2838 0.0265 ms 50.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:10.6303217Z triton_mm_2845 0.0266 ms 50.5% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:10.6304479Z triton_mm_2837 0.0298 ms 45.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:11:10.6305284Z SingleProcess AUTOTUNE benchmarking takes 0.2478 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:11:11.0658942Z Autotune Choices Stats: 2025-09-07T09:11:11.0660728Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.01321600005030632, "best_triton_pos": 1, "best_triton_time": 0.014431999996304512, "best_triton_kernel": "triton_mm_2869", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T09:11:11.0674372Z AUTOTUNE mm(384x3208, 3208x128) 2025-09-07T09:11:11.0674738Z strides: [1, 384], [128, 1] 2025-09-07T09:11:11.0675068Z dtypes: torch.float16, torch.float16 2025-09-07T09:11:11.0675424Z mm 0.0132 ms 100.0% 2025-09-07T09:11:11.0676247Z triton_mm_2869 0.0144 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:11.0677716Z triton_mm_2873 0.0150 ms 87.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:11.0679099Z triton_mm_2877 0.0190 ms 69.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:11:11.0681159Z triton_mm_2868 0.0234 ms 56.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=5, num_warps=8 2025-09-07T09:11:11.0682555Z triton_mm_2867 0.0245 ms 53.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:11.0684113Z triton_mm_2872 0.0254 ms 52.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:11:11.0685482Z triton_mm_2883 0.0266 ms 49.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:11.0686856Z triton_mm_2876 0.0273 ms 48.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:11.0688241Z triton_mm_2882 0.0302 ms 43.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:11.0689444Z SingleProcess AUTOTUNE benchmarking takes 0.2466 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:11:11.3330492Z Autotune Choices Stats: 2025-09-07T09:11:11.3331536Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_1841", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2", "best_time": 0.006047999951988459, "best_triton_pos": 0} 2025-09-07T09:11:11.3348196Z AUTOTUNE mm(256x8, 8x128) 2025-09-07T09:11:11.3348424Z strides: [1, 256], [128, 1] 2025-09-07T09:11:11.3348669Z dtypes: torch.float16, torch.float16 2025-09-07T09:11:11.3349338Z triton_mm_1841 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 
2025-09-07T09:11:11.3350872Z triton_mm_1842 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:11:11.3351953Z triton_mm_1845 0.0061 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:11.3352970Z triton_mm_1846 0.0061 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:11:11.3354412Z triton_mm_1847 0.0061 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:11:11.3355412Z triton_mm_1844 0.0061 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:11.3356388Z triton_mm_1848 0.0061 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:11.3357485Z triton_mm_1843 0.0062 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:11.3358475Z triton_mm_1850 0.0063 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:11:11.3359460Z triton_mm_1851 0.0063 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:11:11.3360487Z SingleProcess AUTOTUNE benchmarking takes 0.1513 seconds and 0.0002 seconds 
precompiling for 17 choices 2025-09-07T09:11:11.6772851Z Autotune Choices Stats: 2025-09-07T09:11:11.6774267Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_2069", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.005919999908655882, "best_triton_pos": 0} 2025-09-07T09:11:11.6790521Z AUTOTUNE mm(128x8, 8x256) 2025-09-07T09:11:11.6790897Z strides: [1, 128], [256, 1] 2025-09-07T09:11:11.6791175Z dtypes: torch.float16, torch.float16 2025-09-07T09:11:11.6791863Z triton_mm_2069 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:11:11.6792888Z triton_mm_2072 0.0060 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:11.6794119Z triton_mm_2070 0.0061 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:11.6795086Z triton_mm_2073 0.0061 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:11:11.6796057Z triton_mm_2071 0.0061 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:11.6797121Z triton_mm_2074 0.0061 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:11:11.6798086Z triton_mm_2068 0.0062 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, 
EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T09:11:11.6799509Z triton_mm_2075 0.0062 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:11.6800539Z triton_mm_2076 0.0063 ms 93.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:11.6801449Z triton_mm_2077 0.0063 ms 93.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:11:11.6802259Z SingleProcess AUTOTUNE benchmarking takes 0.1517 seconds and 0.0002 seconds precompiling for 17 choices 2025-09-07T09:11:11.9805635Z Autotune Choices Stats: 2025-09-07T09:11:11.9806743Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_1876", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8", "best_time": 0.006016000173985958, "best_triton_pos": 0} 2025-09-07T09:11:11.9823922Z AUTOTUNE mm(128x8, 8x128) 2025-09-07T09:11:11.9824234Z strides: [1, 128], [128, 1] 2025-09-07T09:11:11.9824501Z dtypes: torch.float16, torch.float16 2025-09-07T09:11:11.9825203Z triton_mm_1876 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:11.9826214Z triton_mm_1874 0.0060 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T09:11:11.9827189Z triton_mm_1881 0.0060 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:11.9828619Z triton_mm_1875 0.0061 ms 98.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:11:11.9829619Z triton_mm_1879 0.0061 ms 98.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:11:11.9830671Z triton_mm_1877 0.0061 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:11.9831561Z triton_mm_1878 0.0061 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:11.9832451Z triton_mm_1880 0.0061 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:11:11.9833361Z triton_mm_1883 0.0062 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:11:11.9834425Z triton_mm_1885 0.0063 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:11.9835220Z SingleProcess AUTOTUNE benchmarking takes 0.1524 seconds and 0.0002 seconds precompiling for 17 choices 2025-09-07T09:11:12.4679828Z Autotune Choices Stats: 2025-09-07T09:11:12.4681184Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.01283199992030859, "best_triton_pos": 1, "best_triton_time": 0.014208000153303146, "best_triton_kernel": "triton_mm_1969", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T09:11:12.4698114Z AUTOTUNE mm(128x3208, 3208x128) 2025-09-07T09:11:12.4698392Z strides: [1, 128], [128, 1] 2025-09-07T09:11:12.4698636Z dtypes: torch.float16, torch.float16 2025-09-07T09:11:12.4698910Z mm 0.0128 ms 100.0% 2025-09-07T09:11:12.4699519Z triton_mm_1969 0.0142 ms 90.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:12.4700540Z triton_mm_1965 0.0149 ms 86.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:12.4701526Z triton_mm_1973 0.0183 ms 70.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:11:12.4702437Z triton_mm_1964 0.0232 ms 55.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:12.4703338Z triton_mm_1963 0.0243 ms 52.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:12.4704412Z triton_mm_1979 0.0244 ms 52.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:12.4705316Z triton_mm_1968 0.0247 ms 52.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:11:12.4706224Z triton_mm_1972 0.0260 ms 49.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:12.4707390Z triton_mm_1978 0.0288 ms 44.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:12.4708214Z SingleProcess AUTOTUNE benchmarking takes 0.2470 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:11:13.4236057Z Autotune Choices Stats: 2025-09-07T09:11:13.4237517Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "mm", "best_time": 0.008383999578654766, "best_triton_pos": 1, "best_triton_time": 0.008927999995648861, "best_triton_kernel": "triton_mm_1762", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2"} 2025-09-07T09:11:13.4253904Z AUTOTUNE mm(8x1000, 1000x256) 2025-09-07T09:11:13.4254155Z strides: [1000, 1], [256, 1] 2025-09-07T09:11:13.4254393Z dtypes: torch.float16, torch.float16 2025-09-07T09:11:13.4254652Z mm 0.0084 ms 100.0% 2025-09-07T09:11:13.4255238Z triton_mm_1762 0.0089 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:11:13.4256168Z triton_mm_1766 0.0092 ms 91.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:13.4257082Z triton_mm_1770 0.0098 ms 85.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:11:13.4257991Z triton_mm_1761 0.0107 ms 78.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:11:13.4258888Z triton_mm_1760 0.0107 
ms 78.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:13.4260208Z triton_mm_1765 0.0108 ms 77.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:13.4261128Z triton_mm_1774 0.0115 ms 73.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:13.4262004Z triton_mm_1772 0.0119 ms 70.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:11:13.4262843Z triton_mm_1769 0.0122 ms 68.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:13.4263608Z SingleProcess AUTOTUNE benchmarking takes 0.1771 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T09:11:13.6001461Z Autotune Choices Stats: 2025-09-07T09:11:13.6002680Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "mm", "best_time": 0.008383999578654766, "best_triton_pos": 1, "best_triton_time": 0.008895999751985073, "best_triton_kernel": "triton_mm_1795", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2"} 2025-09-07T09:11:13.6019171Z AUTOTUNE mm(8x1000, 1000x128) 2025-09-07T09:11:13.6019416Z strides: [1000, 1], [128, 1] 2025-09-07T09:11:13.6019647Z dtypes: torch.float16, torch.float16 2025-09-07T09:11:13.6019895Z mm 0.0084 ms 100.0% 2025-09-07T09:11:13.6020437Z triton_mm_1795 0.0089 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:11:13.6021853Z triton_mm_1799 0.0092 ms 91.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:13.6022879Z triton_mm_1803 0.0096 ms 87.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:11:13.6024028Z triton_mm_1793 0.0107 ms 78.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:13.6024999Z triton_mm_1794 0.0108 ms 77.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:11:13.6025997Z triton_mm_1798 0.0111 ms 75.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:13.6026965Z triton_mm_1807 0.0111 ms 75.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:13.6027929Z triton_mm_1805 0.0120 ms 69.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:11:13.6028891Z triton_mm_1802 0.0121 ms 69.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:13.6029739Z SingleProcess AUTOTUNE benchmarking takes 0.1755 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T09:11:13.7612536Z Autotune Choices Stats: 2025-09-07T09:11:13.7613571Z {"num_choices": 18, "num_triton_choices": 17, 
"best_kernel": "triton_mm_1828", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.006688000168651342, "best_triton_pos": 0} 2025-09-07T09:11:13.7630638Z AUTOTUNE mm(8x256, 256x128) 2025-09-07T09:11:13.7630885Z strides: [256, 1], [128, 1] 2025-09-07T09:11:13.7631119Z dtypes: torch.float16, torch.float16 2025-09-07T09:11:13.7631725Z triton_mm_1828 0.0067 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:11:13.7632632Z triton_mm_1832 0.0068 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:13.7633475Z triton_mm_1827 0.0070 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:11:13.7634464Z triton_mm_1826 0.0070 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:13.7635297Z triton_mm_1831 0.0070 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:13.7635831Z mm 0.0071 ms 93.7% 2025-09-07T09:11:13.7636315Z triton_mm_1838 0.0072 ms 92.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:11:13.7637220Z triton_mm_1840 0.0074 ms 90.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:13.7638058Z 
triton_mm_1825 0.0074 ms 90.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T09:11:13.7639110Z triton_mm_1835 0.0074 ms 90.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:13.7639862Z SingleProcess AUTOTUNE benchmarking takes 0.1600 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T09:11:13.9205865Z Autotune Choices Stats: 2025-09-07T09:11:13.9206842Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_1859", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.0063680000603199005, "best_triton_pos": 0} 2025-09-07T09:11:13.9222982Z AUTOTUNE mm(8x128, 128x128) 2025-09-07T09:11:13.9223204Z strides: [128, 1], [128, 1] 2025-09-07T09:11:13.9223431Z dtypes: torch.float16, torch.float16 2025-09-07T09:11:13.9224167Z triton_mm_1859 0.0064 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:13.9225094Z triton_mm_1860 0.0064 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:11:13.9225984Z triton_mm_1864 0.0064 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:13.9226866Z triton_mm_1870 0.0065 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:13.9227769Z triton_mm_1871 0.0066 ms 97.1% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:11:13.9228685Z triton_mm_1865 0.0068 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:13.9229837Z triton_mm_1868 0.0068 ms 93.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:13.9230727Z triton_mm_1866 0.0068 ms 93.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:13.9231620Z triton_mm_1867 0.0069 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:11:13.9232469Z triton_mm_1873 0.0069 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:13.9233209Z SingleProcess AUTOTUNE benchmarking takes 0.1588 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T09:11:14.0529357Z Autotune Choices Stats: 2025-09-07T09:11:14.0530324Z {"num_choices": 15, "num_triton_choices": 14, "best_kernel": "triton_bmm_1895", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.006240000016987324, "best_triton_pos": 0} 2025-09-07T09:11:14.0548926Z AUTOTUNE bmm(32x401x1, 32x1x32) 2025-09-07T09:11:14.0549169Z strides: [401, 1, 401], [32, 32, 1] 2025-09-07T09:11:14.0549422Z dtypes: torch.float16, torch.float16 2025-09-07T09:11:14.0550023Z triton_bmm_1895 0.0062 ms 100.0% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:11:14.0551026Z triton_bmm_1892 0.0063 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:14.0552374Z triton_bmm_1890 0.0063 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T09:11:14.0553364Z triton_bmm_1891 0.0063 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:11:14.0554725Z triton_bmm_1893 0.0063 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:14.0555724Z triton_bmm_1894 0.0063 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:14.0556708Z triton_bmm_1897 0.0063 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:14.0557813Z triton_bmm_1898 0.0063 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:11:14.0558807Z triton_bmm_1902 0.0063 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T09:11:14.0559814Z triton_bmm_1896 0.0064 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:11:14.0560681Z SingleProcess AUTOTUNE benchmarking takes 0.1321 seconds and 0.0002 seconds precompiling for 15 choices 2025-09-07T09:11:14.1778830Z Autotune Choices Stats: 2025-09-07T09:11:14.1779737Z {"num_choices": 14, "num_triton_choices": 13, "best_kernel": "triton_bmm_1906", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.006719999946653843, "best_triton_pos": 0} 2025-09-07T09:11:14.1798720Z AUTOTUNE bmm(32x1x32, 32x32x401) 2025-09-07T09:11:14.1799016Z strides: [32, 32, 1], [12864, 1, 32] 2025-09-07T09:11:14.1799271Z dtypes: torch.float16, torch.float16 2025-09-07T09:11:14.1799845Z triton_bmm_1906 0.0067 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:14.1800782Z triton_bmm_1905 0.0068 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T09:11:14.1801761Z triton_bmm_1907 0.0068 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:11:14.1802733Z triton_bmm_1910 0.0068 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:14.1804114Z triton_bmm_1909 0.0068 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:11:14.1805079Z triton_bmm_1914 0.0068 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:11:14.1806046Z triton_bmm_1915 0.0068 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T09:11:14.1807024Z triton_bmm_1912 0.0068 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:11:14.1808343Z triton_bmm_1916 0.0068 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:14.1809337Z triton_bmm_1904 0.0069 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T09:11:14.1810186Z SingleProcess AUTOTUNE benchmarking takes 0.1245 seconds and 0.0002 seconds precompiling for 14 choices 2025-09-07T09:11:14.3150594Z Autotune Choices Stats: 2025-09-07T09:11:14.3151763Z {"num_choices": 15, "num_triton_choices": 14, "best_kernel": "triton_bmm_1917", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2", "best_time": 0.006752000190317631, "best_triton_pos": 0} 2025-09-07T09:11:14.3171044Z AUTOTUNE bmm(32x32x1, 32x1x401) 2025-09-07T09:11:14.3171368Z strides: [32, 1, 128], [401, 0, 1] 2025-09-07T09:11:14.3171685Z dtypes: torch.float16, torch.float16 2025-09-07T09:11:14.3172386Z triton_bmm_1917 0.0068 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T09:11:14.3173476Z triton_bmm_1921 0.0068 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:11:14.3174877Z triton_bmm_1924 0.0068 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:14.3176035Z triton_bmm_1918 0.0068 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:11:14.3177098Z triton_bmm_1920 0.0068 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:14.3178433Z triton_bmm_1923 0.0068 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:14.3179508Z triton_bmm_1927 0.0068 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:14.3180565Z triton_bmm_1926 0.0068 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:11:14.3181616Z triton_bmm_1919 0.0069 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:14.3182585Z triton_bmm_1922 0.0069 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:11:14.3183428Z SingleProcess AUTOTUNE benchmarking takes 0.1368 seconds and 0.0002 seconds precompiling for 15 choices 2025-09-07T09:11:14.4414412Z Autotune Choices Stats: 2025-09-07T09:11:14.4416039Z {"num_choices": 12, "num_triton_choices": 11, 
"best_kernel": "bmm", "best_time": 0.013728000223636627, "best_triton_pos": 1, "best_triton_time": 0.01398400031030178, "best_triton_kernel": "triton_bmm_1940", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=2"} 2025-09-07T09:11:14.4435157Z AUTOTUNE bmm(32x1x401, 32x401x32) 2025-09-07T09:11:14.4435465Z strides: [401, 0, 1], [12864, 1, 401] 2025-09-07T09:11:14.4435754Z dtypes: torch.float16, torch.float16 2025-09-07T09:11:14.4436053Z bmm 0.0137 ms 100.0% 2025-09-07T09:11:14.4436910Z triton_bmm_1940 0.0140 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=2 2025-09-07T09:11:14.4438062Z triton_bmm_1934 0.0142 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:11:14.4439114Z triton_bmm_1932 0.0142 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T09:11:14.4440166Z triton_bmm_1941 0.0192 ms 71.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:11:14.4441295Z triton_bmm_1937 0.0195 ms 70.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=2 2025-09-07T09:11:14.4442448Z triton_bmm_1933 0.0199 ms 69.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:11:14.4443594Z triton_bmm_1938 0.0199 ms 69.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, 
EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=2 2025-09-07T09:11:14.4445055Z triton_bmm_1939 0.0199 ms 69.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=2 2025-09-07T09:11:14.4446212Z triton_bmm_1936 0.0201 ms 68.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T09:11:14.4447226Z SingleProcess AUTOTUNE benchmarking takes 0.1259 seconds and 0.0002 seconds precompiling for 12 choices 2025-09-07T09:11:14.6207450Z Autotune Choices Stats: 2025-09-07T09:11:14.6208477Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_1949", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.007424000184983015, "best_triton_pos": 0} 2025-09-07T09:11:14.6227674Z AUTOTUNE mm(3208x128, 128x128) 2025-09-07T09:11:14.6227963Z strides: [128, 1], [128, 1] 2025-09-07T09:11:14.6228234Z dtypes: torch.float16, torch.float16 2025-09-07T09:11:14.6228911Z triton_mm_1949 0.0074 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:11:14.6229960Z triton_mm_1950 0.0075 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:14.6231021Z triton_mm_1953 0.0075 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:14.6232075Z triton_mm_1954 0.0076 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:11:14.6233038Z triton_mm_1951 0.0077 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:14.6234371Z triton_mm_1956 0.0077 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:11:14.6235566Z triton_mm_1952 0.0078 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:11:14.6236553Z triton_mm_1944 0.0078 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:14.6237250Z mm 0.0079 ms 94.3% 2025-09-07T09:11:14.6237810Z triton_mm_1945 0.0079 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:14.6238658Z SingleProcess AUTOTUNE benchmarking takes 0.1782 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:11:14.7807168Z Autotune Choices Stats: 2025-09-07T09:11:14.7808261Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_2054", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.006399999838322401, "best_triton_pos": 0} 2025-09-07T09:11:14.7828023Z AUTOTUNE mm(8x128, 128x256) 2025-09-07T09:11:14.7828434Z strides: [128, 1], [256, 1] 2025-09-07T09:11:14.7828710Z dtypes: torch.float16, torch.float16 2025-09-07T09:11:14.7829440Z triton_mm_2054 0.0064 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:11:14.7830516Z triton_mm_2053 0.0065 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:14.7831596Z triton_mm_2065 0.0065 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:11:14.7832554Z triton_mm_2058 0.0065 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:14.7834486Z triton_mm_2062 0.0067 ms 95.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:14.7835470Z triton_mm_2059 0.0068 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:14.7836433Z triton_mm_2064 0.0068 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:14.7837479Z triton_mm_2061 0.0068 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:11:14.7838471Z triton_mm_2067 0.0068 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:14.7839441Z triton_mm_2060 0.0070 ms 91.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:14.7840286Z SingleProcess AUTOTUNE 
benchmarking takes 0.1582 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T09:11:14.9455591Z Autotune Choices Stats: 2025-09-07T09:11:14.9456772Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_2088", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.006783999968320131, "best_triton_pos": 0} 2025-09-07T09:11:14.9477483Z AUTOTUNE mm(8x256, 256x256) 2025-09-07T09:11:14.9477789Z strides: [256, 1], [256, 1] 2025-09-07T09:11:14.9478035Z dtypes: torch.float16, torch.float16 2025-09-07T09:11:14.9479107Z triton_mm_2088 0.0068 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:11:14.9480111Z triton_mm_2092 0.0069 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:14.9481116Z triton_mm_2087 0.0070 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:11:14.9482159Z triton_mm_2091 0.0070 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:14.9483216Z triton_mm_2086 0.0071 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:14.9484243Z mm 0.0072 ms 93.8% 2025-09-07T09:11:14.9484859Z triton_mm_2100 0.0073 ms 92.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:14.9485903Z 
triton_mm_2095 0.0074 ms 91.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:14.9486947Z triton_mm_2085 0.0075 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T09:11:14.9487978Z triton_mm_2098 0.0075 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:11:14.9488888Z SingleProcess AUTOTUNE benchmarking takes 0.1644 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T09:11:15.1051955Z Autotune Choices Stats: 2025-09-07T09:11:15.1053279Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_bmm_2145", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.00687999976798892, "best_triton_pos": 0} 2025-09-07T09:11:15.1073428Z AUTOTUNE bmm(32x1x64, 32x64x197) 2025-09-07T09:11:15.1073851Z strides: [64, 64, 1], [12608, 1, 64] 2025-09-07T09:11:15.1074152Z dtypes: torch.float16, torch.float16 2025-09-07T09:11:15.1074847Z triton_bmm_2145 0.0069 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:15.1075940Z triton_bmm_2135 0.0069 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:11:15.1077089Z triton_bmm_2146 0.0069 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:11:15.1078133Z triton_bmm_2134 0.0070 
ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:15.1079185Z triton_bmm_2148 0.0070 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:15.1080233Z triton_bmm_2139 0.0071 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:15.1081334Z triton_bmm_2140 0.0071 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:15.1082741Z triton_bmm_2142 0.0071 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:11:15.1084128Z triton_bmm_2133 0.0072 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T09:11:15.1085306Z triton_bmm_2136 0.0072 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:11:15.1086317Z SingleProcess AUTOTUNE benchmarking takes 0.1590 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T09:11:15.2510667Z Autotune Choices Stats: 2025-09-07T09:11:15.2511782Z {"num_choices": 16, "num_triton_choices": 15, "best_kernel": "triton_bmm_2149", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2", "best_time": 0.006624000146985054, "best_triton_pos": 0} 2025-09-07T09:11:15.2531351Z AUTOTUNE bmm(32x64x1, 
32x1x197) 2025-09-07T09:11:15.2531781Z strides: [64, 1, 256], [197, 0, 1] 2025-09-07T09:11:15.2540006Z dtypes: torch.float16, torch.float16 2025-09-07T09:11:15.2540732Z triton_bmm_2149 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T09:11:15.2541817Z triton_bmm_2150 0.0068 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:11:15.2542886Z triton_bmm_2152 0.0068 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:15.2544175Z triton_bmm_2153 0.0068 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:15.2545494Z triton_bmm_2151 0.0069 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:15.2546560Z triton_bmm_2154 0.0069 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:11:15.2547626Z triton_bmm_2156 0.0069 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:15.2548667Z triton_bmm_2160 0.0069 ms 95.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:15.2549723Z triton_bmm_2161 0.0070 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:11:15.2550774Z triton_bmm_2155 0.0070 ms 94.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:11:15.2551689Z SingleProcess AUTOTUNE benchmarking takes 0.1454 seconds and 0.0002 seconds precompiling for 16 choices 2025-09-07T09:11:15.4322929Z Autotune Choices Stats: 2025-09-07T09:11:15.4324313Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_2223", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007552000228315592, "best_triton_pos": 0} 2025-09-07T09:11:15.4344371Z AUTOTUNE mm(1576x256, 256x256) 2025-09-07T09:11:15.4344992Z strides: [256, 1], [256, 1] 2025-09-07T09:11:15.4345283Z dtypes: torch.float16, torch.float16 2025-09-07T09:11:15.4345985Z triton_mm_2223 0.0076 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:15.4347047Z triton_mm_2222 0.0079 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:11:15.4348091Z triton_mm_2226 0.0081 ms 92.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:15.4348752Z mm 0.0083 ms 91.5% 2025-09-07T09:11:15.4349366Z triton_mm_2227 0.0083 ms 91.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:11:15.4350436Z triton_mm_2225 0.0085 ms 88.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:11:15.4351639Z triton_mm_2229 0.0085 ms 88.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:11:15.4352686Z triton_mm_2218 0.0087 ms 86.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:15.4353962Z triton_mm_2217 0.0089 ms 84.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:15.4355017Z triton_mm_2232 0.0090 ms 84.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:15.4356130Z SingleProcess AUTOTUNE benchmarking takes 0.1807 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:11:15.5670107Z Autotune Choices Stats: 2025-09-07T09:11:15.5671111Z {"num_choices": 14, "num_triton_choices": 13, "best_kernel": "triton_bmm_2171", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.009279999881982803, "best_triton_pos": 0} 2025-09-07T09:11:15.5692320Z AUTOTUNE bmm(32x1x197, 32x197x64) 2025-09-07T09:11:15.5692687Z strides: [197, 0, 1], [12608, 1, 197] 2025-09-07T09:11:15.5693019Z dtypes: torch.float16, torch.float16 2025-09-07T09:11:15.5694191Z triton_bmm_2171 0.0093 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:15.5695480Z triton_bmm_2176 0.0093 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:15.5696709Z triton_bmm_2167 0.0105 ms 88.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:11:15.5697933Z triton_bmm_2168 0.0121 ms 76.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:11:15.5699160Z triton_bmm_2165 0.0122 ms 76.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T09:11:15.5699934Z bmm 0.0124 ms 74.6% 2025-09-07T09:11:15.5700644Z triton_bmm_2175 0.0125 ms 74.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:11:15.5702067Z triton_bmm_2172 0.0125 ms 74.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:15.5703144Z triton_bmm_2166 0.0133 ms 69.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:15.5704378Z triton_bmm_2169 0.0133 ms 69.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:11:15.5705312Z SingleProcess AUTOTUNE benchmarking takes 0.1343 seconds and 0.0002 seconds precompiling for 14 choices 2025-09-07T09:11:15.7087890Z Autotune Choices Stats: 2025-09-07T09:11:15.7089041Z {"num_choices": 16, "num_triton_choices": 15, "best_kernel": "triton_bmm_2124", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.006240000016987324, "best_triton_pos": 0} 2025-09-07T09:11:15.7110747Z AUTOTUNE bmm(32x197x1, 32x1x64) 2025-09-07T09:11:15.7111035Z strides: [197, 1, 197], [64, 64, 1] 2025-09-07T09:11:15.7111331Z dtypes: torch.float16, torch.float16 2025-09-07T09:11:15.7112204Z triton_bmm_2124 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:15.7113312Z triton_bmm_2117 0.0063 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T09:11:15.7114672Z triton_bmm_2123 0.0063 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:11:15.7115983Z triton_bmm_2125 0.0063 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:15.7117126Z triton_bmm_2118 0.0063 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:11:15.7118185Z triton_bmm_2121 0.0063 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:15.7119229Z triton_bmm_2122 0.0063 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:11:15.7120268Z triton_bmm_2126 0.0063 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 
2025-09-07T09:11:15.7121431Z triton_bmm_2127 0.0063 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:11:15.7122558Z triton_bmm_2119 0.0064 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:15.7123478Z SingleProcess AUTOTUNE benchmarking takes 0.1406 seconds and 0.0002 seconds precompiling for 16 choices 2025-09-07T09:11:15.9046652Z Autotune Choices Stats: 2025-09-07T09:11:15.9047645Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_2398", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.009312000125646591, "best_triton_pos": 0} 2025-09-07T09:11:15.9069924Z AUTOTUNE mm(1576x768, 768x256) 2025-09-07T09:11:15.9070430Z strides: [768, 1], [256, 1] 2025-09-07T09:11:15.9070700Z dtypes: torch.float16, torch.float16 2025-09-07T09:11:15.9071303Z triton_mm_2398 0.0093 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:11:15.9072049Z mm 0.0095 ms 97.7% 2025-09-07T09:11:15.9072715Z triton_mm_2402 0.0102 ms 91.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:11:15.9074048Z triton_mm_2397 0.0112 ms 83.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:11:15.9075096Z triton_mm_2394 0.0116 ms 80.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=5, num_warps=4 2025-09-07T09:11:15.9076115Z triton_mm_2401 0.0116 ms 80.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:15.9077205Z triton_mm_2408 0.0116 ms 79.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:15.9078185Z triton_mm_2393 0.0124 ms 75.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:15.9079155Z triton_mm_2400 0.0125 ms 74.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:11:15.9080145Z triton_mm_2407 0.0125 ms 74.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:15.9081242Z SingleProcess AUTOTUNE benchmarking takes 0.1924 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:11:16.0961672Z Autotune Choices Stats: 2025-09-07T09:11:16.0962679Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_2854", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.008736000396311283, "best_triton_pos": 0} 2025-09-07T09:11:16.0984880Z AUTOTUNE mm(3208x384, 384x128) 2025-09-07T09:11:16.0985113Z strides: [384, 1], [128, 1] 2025-09-07T09:11:16.0985349Z dtypes: torch.float16, torch.float16 2025-09-07T09:11:16.0985911Z triton_mm_2854 0.0087 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 
2025-09-07T09:11:16.0986454Z mm 0.0090 ms 97.2% 2025-09-07T09:11:16.0986955Z triton_mm_2853 0.0092 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:11:16.0987797Z triton_mm_2858 0.0094 ms 92.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:11:16.0988652Z triton_mm_2857 0.0094 ms 92.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:11:16.0989487Z triton_mm_2856 0.0099 ms 88.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:11:16.0990314Z triton_mm_2849 0.0100 ms 87.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:16.0991385Z triton_mm_2847 0.0100 ms 86.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:11:16.0992260Z triton_mm_2860 0.0101 ms 86.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:11:16.0993114Z triton_mm_2848 0.0103 ms 85.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:11:16.0994038Z SingleProcess AUTOTUNE benchmarking takes 0.1836 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:11:24.6244747Z W0907 09:11:24.623000 31182 site-packages/torch/_logging/_internal.py:1199] [6/0] Profiler function will be 
ignored 2025-09-07T09:12:04.1818571Z pass 2025-09-07T09:12:11.4213211Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T09:12:11.4214934Z import pynvml # type: ignore[import] 2025-09-07T09:12:14.3900588Z 2025-09-07T09:12:15.7760637Z loading model: 0it [00:00, ?it/s] 2025-09-07T09:12:15.7761014Z loading model: 0it [00:01, ?it/s] 2025-09-07T09:12:15.7761343Z cuda train cspdarknet53 2025-09-07T09:12:43.2554191Z Autotune Choices Stats: 2025-09-07T09:12:43.2556535Z {"num_choices": 7, "num_triton_choices": 6, "best_kernel": "convolution", "best_time": 0.04137599840760231, "best_triton_pos": 1, "best_triton_time": 0.09020800143480301, "best_triton_kernel": "triton_convolution2d_4", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8"} 2025-09-07T09:12:43.2578934Z AUTOTUNE convolution(8x3x256x256, 32x3x3x3) 2025-09-07T09:12:43.2579213Z strides: [196608, 1, 768, 3], [27, 1, 9, 3] 2025-09-07T09:12:43.2579477Z dtypes: torch.float16, torch.float16 2025-09-07T09:12:43.2579747Z convolution 0.0414 ms 100.0% 2025-09-07T09:12:43.2580451Z triton_convolution2d_4 0.0902 ms 45.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:12:43.2581590Z triton_convolution2d_2 0.0911 ms 45.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T09:12:43.2583020Z triton_convolution2d_0 0.0993 ms 41.7% ALLOW_TF32=False, 
BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:12:43.2584456Z triton_convolution2d_3 0.0996 ms 41.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:12:43.2585663Z triton_convolution2d_1 0.1216 ms 34.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:12:43.2586863Z triton_convolution2d_5 0.1363 ms 30.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:12:43.2587842Z SingleProcess AUTOTUNE benchmarking takes 0.1421 seconds and 0.0002 seconds precompiling for 7 choices 2025-09-07T09:12:43.3747012Z Autotune Choices Stats: 2025-09-07T09:12:43.3749647Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.033055998384952545, "best_triton_pos": 1, "best_triton_time": 0.03564799949526787, "best_triton_kernel": "triton_convolution2d_9", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8"} 2025-09-07T09:12:43.3769443Z AUTOTUNE convolution(8x32x256x256, 64x32x3x3) 2025-09-07T09:12:43.3769757Z strides: [2097152, 1, 8192, 32], [288, 1, 96, 32] 2025-09-07T09:12:43.3770056Z dtypes: torch.float16, torch.float16 2025-09-07T09:12:43.3770323Z convolution 0.0331 ms 100.0% 2025-09-07T09:12:43.3771056Z triton_convolution2d_9 0.0356 ms 92.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, 
PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:12:43.3772752Z triton_convolution2d_12 0.0362 ms 91.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:12:43.3775206Z triton_convolution2d_10 0.0380 ms 87.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:12:43.3777106Z triton_convolution2d_11 0.0426 ms 77.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:12:43.3779008Z triton_convolution2d_7 0.0471 ms 70.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:12:43.3780930Z triton_convolution2d_6 0.0617 ms 53.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:12:43.3783097Z triton_convolution2d_8 0.1230 ms 26.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T09:12:43.3784243Z SingleProcess AUTOTUNE benchmarking takes 0.1178 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T09:12:43.6110206Z Autotune Choices Stats: 2025-09-07T09:12:43.6111754Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_24", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=3, num_warps=4", "best_time": 0.02316799946129322, "best_triton_pos": 0} 2025-09-07T09:12:43.6132681Z AUTOTUNE mm(131072x64, 64x128) 2025-09-07T09:12:43.6132945Z strides: [64, 1], [1, 64] 2025-09-07T09:12:43.6133184Z dtypes: torch.float16, torch.float16 2025-09-07T09:12:43.6133904Z triton_mm_24 0.0232 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:12:43.6134808Z triton_mm_21 0.0234 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:12:43.6135701Z triton_mm_25 0.0236 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:12:43.6136606Z triton_mm_29 0.0241 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:12:43.6137164Z mm 0.0247 ms 93.8% 2025-09-07T09:12:43.6137898Z triton_mm_26 0.0247 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:12:43.6138796Z triton_mm_31 0.0251 ms 92.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:12:43.6139699Z triton_mm_27 0.0252 ms 91.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:12:43.6140585Z triton_mm_23 0.0253 ms 91.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 
2025-09-07T09:12:43.6141464Z triton_mm_22 0.0255 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:12:43.6142244Z SingleProcess AUTOTUNE benchmarking takes 0.2352 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:12:43.8196277Z Autotune Choices Stats: 2025-09-07T09:12:43.8198013Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_39", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.01532800029963255, "best_triton_pos": 0} 2025-09-07T09:12:43.8219701Z AUTOTUNE mm(131072x64, 64x32) 2025-09-07T09:12:43.8219938Z strides: [64, 1], [1, 64] 2025-09-07T09:12:43.8220183Z dtypes: torch.float16, torch.float16 2025-09-07T09:12:43.8220784Z triton_mm_39 0.0153 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:12:43.8221696Z triton_mm_47 0.0157 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:12:43.8223587Z triton_mm_45 0.0162 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:12:43.8225356Z triton_mm_36 0.0162 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:12:43.8226896Z triton_mm_42 0.0164 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:12:43.8228379Z triton_mm_43 0.0164 ms 
93.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:12:43.8229887Z triton_mm_48 0.0164 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:12:43.8231394Z triton_mm_41 0.0165 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:12:43.8232431Z mm 0.0165 ms 92.6% 2025-09-07T09:12:43.8233121Z triton_mm_46 0.0167 ms 91.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T09:12:43.8234023Z SingleProcess AUTOTUNE benchmarking takes 0.2082 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T09:12:43.9292203Z Autotune Choices Stats: 2025-09-07T09:12:43.9295431Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.027295999228954315, "best_triton_pos": 1, "best_triton_time": 0.028575999662280083, "best_triton_kernel": "triton_convolution2d_52", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8"} 2025-09-07T09:12:43.9314999Z AUTOTUNE convolution(8x32x128x128, 64x32x3x3) 2025-09-07T09:12:43.9315280Z strides: [524288, 1, 4096, 32], [288, 1, 96, 32] 2025-09-07T09:12:43.9315554Z dtypes: torch.float16, torch.float16 2025-09-07T09:12:43.9315798Z convolution 0.0273 ms 100.0% 2025-09-07T09:12:43.9316499Z triton_convolution2d_52 0.0286 ms 95.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 
2025-09-07T09:12:43.9317708Z triton_convolution2d_55 0.0290 ms 94.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:12:43.9318843Z triton_convolution2d_53 0.0323 ms 84.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:12:43.9319951Z triton_convolution2d_54 0.0343 ms 79.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:12:43.9321063Z triton_convolution2d_50 0.0402 ms 68.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:12:43.9322218Z triton_convolution2d_49 0.0595 ms 45.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:12:43.9323429Z triton_convolution2d_51 0.0825 ms 33.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T09:12:43.9324747Z SingleProcess AUTOTUNE benchmarking takes 0.1090 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T09:12:44.1480238Z Autotune Choices Stats: 2025-09-07T09:12:44.1482277Z {"num_choices": 19, "num_triton_choices": 18, "best_kernel": "mm", "best_time": 0.01740800030529499, "best_triton_pos": 1, "best_triton_time": 0.01759999990463257, "best_triton_kernel": "triton_mm_73", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, 
BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T09:12:44.1504190Z AUTOTUNE mm(131072x64, 64x64) 2025-09-07T09:12:44.1504622Z strides: [64, 1], [1, 64] 2025-09-07T09:12:44.1505031Z dtypes: torch.float16, torch.float16 2025-09-07T09:12:44.1505451Z mm 0.0174 ms 100.0% 2025-09-07T09:12:44.1506424Z triton_mm_73 0.0176 ms 98.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:12:44.1507980Z triton_mm_72 0.0178 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:12:44.1509508Z triton_mm_68 0.0189 ms 92.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:12:44.1511008Z triton_mm_64 0.0189 ms 92.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:12:44.1512559Z triton_mm_63 0.0191 ms 91.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:12:44.1513910Z triton_mm_67 0.0191 ms 91.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:12:44.1514760Z triton_mm_69 0.0191 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:12:44.1515583Z triton_mm_71 0.0192 ms 90.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 
2025-09-07T09:12:44.1516404Z triton_mm_70 0.0192 ms 90.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:12:44.1517206Z SingleProcess AUTOTUNE benchmarking takes 0.2176 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T09:12:44.3699925Z Autotune Choices Stats: 2025-09-07T09:12:44.3701530Z {"num_choices": 19, "num_triton_choices": 18, "best_kernel": "triton_mm_81", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.024960000067949295, "best_triton_pos": 0} 2025-09-07T09:12:44.3722890Z AUTOTUNE mm(131072x128, 128x64) 2025-09-07T09:12:44.3723155Z strides: [128, 1], [1, 128] 2025-09-07T09:12:44.3723410Z dtypes: torch.float16, torch.float16 2025-09-07T09:12:44.3724195Z triton_mm_81 0.0250 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:12:44.3725160Z triton_mm_90 0.0250 ms 99.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:12:44.3726125Z triton_mm_85 0.0251 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:12:44.3726988Z mm 0.0255 ms 98.0% 2025-09-07T09:12:44.3727549Z triton_mm_87 0.0273 ms 91.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:12:44.3728535Z triton_mm_88 0.0275 ms 90.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 
2025-09-07T09:12:44.3729491Z triton_mm_89 0.0278 ms 89.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T09:12:44.3730429Z triton_mm_80 0.0289 ms 86.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:12:44.3731374Z triton_mm_84 0.0293 ms 85.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:12:44.3732322Z triton_mm_86 0.0293 ms 85.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:12:44.3733152Z SingleProcess AUTOTUNE benchmarking takes 0.2207 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T09:12:44.4815434Z Autotune Choices Stats: 2025-09-07T09:12:44.4817965Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.024639999493956566, "best_triton_pos": 1, "best_triton_time": 0.026335999369621277, "best_triton_kernel": "triton_convolution2d_97", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8"} 2025-09-07T09:12:44.4839472Z AUTOTUNE convolution(8x64x128x128, 128x64x3x3) 2025-09-07T09:12:44.4839769Z strides: [1048576, 1, 8192, 64], [576, 1, 192, 64] 2025-09-07T09:12:44.4840035Z dtypes: torch.float16, torch.float16 2025-09-07T09:12:44.4840268Z convolution 0.0246 ms 100.0% 2025-09-07T09:12:44.4840906Z triton_convolution2d_97 0.0263 ms 93.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, 
num_warps=8 2025-09-07T09:12:44.4841971Z triton_convolution2d_95 0.0266 ms 92.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:12:44.4843202Z triton_convolution2d_98 0.0300 ms 82.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:12:44.4844704Z triton_convolution2d_96 0.0350 ms 70.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:12:44.4845903Z triton_convolution2d_93 0.0383 ms 64.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:12:44.4847108Z triton_convolution2d_92 0.0427 ms 57.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:12:44.4848327Z triton_convolution2d_94 0.1060 ms 23.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T09:12:44.4849486Z SingleProcess AUTOTUNE benchmarking takes 0.1111 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T09:12:44.7142724Z Autotune Choices Stats: 2025-09-07T09:12:44.7144782Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_116", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 
0.012384000234305859, "best_triton_pos": 0} 2025-09-07T09:12:44.7166926Z AUTOTUNE mm(32768x128, 128x128) 2025-09-07T09:12:44.7167175Z strides: [128, 1], [1, 128] 2025-09-07T09:12:44.7167418Z dtypes: torch.float16, torch.float16 2025-09-07T09:12:44.7168054Z triton_mm_116 0.0124 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:12:44.7169055Z triton_mm_108 0.0125 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:12:44.7170010Z triton_mm_109 0.0125 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:12:44.7170969Z triton_mm_115 0.0128 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:12:44.7171932Z triton_mm_111 0.0129 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:12:44.7172908Z triton_mm_105 0.0129 ms 95.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:12:44.7174236Z triton_mm_112 0.0130 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:12:44.7175151Z triton_mm_113 0.0130 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:12:44.7175713Z mm 0.0133 ms 93.3% 2025-09-07T09:12:44.7176225Z triton_mm_110 
0.0133 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:12:44.7177015Z SingleProcess AUTOTUNE benchmarking takes 0.2316 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:12:44.9326740Z Autotune Choices Stats: 2025-09-07T09:12:44.9327744Z {"num_choices": 19, "num_triton_choices": 18, "best_kernel": "triton_mm_134", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.009151999838650227, "best_triton_pos": 0} 2025-09-07T09:12:44.9351585Z AUTOTUNE mm(32768x64, 64x64) 2025-09-07T09:12:44.9351821Z strides: [64, 1], [1, 64] 2025-09-07T09:12:44.9352044Z dtypes: torch.float16, torch.float16 2025-09-07T09:12:44.9352830Z triton_mm_134 0.0092 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:12:44.9355077Z triton_mm_126 0.0092 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:12:44.9356686Z triton_mm_135 0.0093 ms 98.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:12:44.9358407Z triton_mm_130 0.0094 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:12:44.9360336Z triton_mm_129 0.0094 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:12:44.9361885Z triton_mm_131 0.0095 ms 96.6% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:12:44.9363504Z triton_mm_132 0.0095 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:12:44.9364622Z triton_mm_122 0.0095 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:12:44.9365583Z triton_mm_124 0.0095 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:12:44.9366547Z triton_mm_133 0.0096 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T09:12:44.9367390Z SingleProcess AUTOTUNE benchmarking takes 0.2179 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T09:12:45.0355203Z Autotune Choices Stats: 2025-09-07T09:12:45.0356921Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_140", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4", "best_time": 0.019007999449968338, "best_triton_pos": 0} 2025-09-07T09:12:45.0379837Z AUTOTUNE convolution(8x64x64x64, 64x64x3x3) 2025-09-07T09:12:45.0380343Z strides: [262144, 1, 4096, 64], [576, 1, 192, 64] 2025-09-07T09:12:45.0380640Z dtypes: torch.float16, torch.float16 2025-09-07T09:12:45.0381342Z triton_convolution2d_140 0.0190 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 
2025-09-07T09:12:45.0382539Z triton_convolution2d_141 0.0193 ms 98.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:12:45.0385595Z triton_convolution2d_139 0.0194 ms 97.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:12:45.0386783Z convolution 0.0196 ms 97.1% 2025-09-07T09:12:45.0387957Z triton_convolution2d_142 0.0236 ms 80.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:12:45.0389922Z triton_convolution2d_137 0.0324 ms 58.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:12:45.0391855Z triton_convolution2d_136 0.0325 ms 58.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:12:45.0393602Z triton_convolution2d_138 0.0531 ms 35.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T09:12:45.0394586Z SingleProcess AUTOTUNE benchmarking takes 0.1023 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T09:12:45.1571863Z Autotune Choices Stats: 2025-09-07T09:12:45.1574386Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.018912000581622124, "best_triton_pos": 1, "best_triton_time": 0.03356799855828285, "best_triton_kernel": "triton_convolution2d_209", 
"best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T09:12:45.1597660Z AUTOTUNE convolution(8x128x64x64, 256x128x3x3) 2025-09-07T09:12:45.1598010Z strides: [524288, 1, 8192, 128], [1152, 1, 384, 128] 2025-09-07T09:12:45.1598339Z dtypes: torch.float16, torch.float16 2025-09-07T09:12:45.1598618Z convolution 0.0189 ms 100.0% 2025-09-07T09:12:45.1599375Z triton_convolution2d_209 0.0336 ms 56.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:12:45.1600640Z triton_convolution2d_208 0.0403 ms 47.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:12:45.1601894Z triton_convolution2d_211 0.0424 ms 44.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:12:45.1603274Z triton_convolution2d_210 0.0459 ms 41.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:12:45.1604822Z triton_convolution2d_206 0.0562 ms 33.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:12:45.1606069Z triton_convolution2d_205 0.0593 ms 31.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 
2025-09-07T09:12:45.1607293Z triton_convolution2d_207 0.1028 ms 18.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T09:12:45.1608261Z SingleProcess AUTOTUNE benchmarking takes 0.1180 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T09:12:45.3958822Z Autotune Choices Stats: 2025-09-07T09:12:45.3960411Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_223", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.010367999784648418, "best_triton_pos": 0} 2025-09-07T09:12:45.3985381Z AUTOTUNE mm(8192x256, 256x256) 2025-09-07T09:12:45.3985817Z strides: [256, 1], [1, 256] 2025-09-07T09:12:45.3986236Z dtypes: torch.float16, torch.float16 2025-09-07T09:12:45.3987285Z triton_mm_223 0.0104 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:12:45.3988303Z mm 0.0106 ms 97.6% 2025-09-07T09:12:45.3989229Z triton_mm_230 0.0109 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:12:45.3990798Z triton_mm_219 0.0109 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:12:45.3992355Z triton_mm_222 0.0111 ms 93.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:12:45.3994292Z triton_mm_226 0.0113 ms 91.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:12:45.3995126Z triton_mm_225 0.0113 ms 91.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:12:45.3995959Z triton_mm_229 0.0114 ms 90.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:12:45.3996787Z triton_mm_221 0.0116 ms 89.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:12:45.3997679Z triton_mm_228 0.0116 ms 89.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:12:45.3998412Z SingleProcess AUTOTUNE benchmarking takes 0.2373 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:12:45.6314331Z Autotune Choices Stats: 2025-09-07T09:12:45.6315859Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_243", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4", "best_time": 0.008320000022649765, "best_triton_pos": 0} 2025-09-07T09:12:45.6340336Z AUTOTUNE mm(8192x128, 128x128) 2025-09-07T09:12:45.6340778Z strides: [128, 1], [1, 128] 2025-09-07T09:12:45.6341184Z dtypes: torch.float16, torch.float16 2025-09-07T09:12:45.6342620Z triton_mm_243 0.0083 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:12:45.6344534Z triton_mm_242 0.0084 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=3, num_warps=4 2025-09-07T09:12:45.6346055Z triton_mm_240 0.0084 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:12:45.6347585Z triton_mm_241 0.0084 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:12:45.6349105Z triton_mm_245 0.0085 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:12:45.6350641Z triton_mm_239 0.0086 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:12:45.6352208Z triton_mm_238 0.0086 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:12:45.6353547Z triton_mm_244 0.0087 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:12:45.6354201Z mm 0.0090 ms 92.5% 2025-09-07T09:12:45.6354716Z triton_mm_248 0.0092 ms 90.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:12:45.6355461Z SingleProcess AUTOTUNE benchmarking takes 0.2350 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:12:45.7523951Z Autotune Choices Stats: 2025-09-07T09:12:45.7525318Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.02534399926662445, "best_triton_pos": 1, "best_triton_time": 0.030271999537944794, "best_triton_kernel": "triton_convolution2d_254", 
"best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T09:12:45.7551936Z AUTOTUNE convolution(8x128x32x32, 128x128x3x3) 2025-09-07T09:12:45.7552519Z strides: [131072, 1, 4096, 128], [1152, 1, 384, 128] 2025-09-07T09:12:45.7553028Z dtypes: torch.float16, torch.float16 2025-09-07T09:12:45.7553476Z convolution 0.0253 ms 100.0% 2025-09-07T09:12:45.7554398Z triton_convolution2d_254 0.0303 ms 83.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:12:45.7555675Z triton_convolution2d_255 0.0348 ms 72.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:12:45.7556901Z triton_convolution2d_256 0.0364 ms 69.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:12:45.7558207Z triton_convolution2d_253 0.0385 ms 65.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:12:45.7559414Z triton_convolution2d_250 0.0481 ms 52.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:12:45.7560844Z triton_convolution2d_251 0.0525 ms 48.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 
2025-09-07T09:12:45.7562078Z triton_convolution2d_252 0.0993 ms 25.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T09:12:45.7563193Z SingleProcess AUTOTUNE benchmarking takes 0.1206 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T09:12:45.9436674Z Autotune Choices Stats: 2025-09-07T09:12:45.9438928Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.024480000138282776, "best_triton_pos": 1, "best_triton_time": 0.05580800026655197, "best_triton_kernel": "triton_convolution2d_481", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T09:12:45.9463382Z AUTOTUNE convolution(8x256x32x32, 512x256x3x3) 2025-09-07T09:12:45.9464245Z strides: [262144, 1, 8192, 256], [2304, 1, 768, 256] 2025-09-07T09:12:45.9464760Z dtypes: torch.float16, torch.float16 2025-09-07T09:12:45.9465199Z convolution 0.0245 ms 100.0% 2025-09-07T09:12:45.9466367Z triton_convolution2d_481 0.0558 ms 43.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:12:45.9468318Z triton_convolution2d_480 0.0698 ms 35.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:12:45.9470268Z triton_convolution2d_483 0.0749 ms 32.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:12:45.9472633Z triton_convolution2d_482 0.0756 ms 32.4% 
ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:12:45.9474390Z triton_convolution2d_477 0.1012 ms 24.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:12:45.9475530Z triton_convolution2d_478 0.1047 ms 23.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:12:45.9476666Z triton_convolution2d_479 0.1958 ms 12.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T09:12:45.9477631Z SingleProcess AUTOTUNE benchmarking takes 0.1709 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T09:12:46.1797826Z Autotune Choices Stats: 2025-09-07T09:12:46.1799419Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_496", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4", "best_time": 0.009344000369310379, "best_triton_pos": 0} 2025-09-07T09:12:46.1825161Z AUTOTUNE mm(2048x512, 512x512) 2025-09-07T09:12:46.1825591Z strides: [512, 1], [1, 512] 2025-09-07T09:12:46.1825989Z dtypes: torch.float16, torch.float16 2025-09-07T09:12:46.1827035Z triton_mm_496 0.0093 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:12:46.1828040Z mm 0.0095 ms 98.6% 2025-09-07T09:12:46.1829477Z triton_mm_495 0.0103 ms 90.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, 
BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:12:46.1831073Z triton_mm_491 0.0104 ms 89.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:12:46.1832601Z triton_mm_502 0.0106 ms 88.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:12:46.1834316Z triton_mm_494 0.0108 ms 86.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:12:46.1835147Z triton_mm_498 0.0110 ms 85.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:12:46.1835994Z triton_mm_501 0.0111 ms 84.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:12:46.1836830Z triton_mm_492 0.0114 ms 82.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:12:46.1837739Z triton_mm_493 0.0124 ms 75.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:12:46.1838464Z SingleProcess AUTOTUNE benchmarking takes 0.2349 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:12:46.4186666Z Autotune Choices Stats: 2025-09-07T09:12:46.4188294Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_511", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.0077760000713169575, "best_triton_pos": 0} 2025-09-07T09:12:46.4213968Z AUTOTUNE mm(2048x256, 256x256) 2025-09-07T09:12:46.4214476Z strides: [256, 1], [1, 256] 2025-09-07T09:12:46.4214900Z dtypes: torch.float16, torch.float16 2025-09-07T09:12:46.4215943Z triton_mm_511 0.0078 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:12:46.4216934Z mm 0.0080 ms 96.8% 2025-09-07T09:12:46.4217837Z triton_mm_510 0.0080 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:12:46.4219378Z triton_mm_514 0.0083 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:12:46.4220950Z triton_mm_515 0.0084 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:12:46.4222486Z triton_mm_513 0.0085 ms 91.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:12:46.4224175Z triton_mm_517 0.0086 ms 90.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:12:46.4225063Z triton_mm_506 0.0087 ms 89.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:12:46.4225939Z triton_mm_505 0.0089 ms 87.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, 
num_warps=8 2025-09-07T09:12:46.4227006Z triton_mm_504 0.0090 ms 86.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:12:46.4227826Z SingleProcess AUTOTUNE benchmarking takes 0.2383 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:12:46.5678048Z Autotune Choices Stats: 2025-09-07T09:12:46.5680210Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.017823999747633934, "best_triton_pos": 1, "best_triton_time": 0.05321599915623665, "best_triton_kernel": "triton_convolution2d_526", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T09:12:46.5704959Z AUTOTUNE convolution(8x256x16x16, 256x256x3x3) 2025-09-07T09:12:46.5705483Z strides: [65536, 1, 4096, 256], [2304, 1, 768, 256] 2025-09-07T09:12:46.5706006Z dtypes: torch.float16, torch.float16 2025-09-07T09:12:46.5706486Z convolution 0.0178 ms 100.0% 2025-09-07T09:12:46.5707680Z triton_convolution2d_526 0.0532 ms 33.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:12:46.5709710Z triton_convolution2d_528 0.0670 ms 26.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:12:46.5711702Z triton_convolution2d_525 0.0682 ms 26.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:12:46.5714069Z triton_convolution2d_527 0.0747 ms 23.9% ALLOW_TF32=False, BLOCK_K=32, 
BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:12:46.5715350Z triton_convolution2d_523 0.1003 ms 17.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:12:46.5716419Z triton_convolution2d_522 0.1024 ms 17.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:12:46.5717566Z triton_convolution2d_524 0.2030 ms 8.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T09:12:46.5718418Z SingleProcess AUTOTUNE benchmarking takes 0.1486 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T09:12:46.7921314Z Autotune Choices Stats: 2025-09-07T09:12:46.7924013Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.02969600073993206, "best_triton_pos": 1, "best_triton_time": 0.11091200262308121, "best_triton_kernel": "triton_convolution2d_753", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T09:12:46.7948701Z AUTOTUNE convolution(8x512x16x16, 1024x512x3x3) 2025-09-07T09:12:46.7949036Z strides: [131072, 1, 8192, 512], [4608, 1, 1536, 512] 2025-09-07T09:12:46.7949316Z dtypes: torch.float16, torch.float16 2025-09-07T09:12:46.7949569Z convolution 0.0297 ms 100.0% 2025-09-07T09:12:46.7950253Z triton_convolution2d_753 0.1109 ms 26.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, 
PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:12:46.7951603Z triton_convolution2d_752 0.1376 ms 21.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:12:46.7952765Z triton_convolution2d_755 0.1416 ms 21.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:12:46.7954031Z triton_convolution2d_754 0.1429 ms 20.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:12:46.7955078Z triton_convolution2d_750 0.2008 ms 14.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:12:46.7956138Z triton_convolution2d_749 0.2027 ms 14.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:12:46.7957276Z triton_convolution2d_751 0.2856 ms 10.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T09:12:46.7958137Z SingleProcess AUTOTUNE benchmarking takes 0.2035 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T09:12:47.0249920Z Autotune Choices Stats: 2025-09-07T09:12:47.0251101Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.009727999567985535, "best_triton_pos": 1, "best_triton_time": 0.009759999811649323, "best_triton_kernel": "triton_mm_764", 
"best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T09:12:47.0278209Z AUTOTUNE mm(512x1024, 1024x1024) 2025-09-07T09:12:47.0278669Z strides: [1024, 1], [1, 1024] 2025-09-07T09:12:47.0279085Z dtypes: torch.float16, torch.float16 2025-09-07T09:12:47.0279517Z mm 0.0097 ms 100.0% 2025-09-07T09:12:47.0280479Z triton_mm_764 0.0098 ms 99.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:12:47.0282059Z triton_mm_768 0.0108 ms 89.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:12:47.0284252Z triton_mm_760 0.0124 ms 78.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:12:47.0285211Z triton_mm_763 0.0125 ms 77.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:12:47.0286178Z triton_mm_774 0.0128 ms 76.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:12:47.0287145Z triton_mm_767 0.0132 ms 73.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:12:47.0288098Z triton_mm_759 0.0141 ms 69.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:12:47.0289054Z triton_mm_773 0.0143 ms 68.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, 
BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:12:47.0290199Z triton_mm_770 0.0147 ms 66.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:12:47.0291054Z SingleProcess AUTOTUNE benchmarking takes 0.2315 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:12:47.2628617Z Autotune Choices Stats: 2025-09-07T09:12:47.2630191Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_779", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007872000336647034, "best_triton_pos": 0} 2025-09-07T09:12:47.2658983Z AUTOTUNE mm(512x512, 512x512) 2025-09-07T09:12:47.2659441Z strides: [512, 1], [1, 512] 2025-09-07T09:12:47.2659853Z dtypes: torch.float16, torch.float16 2025-09-07T09:12:47.2660905Z triton_mm_779 0.0079 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:12:47.2661928Z mm 0.0084 ms 93.9% 2025-09-07T09:12:47.2662852Z triton_mm_783 0.0084 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:12:47.2664544Z triton_mm_787 0.0089 ms 88.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:12:47.2665429Z triton_mm_778 0.0094 ms 83.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:12:47.2666306Z triton_mm_777 0.0095 ms 82.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, 
BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:12:47.2667186Z triton_mm_782 0.0095 ms 82.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:12:47.2668270Z triton_mm_776 0.0097 ms 81.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:12:47.2669159Z triton_mm_786 0.0097 ms 81.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:12:47.2670049Z triton_mm_793 0.0101 ms 77.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:12:47.2670834Z SingleProcess AUTOTUNE benchmarking takes 0.2366 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:12:47.4651933Z Autotune Choices Stats: 2025-09-07T09:12:47.4653314Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.017472000792622566, "best_triton_pos": 1, "best_triton_time": 0.10777600109577179, "best_triton_kernel": "triton_convolution2d_798", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T09:12:47.4680805Z AUTOTUNE convolution(8x512x8x8, 512x512x3x3) 2025-09-07T09:12:47.4681122Z strides: [32768, 1, 4096, 512], [4608, 1, 1536, 512] 2025-09-07T09:12:47.4681386Z dtypes: torch.float16, torch.float16 2025-09-07T09:12:47.4681612Z convolution 0.0175 ms 100.0% 2025-09-07T09:12:47.4682246Z triton_convolution2d_798 0.1078 ms 16.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, 
GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:12:47.4683591Z triton_convolution2d_797 0.1341 ms 13.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:12:47.4684978Z triton_convolution2d_800 0.1354 ms 12.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:12:47.4686198Z triton_convolution2d_799 0.1407 ms 12.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:12:47.4687413Z triton_convolution2d_795 0.2053 ms 8.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:12:47.4688629Z triton_convolution2d_794 0.2062 ms 8.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:12:47.4689847Z triton_convolution2d_796 0.2409 ms 7.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=512, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T09:12:47.4690814Z SingleProcess AUTOTUNE benchmarking takes 0.2017 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T09:12:47.7150731Z Autotune Choices Stats: 2025-09-07T09:12:47.7152272Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_921", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, 
BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.008991999551653862, "best_triton_pos": 0} 2025-09-07T09:12:47.7181022Z AUTOTUNE addmm(8x1000, 8x1024, 1024x1000) 2025-09-07T09:12:47.7181501Z strides: [0, 1], [1024, 1], [1, 1024] 2025-09-07T09:12:47.7182009Z dtypes: torch.float16, torch.float16, torch.float16 2025-09-07T09:12:47.7183566Z triton_mm_921 0.0090 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:12:47.7185008Z triton_mm_925 0.0095 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:12:47.7185585Z bias_addmm 0.0097 ms 92.4% 2025-09-07T09:12:47.7186139Z triton_mm_929 0.0107 ms 83.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:12:47.7187040Z triton_mm_933 0.0113 ms 79.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:12:47.7187934Z triton_mm_920 0.0118 ms 76.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:12:47.7188829Z triton_mm_919 0.0122 ms 73.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:12:47.7189709Z triton_mm_924 0.0125 ms 71.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:12:47.7190600Z triton_mm_918 0.0127 ms 71.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, 
BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T09:12:47.7191172Z addmm 0.0129 ms 69.7% 2025-09-07T09:12:47.7191591Z SingleProcess AUTOTUNE benchmarking takes 0.2377 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T09:13:05.6999585Z Autotune Choices Stats: 2025-09-07T09:13:05.7001482Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_959", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.0066559999249875546, "best_triton_pos": 0} 2025-09-07T09:13:05.7030715Z AUTOTUNE mm(1000x8, 8x1024) 2025-09-07T09:13:05.7030995Z strides: [1, 1000], [1024, 1] 2025-09-07T09:13:05.7031269Z dtypes: torch.float16, torch.float16 2025-09-07T09:13:05.7031936Z triton_mm_959 0.0067 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:13:05.7032957Z triton_mm_957 0.0067 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:13:05.7034476Z triton_mm_963 0.0067 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:13:05.7035470Z triton_mm_956 0.0069 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:13:05.7036439Z triton_mm_960 0.0069 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:13:05.7037529Z triton_mm_962 0.0069 ms 96.7% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:13:05.7038500Z triton_mm_954 0.0069 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:13:05.7039479Z triton_mm_961 0.0069 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:13:05.7040735Z triton_mm_964 0.0069 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T09:13:05.7041602Z triton_mm_952 0.0070 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:13:05.7042339Z SingleProcess AUTOTUNE benchmarking takes 0.1584 seconds and 0.0003 seconds precompiling for 17 choices 2025-09-07T09:13:06.4733482Z Autotune Choices Stats: 2025-09-07T09:13:06.4735031Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "mm", "best_time": 0.009279999881982803, "best_triton_pos": 1, "best_triton_time": 0.009727999567985535, "best_triton_kernel": "triton_mm_942", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T09:13:06.4766057Z AUTOTUNE mm(8x1000, 1000x1024) 2025-09-07T09:13:06.4766334Z strides: [1000, 1], [1024, 1] 2025-09-07T09:13:06.4766642Z dtypes: torch.float16, torch.float16 2025-09-07T09:13:06.4766914Z mm 0.0093 ms 100.0% 2025-09-07T09:13:06.4767521Z triton_mm_942 0.0097 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, 
num_warps=4 2025-09-07T09:13:06.4768521Z triton_mm_938 0.0099 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:13:06.4769504Z triton_mm_946 0.0102 ms 91.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:13:06.4771334Z triton_mm_936 0.0115 ms 81.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:13:06.4772323Z triton_mm_937 0.0115 ms 80.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:13:06.4773291Z triton_mm_950 0.0116 ms 80.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:13:06.4774448Z triton_mm_941 0.0117 ms 79.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:13:06.4775408Z triton_mm_948 0.0126 ms 73.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:13:06.4776385Z triton_mm_945 0.0130 ms 71.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:13:06.4777245Z SingleProcess AUTOTUNE benchmarking takes 0.1800 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T09:13:16.0458077Z W0907 09:13:16.044000 43405 site-packages/torch/_logging/_internal.py:1199] [6/0] Profiler function will be ignored 2025-09-07T09:13:45.3868573Z 
pass 2025-09-07T09:13:52.0371770Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T09:13:52.0373006Z import pynvml # type: ignore[import] 2025-09-07T09:13:55.0707803Z 2025-09-07T09:13:56.6579866Z loading model: 0it [00:00, ?it/s] 2025-09-07T09:13:56.6580254Z loading model: 0it [00:01, ?it/s] 2025-09-07T09:13:56.6580577Z cuda train deit_base_distilled_patch16_224 2025-09-07T09:14:11.2177895Z Autotune Choices Stats: 2025-09-07T09:14:11.2180026Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.018848000094294548, "best_triton_pos": 1, "best_triton_time": 0.024224000051617622, "best_triton_kernel": "triton_mm_62", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T09:14:11.2207889Z AUTOTUNE addmm(1584x3072, 1584x768, 768x3072) 2025-09-07T09:14:11.2208220Z strides: [0, 1], [768, 1], [1, 768] 2025-09-07T09:14:11.2208529Z dtypes: torch.float16, torch.float16, torch.float16 2025-09-07T09:14:11.2208853Z bias_addmm 0.0188 ms 100.0% 2025-09-07T09:14:11.2209524Z triton_mm_62 0.0242 ms 77.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:14:11.2210526Z triton_mm_56 0.0243 ms 77.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:14:11.2211498Z triton_mm_63 0.0280 ms 67.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 
2025-09-07T09:14:11.2212453Z triton_mm_55 0.0282 ms 66.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:14:11.2213422Z triton_mm_61 0.0286 ms 65.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:14:11.2214298Z addmm 0.0289 ms 65.2% 2025-09-07T09:14:11.2215221Z triton_mm_59 0.0296 ms 63.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:14:11.2216130Z triton_mm_57 0.0308 ms 61.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:14:11.2217028Z triton_mm_58 0.0308 ms 61.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:14:11.2217811Z SingleProcess AUTOTUNE benchmarking takes 0.2985 seconds and 0.0003 seconds precompiling for 21 choices 2025-09-07T09:14:11.8859849Z Autotune Choices Stats: 2025-09-07T09:14:11.8861037Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_940", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.00825599953532219, "best_triton_pos": 0} 2025-09-07T09:14:11.8890278Z AUTOTUNE mm(8x768, 768x1000) 2025-09-07T09:14:11.8890544Z strides: [768, 1], [1, 768] 2025-09-07T09:14:11.8890789Z dtypes: torch.float16, torch.float16 2025-09-07T09:14:11.8891441Z triton_mm_940 0.0083 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, 
num_warps=2 2025-09-07T09:14:11.8892447Z triton_mm_944 0.0086 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:14:11.8893073Z mm 0.0089 ms 92.8% 2025-09-07T09:14:11.8893646Z triton_mm_948 0.0095 ms 86.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:14:11.8895380Z triton_mm_952 0.0099 ms 83.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:14:11.8896281Z triton_mm_939 0.0105 ms 78.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:14:11.8897170Z triton_mm_943 0.0107 ms 77.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:14:11.8898059Z triton_mm_938 0.0108 ms 76.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:14:11.8898976Z triton_mm_937 0.0113 ms 73.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T09:14:11.8899885Z triton_mm_947 0.0116 ms 71.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:14:11.8900682Z SingleProcess AUTOTUNE benchmarking takes 0.2084 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T09:14:14.8849232Z Autotune Choices Stats: 2025-09-07T09:14:14.8850689Z {"num_choices": 8, "num_triton_choices": 7, 
"best_kernel": "convolution", "best_time": 0.12966400384902954, "best_triton_pos": 1, "best_triton_time": 0.13385599851608276, "best_triton_kernel": "triton_convolution2d_6", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=8"} 2025-09-07T09:14:14.8881540Z AUTOTUNE convolution(8x3x224x224, 768x3x16x16) 2025-09-07T09:14:14.8881828Z strides: [150528, 50176, 224, 1], [768, 256, 16, 1] 2025-09-07T09:14:14.8882801Z dtypes: torch.float16, torch.float16 2025-09-07T09:14:14.8883072Z convolution 0.1297 ms 100.0% 2025-09-07T09:14:14.8884166Z triton_convolution2d_6 0.1339 ms 96.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:14:14.8885452Z triton_convolution2d_3 0.1444 ms 89.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:14:14.8886724Z triton_convolution2d_1 0.1482 ms 87.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:14:14.8887986Z triton_convolution2d_4 0.1752 ms 74.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:14:14.8889218Z triton_convolution2d_5 0.1969 ms 65.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:14:14.8890455Z triton_convolution2d_0 0.2160 ms 60.0% 
ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:14:14.8891698Z triton_convolution2d_2 0.4033 ms 32.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T09:14:14.8892676Z SingleProcess AUTOTUNE benchmarking takes 0.2148 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T09:14:15.1776633Z Autotune Choices Stats: 2025-09-07T09:14:15.1778676Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.016224000602960587, "best_triton_pos": 1, "best_triton_time": 0.01759999990463257, "best_triton_kernel": "triton_mm_24", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T09:14:15.1808092Z AUTOTUNE addmm(1584x2304, 1584x768, 768x2304) 2025-09-07T09:14:15.1808388Z strides: [0, 1], [768, 1], [1, 768] 2025-09-07T09:14:15.1808683Z dtypes: torch.float16, torch.float16, torch.float16 2025-09-07T09:14:15.1809001Z bias_addmm 0.0162 ms 100.0% 2025-09-07T09:14:15.1809609Z triton_mm_24 0.0176 ms 92.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:14:15.1810590Z triton_mm_23 0.0204 ms 79.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:14:15.1811565Z triton_mm_25 0.0209 ms 77.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:14:15.1812525Z triton_mm_18 0.0212 ms 
76.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:14:15.1813481Z triton_mm_16 0.0226 ms 71.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:14:15.1814428Z addmm 0.0239 ms 67.9% 2025-09-07T09:14:15.1814992Z triton_mm_20 0.0241 ms 67.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:14:15.1816124Z triton_mm_14 0.0244 ms 66.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:14:15.1817044Z triton_mm_17 0.0246 ms 65.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:14:15.1817836Z SingleProcess AUTOTUNE benchmarking takes 0.2915 seconds and 0.0003 seconds precompiling for 21 choices 2025-09-07T09:14:15.4061848Z Autotune Choices Stats: 2025-09-07T09:14:15.4064373Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.01056000031530857, "best_triton_pos": 1, "best_triton_time": 0.01196799986064434, "best_triton_kernel": "triton_mm_44", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T09:14:15.4092040Z AUTOTUNE mm(1584x768, 768x768) 2025-09-07T09:14:15.4092287Z strides: [768, 1], [1, 768] 2025-09-07T09:14:15.4092536Z dtypes: torch.float16, torch.float16 2025-09-07T09:14:15.4092811Z mm 0.0106 ms 100.0% 2025-09-07T09:14:15.4093396Z triton_mm_44 0.0120 ms 88.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, 
BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:14:15.4094743Z triton_mm_37 0.0127 ms 82.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:14:15.4095831Z triton_mm_33 0.0128 ms 82.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:14:15.4096831Z triton_mm_43 0.0130 ms 81.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:14:15.4098156Z triton_mm_36 0.0139 ms 76.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:14:15.4099148Z triton_mm_40 0.0141 ms 74.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:14:15.4100130Z triton_mm_38 0.0142 ms 74.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:14:15.4101109Z triton_mm_34 0.0157 ms 67.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:14:15.4102090Z triton_mm_35 0.0161 ms 65.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:14:15.4102960Z SingleProcess AUTOTUNE benchmarking takes 0.2276 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:14:15.6799416Z Autotune Choices Stats: 2025-09-07T09:14:15.6801413Z {"num_choices": 20, 
"num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.019231999292969704, "best_triton_pos": 1, "best_triton_time": 0.024224000051617622, "best_triton_kernel": "triton_mm_82", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T09:14:15.6829796Z AUTOTUNE mm(1584x3072, 3072x768) 2025-09-07T09:14:15.6830237Z strides: [3072, 1], [1, 3072] 2025-09-07T09:14:15.6830706Z dtypes: torch.float16, torch.float16 2025-09-07T09:14:15.6831140Z mm 0.0192 ms 100.0% 2025-09-07T09:14:15.6832500Z triton_mm_82 0.0242 ms 79.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:14:15.6834474Z triton_mm_75 0.0300 ms 64.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:14:15.6835498Z triton_mm_71 0.0306 ms 62.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:14:15.6836339Z triton_mm_81 0.0322 ms 59.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:14:15.6837271Z triton_mm_72 0.0332 ms 58.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:14:15.6838129Z triton_mm_76 0.0342 ms 56.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:14:15.6838979Z triton_mm_74 0.0357 ms 53.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:14:15.6839810Z triton_mm_78 0.0359 ms 53.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:14:15.6840648Z triton_mm_68 0.0434 ms 44.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:14:15.6841387Z SingleProcess AUTOTUNE benchmarking takes 0.2726 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:14:26.9469586Z Autotune Choices Stats: 2025-09-07T09:14:26.9471527Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.018719999119639397, "best_triton_pos": 1, "best_triton_time": 0.02006400004029274, "best_triton_kernel": "triton_mm_1035", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T09:14:26.9504617Z AUTOTUNE mm(1584x768, 768x3072) 2025-09-07T09:14:26.9505028Z strides: [768, 1], [3072, 1] 2025-09-07T09:14:26.9505409Z dtypes: torch.float16, torch.float16 2025-09-07T09:14:26.9505807Z mm 0.0187 ms 100.0% 2025-09-07T09:14:26.9506712Z triton_mm_1035 0.0201 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:14:26.9508284Z triton_mm_1028 0.0217 ms 86.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:14:26.9509821Z triton_mm_1036 0.0226 ms 83.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:14:26.9511368Z triton_mm_1030 0.0235 ms 
79.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:14:26.9512840Z triton_mm_1029 0.0241 ms 77.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:14:26.9514625Z triton_mm_1037 0.0255 ms 73.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:14:26.9516142Z triton_mm_1032 0.0273 ms 68.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:14:26.9518161Z triton_mm_1033 0.0279 ms 67.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:14:26.9519707Z triton_mm_1031 0.0287 ms 65.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:14:26.9521025Z SingleProcess AUTOTUNE benchmarking takes 0.2548 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T09:14:27.7131299Z Autotune Choices Stats: 2025-09-07T09:14:27.7143511Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.01756799966096878, "best_triton_pos": 1, "best_triton_time": 0.024191999807953835, "best_triton_kernel": "triton_mm_1049", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T09:14:27.7167161Z AUTOTUNE mm(768x1584, 1584x3072) 2025-09-07T09:14:27.7167540Z strides: [1, 768], [3072, 1] 2025-09-07T09:14:27.7167901Z dtypes: torch.float16, torch.float16 
2025-09-07T09:14:27.7168265Z mm 0.0176 ms 100.0% 2025-09-07T09:14:27.7169143Z triton_mm_1049 0.0242 ms 72.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:14:27.7170601Z triton_mm_1055 0.0253 ms 69.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:14:27.7172041Z triton_mm_1047 0.0274 ms 64.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:14:27.7173494Z triton_mm_1054 0.0283 ms 62.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:14:27.7175677Z triton_mm_1048 0.0288 ms 61.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:14:27.7177140Z triton_mm_1051 0.0297 ms 59.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:14:27.7178573Z triton_mm_1045 0.0298 ms 58.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:14:27.7180005Z triton_mm_1050 0.0305 ms 57.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:14:27.7181448Z triton_mm_1056 0.0306 ms 57.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:14:27.7182699Z SingleProcess 
AUTOTUNE benchmarking takes 0.2486 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:14:28.1600326Z Autotune Choices Stats: 2025-09-07T09:14:28.1602226Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.017791999503970146, "best_triton_pos": 1, "best_triton_time": 0.023679999634623528, "best_triton_kernel": "triton_mm_1087", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T09:14:28.1635325Z AUTOTUNE mm(3072x1584, 1584x768) 2025-09-07T09:14:28.1635717Z strides: [1, 3072], [768, 1] 2025-09-07T09:14:28.1636085Z dtypes: torch.float16, torch.float16 2025-09-07T09:14:28.1636469Z mm 0.0178 ms 100.0% 2025-09-07T09:14:28.1638191Z triton_mm_1087 0.0237 ms 75.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:14:28.1639775Z triton_mm_1093 0.0246 ms 72.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:14:28.1641266Z triton_mm_1085 0.0269 ms 66.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:14:28.1642703Z triton_mm_1092 0.0276 ms 64.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:14:28.1644370Z triton_mm_1086 0.0284 ms 62.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:14:28.1645912Z triton_mm_1089 0.0284 ms 62.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, 
EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:14:28.1647398Z triton_mm_1090 0.0294 ms 60.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:14:28.1648897Z triton_mm_1083 0.0297 ms 60.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:14:28.1650375Z triton_mm_1088 0.0298 ms 59.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:14:28.1651678Z SingleProcess AUTOTUNE benchmarking takes 0.2472 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:14:28.6564971Z Autotune Choices Stats: 2025-09-07T09:14:28.6567371Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.016256000846624374, "best_triton_pos": 1, "best_triton_time": 0.01820800080895424, "best_triton_kernel": "triton_mm_1170", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T09:14:28.6600726Z AUTOTUNE mm(2304x1584, 1584x768) 2025-09-07T09:14:28.6601117Z strides: [1, 2304], [768, 1] 2025-09-07T09:14:28.6601475Z dtypes: torch.float16, torch.float16 2025-09-07T09:14:28.6601851Z mm 0.0163 ms 100.0% 2025-09-07T09:14:28.6602730Z triton_mm_1170 0.0182 ms 89.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:14:28.6604675Z triton_mm_1163 0.0198 ms 82.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:14:28.6606164Z 
triton_mm_1169 0.0206 ms 78.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:14:28.6607630Z triton_mm_1162 0.0218 ms 74.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:14:28.6609067Z triton_mm_1164 0.0221 ms 73.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:14:28.6610511Z triton_mm_1166 0.0224 ms 72.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:14:28.6612263Z triton_mm_1159 0.0242 ms 67.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:14:28.6613993Z triton_mm_1161 0.0258 ms 63.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:14:28.6615467Z triton_mm_1168 0.0266 ms 61.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:14:28.6616738Z SingleProcess AUTOTUNE benchmarking takes 0.2346 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:14:29.1156704Z Autotune Choices Stats: 2025-09-07T09:14:29.1158360Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_1007", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.006752000190317631, "best_triton_pos": 0} 
2025-09-07T09:14:29.1193271Z AUTOTUNE mm(1000x8, 8x768) 2025-09-07T09:14:29.1193619Z strides: [1, 1000], [768, 1] 2025-09-07T09:14:29.1194101Z dtypes: torch.float16, torch.float16 2025-09-07T09:14:29.1194843Z triton_mm_1007 0.0068 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:14:29.1195704Z triton_mm_1010 0.0068 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:14:29.1196508Z triton_mm_1013 0.0068 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:14:29.1197442Z triton_mm_1008 0.0068 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:14:29.1198809Z triton_mm_1009 0.0068 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:14:29.1199616Z triton_mm_1003 0.0069 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T09:14:29.1200417Z triton_mm_1011 0.0069 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:14:29.1201212Z triton_mm_1012 0.0069 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:14:29.1202011Z triton_mm_1004 0.0069 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, 
BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:14:29.1202809Z triton_mm_1015 0.0071 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:14:29.1203513Z SingleProcess AUTOTUNE benchmarking takes 0.1554 seconds and 0.0002 seconds precompiling for 17 choices 2025-09-07T09:14:29.5404874Z Autotune Choices Stats: 2025-09-07T09:14:29.5406273Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.011455999687314034, "best_triton_pos": 1, "best_triton_time": 0.013439999893307686, "best_triton_kernel": "triton_mm_1126", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4"} 2025-09-07T09:14:29.5440271Z AUTOTUNE mm(768x1584, 1584x768) 2025-09-07T09:14:29.5440669Z strides: [1, 768], [768, 1] 2025-09-07T09:14:29.5441440Z dtypes: torch.float16, torch.float16 2025-09-07T09:14:29.5441760Z mm 0.0115 ms 100.0% 2025-09-07T09:14:29.5442395Z triton_mm_1126 0.0134 ms 85.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:14:29.5443420Z triton_mm_1122 0.0164 ms 69.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:14:29.5444622Z triton_mm_1132 0.0165 ms 69.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:14:29.5445635Z triton_mm_1125 0.0170 ms 67.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 
2025-09-07T09:14:29.5446617Z triton_mm_1121 0.0172 ms 66.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:14:29.5447602Z triton_mm_1124 0.0186 ms 61.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:14:29.5448601Z triton_mm_1128 0.0189 ms 60.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:14:29.5449799Z triton_mm_1131 0.0190 ms 60.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:14:29.5450867Z triton_mm_1118 0.0226 ms 50.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:14:29.5451793Z SingleProcess AUTOTUNE benchmarking takes 0.2168 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:14:31.1638848Z Autotune Choices Stats: 2025-09-07T09:14:31.1640169Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "mm", "best_time": 0.008799999952316284, "best_triton_pos": 1, "best_triton_time": 0.009312000125646591, "best_triton_kernel": "triton_mm_957", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2"} 2025-09-07T09:14:31.1674537Z AUTOTUNE mm(8x1000, 1000x768) 2025-09-07T09:14:31.1674853Z strides: [1000, 1], [768, 1] 2025-09-07T09:14:31.1675128Z dtypes: torch.float16, torch.float16 2025-09-07T09:14:31.1675410Z mm 0.0088 ms 100.0% 2025-09-07T09:14:31.1676023Z triton_mm_957 0.0093 ms 94.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, 
BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:14:31.1677168Z triton_mm_961 0.0096 ms 91.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:14:31.1678159Z triton_mm_965 0.0100 ms 87.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:14:31.1679146Z triton_mm_956 0.0114 ms 77.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:14:31.1680224Z triton_mm_955 0.0115 ms 76.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:14:31.1681113Z triton_mm_969 0.0115 ms 76.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:14:31.1682620Z triton_mm_960 0.0124 ms 71.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:14:31.1683523Z triton_mm_967 0.0129 ms 68.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:14:31.1684598Z triton_mm_964 0.0130 ms 67.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:14:31.1685394Z SingleProcess AUTOTUNE benchmarking takes 0.1807 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T09:14:31.4326852Z Autotune Choices Stats: 
2025-09-07T09:14:31.4328121Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.018400000408291817, "best_triton_pos": 1, "best_triton_time": 0.024927999824285507, "best_triton_kernel": "triton_mm_1075", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T09:14:31.4360784Z AUTOTUNE mm(1584x3072, 3072x768) 2025-09-07T09:14:31.4361091Z strides: [3072, 1], [768, 1] 2025-09-07T09:14:31.4361376Z dtypes: torch.float16, torch.float16 2025-09-07T09:14:31.4361662Z mm 0.0184 ms 100.0% 2025-09-07T09:14:31.4362305Z triton_mm_1075 0.0249 ms 73.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:14:31.4363307Z triton_mm_1068 0.0289 ms 63.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:14:31.4364510Z triton_mm_1064 0.0301 ms 61.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:14:31.4365974Z triton_mm_1069 0.0303 ms 60.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:14:31.4366999Z triton_mm_1074 0.0312 ms 58.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:14:31.4368012Z triton_mm_1065 0.0317 ms 58.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:14:31.4369017Z triton_mm_1067 0.0342 ms 53.8% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:14:31.4370195Z triton_mm_1071 0.0348 ms 52.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:14:31.4371169Z triton_mm_1066 0.0429 ms 42.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:14:31.4372011Z SingleProcess AUTOTUNE benchmarking takes 0.2659 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:14:31.6319787Z Autotune Choices Stats: 2025-09-07T09:14:31.6320962Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.010304000228643417, "best_triton_pos": 1, "best_triton_time": 0.01206399966031313, "best_triton_kernel": "triton_mm_1113", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T09:14:31.6352693Z AUTOTUNE mm(1584x768, 768x768) 2025-09-07T09:14:31.6352999Z strides: [768, 1], [768, 1] 2025-09-07T09:14:31.6354153Z dtypes: torch.float16, torch.float16 2025-09-07T09:14:31.6354481Z mm 0.0103 ms 100.0% 2025-09-07T09:14:31.6355108Z triton_mm_1113 0.0121 ms 85.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:14:31.6356154Z triton_mm_1106 0.0124 ms 82.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:14:31.6357232Z triton_mm_1102 0.0127 ms 80.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, 
num_warps=8 2025-09-07T09:14:31.6358210Z triton_mm_1112 0.0128 ms 80.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:14:31.6359212Z triton_mm_1105 0.0134 ms 77.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:14:31.6360241Z triton_mm_1109 0.0137 ms 75.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:14:31.6361089Z triton_mm_1107 0.0143 ms 72.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:14:31.6361934Z triton_mm_1104 0.0152 ms 67.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:14:31.6362774Z triton_mm_1103 0.0157 ms 65.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:14:31.6363940Z SingleProcess AUTOTUNE benchmarking takes 0.1983 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:14:31.8745746Z Autotune Choices Stats: 2025-09-07T09:14:31.8746986Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.015615999698638916, "best_triton_pos": 1, "best_triton_time": 0.021023999899625778, "best_triton_kernel": "triton_mm_1151", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T09:14:31.8779932Z AUTOTUNE mm(1584x2304, 2304x768) 2025-09-07T09:14:31.8780196Z strides: [2304, 1], [768, 1] 
2025-09-07T09:14:31.8780458Z dtypes: torch.float16, torch.float16 2025-09-07T09:14:31.8780723Z mm 0.0156 ms 100.0% 2025-09-07T09:14:31.8781315Z triton_mm_1151 0.0210 ms 74.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:14:31.8782234Z triton_mm_1144 0.0231 ms 67.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:14:31.8783134Z triton_mm_1140 0.0242 ms 64.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:14:31.8784395Z triton_mm_1145 0.0246 ms 63.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:14:31.8785302Z triton_mm_1150 0.0250 ms 62.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:14:31.8786464Z triton_mm_1141 0.0262 ms 59.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:14:31.8787401Z triton_mm_1143 0.0269 ms 58.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:14:31.8788319Z triton_mm_1147 0.0274 ms 57.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:14:31.8789222Z triton_mm_1142 0.0333 ms 46.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, 
num_warps=4 2025-09-07T09:14:31.8790019Z SingleProcess AUTOTUNE benchmarking takes 0.2418 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:14:36.1924775Z W0907 09:14:36.191000 50694 site-packages/torch/_logging/_internal.py:1199] [6/0] Profiler function will be ignored 2025-09-07T09:14:57.2433514Z pass 2025-09-07T09:15:02.7587949Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T09:15:02.7588994Z import pynvml # type: ignore[import] 2025-09-07T09:15:05.7582243Z 2025-09-07T09:15:07.6471124Z loading model: 0it [00:00, ?it/s] 2025-09-07T09:15:07.6471475Z loading model: 0it [00:01, ?it/s] 2025-09-07T09:15:07.6471777Z cuda train dla102 2025-09-07T09:15:48.2164068Z Autotune Choices Stats: 2025-09-07T09:15:48.2165596Z {"num_choices": 6, "num_triton_choices": 5, "best_kernel": "convolution", "best_time": 0.07635200023651123, "best_triton_pos": 1, "best_triton_time": 0.10678400099277496, "best_triton_kernel": "triton_convolution2d_4", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=16, GROUPS=1, KERNEL_H=7, KERNEL_W=7, PADDING_H=3, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8"} 2025-09-07T09:15:48.2200512Z AUTOTUNE convolution(8x3x224x224, 16x3x7x7) 2025-09-07T09:15:48.2201427Z strides: [150528, 1, 672, 3], [147, 1, 21, 3] 2025-09-07T09:15:48.2201764Z dtypes: torch.float16, torch.float16 2025-09-07T09:15:48.2202032Z convolution 0.0764 ms 100.0% 2025-09-07T09:15:48.2202696Z triton_convolution2d_4 0.1068 ms 71.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=16, GROUPS=1, KERNEL_H=7, KERNEL_W=7, PADDING_H=3, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:15:48.2204024Z triton_convolution2d_1 0.1150 
ms 66.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=16, GROUPS=1, KERNEL_H=7, KERNEL_W=7, PADDING_H=3, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:15:48.2205150Z triton_convolution2d_3 0.1178 ms 64.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=16, GROUPS=1, KERNEL_H=7, KERNEL_W=7, PADDING_H=3, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:15:48.2206285Z triton_convolution2d_0 0.1344 ms 56.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=16, GROUPS=1, KERNEL_H=7, KERNEL_W=7, PADDING_H=3, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:15:48.2207500Z triton_convolution2d_2 0.1647 ms 46.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=7, KERNEL_W=7, PADDING_H=3, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T09:15:48.2208472Z SingleProcess AUTOTUNE benchmarking takes 0.1344 seconds and 0.0002 seconds precompiling for 6 choices 2025-09-07T09:15:48.2976515Z Autotune Choices Stats: 2025-09-07T09:15:48.2977874Z {"num_choices": 6, "num_triton_choices": 5, "best_kernel": "triton_convolution2d_6", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4", "best_time": 0.029023999348282814, "best_triton_pos": 0} 2025-09-07T09:15:48.3010899Z AUTOTUNE convolution(8x16x224x224, 16x16x3x3) 2025-09-07T09:15:48.3011270Z strides: [802816, 1, 3584, 16], [144, 1, 48, 16] 2025-09-07T09:15:48.3011566Z dtypes: torch.float16, torch.float16 2025-09-07T09:15:48.3012333Z triton_convolution2d_6 0.0290 ms 100.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:15:48.3013561Z 
triton_convolution2d_5 0.0300 ms 96.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:15:48.3014947Z triton_convolution2d_9 0.0345 ms 84.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:15:48.3015699Z convolution 0.0355 ms 81.9% 2025-09-07T09:15:48.3016420Z triton_convolution2d_8 0.0364 ms 79.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:15:48.3017629Z triton_convolution2d_7 0.0377 ms 76.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T09:15:48.3018518Z SingleProcess AUTOTUNE benchmarking takes 0.0796 seconds and 0.0002 seconds precompiling for 6 choices 2025-09-07T09:15:48.3866603Z Autotune Choices Stats: 2025-09-07T09:15:48.3867902Z {"num_choices": 7, "num_triton_choices": 6, "best_kernel": "triton_convolution2d_15", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8", "best_time": 0.01894400082528591, "best_triton_pos": 0} 2025-09-07T09:15:48.3900538Z AUTOTUNE convolution(8x16x224x224, 32x16x3x3) 2025-09-07T09:15:48.3901099Z strides: [802816, 1, 3584, 16], [144, 1, 48, 16] 2025-09-07T09:15:48.3901605Z dtypes: torch.float16, torch.float16 2025-09-07T09:15:48.3902825Z triton_convolution2d_15 0.0189 ms 100.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, 
STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:15:48.3905370Z triton_convolution2d_11 0.0192 ms 98.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:15:48.3906672Z convolution 0.0201 ms 94.4% 2025-09-07T09:15:48.3907527Z triton_convolution2d_10 0.0205 ms 92.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:15:48.3908638Z triton_convolution2d_14 0.0213 ms 88.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:15:48.3909760Z triton_convolution2d_13 0.0217 ms 87.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:15:48.3910883Z triton_convolution2d_12 0.0308 ms 61.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T09:15:48.3911937Z SingleProcess AUTOTUNE benchmarking takes 0.0885 seconds and 0.0002 seconds precompiling for 7 choices 2025-09-07T09:15:48.6018847Z Autotune Choices Stats: 2025-09-07T09:15:48.6020424Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_26", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8", "best_time": 0.008671999908983707, "best_triton_pos": 0} 2025-09-07T09:15:48.6054003Z AUTOTUNE mm(25088x32, 32x128) 2025-09-07T09:15:48.6054277Z strides: [32, 1], [1, 32] 
2025-09-07T09:15:48.6054520Z dtypes: torch.float16, torch.float16 2025-09-07T09:15:48.6055162Z triton_mm_26 0.0087 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:15:48.6056155Z triton_mm_24 0.0087 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:15:48.6057860Z triton_mm_29 0.0087 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:15:48.6059437Z triton_mm_27 0.0088 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:15:48.6060958Z triton_mm_22 0.0089 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:15:48.6062486Z triton_mm_25 0.0090 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:48.6064321Z triton_mm_30 0.0090 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T09:15:48.6066253Z triton_mm_28 0.0092 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:48.6067608Z triton_mm_31 0.0092 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:48.6068507Z triton_mm_32 0.0093 ms 92.8% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:15:48.6069293Z SingleProcess AUTOTUNE benchmarking takes 0.2149 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T09:15:48.8007204Z Autotune Choices Stats: 2025-09-07T09:15:48.8008222Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_43", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8", "best_time": 0.012319999746978283, "best_triton_pos": 0} 2025-09-07T09:15:48.8043144Z AUTOTUNE mm(100352x32, 32x64) 2025-09-07T09:15:48.8043568Z strides: [32, 1], [1, 32] 2025-09-07T09:15:48.8044339Z dtypes: torch.float16, torch.float16 2025-09-07T09:15:48.8045409Z triton_mm_43 0.0123 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:15:48.8047171Z triton_mm_44 0.0127 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:15:48.8048198Z triton_mm_41 0.0127 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:15:48.8049377Z triton_mm_40 0.0128 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:15:48.8050346Z triton_mm_39 0.0128 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:15:48.8051299Z triton_mm_42 0.0128 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, 
BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:48.8052249Z triton_mm_47 0.0128 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T09:15:48.8053204Z triton_mm_48 0.0128 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:15:48.8054362Z triton_mm_45 0.0129 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:48.8055320Z triton_mm_46 0.0129 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:15:48.8056160Z SingleProcess AUTOTUNE benchmarking takes 0.1984 seconds and 0.0002 seconds precompiling for 17 choices 2025-09-07T09:15:48.9051004Z Autotune Choices Stats: 2025-09-07T09:15:48.9052341Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.017632000148296356, "best_triton_pos": 1, "best_triton_time": 0.02115200087428093, "best_triton_kernel": "triton_convolution2d_54", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8"} 2025-09-07T09:15:48.9087440Z AUTOTUNE convolution(8x64x112x112, 64x64x3x3) 2025-09-07T09:15:48.9087812Z strides: [802816, 1, 7168, 64], [576, 1, 192, 64] 2025-09-07T09:15:48.9088109Z dtypes: torch.float16, torch.float16 2025-09-07T09:15:48.9088380Z convolution 0.0176 ms 100.0% 2025-09-07T09:15:48.9089118Z triton_convolution2d_54 0.0212 ms 83.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, 
KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:15:48.9090344Z triton_convolution2d_52 0.0216 ms 81.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:15:48.9091557Z triton_convolution2d_53 0.0216 ms 81.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:15:48.9092770Z triton_convolution2d_55 0.0267 ms 66.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:15:48.9094128Z triton_convolution2d_49 0.0281 ms 62.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:15:48.9095346Z triton_convolution2d_50 0.0338 ms 52.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:15:48.9096572Z triton_convolution2d_51 0.0578 ms 30.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T09:15:48.9097684Z SingleProcess AUTOTUNE benchmarking takes 0.1038 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T09:15:49.1393166Z Autotune Choices Stats: 2025-09-07T09:15:49.1394979Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_64", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.009151999838650227, "best_triton_pos": 0} 2025-09-07T09:15:49.1429456Z AUTOTUNE mm(25088x64, 64x128) 2025-09-07T09:15:49.1429874Z strides: [64, 1], [1, 64] 2025-09-07T09:15:49.1430302Z dtypes: torch.float16, torch.float16 2025-09-07T09:15:49.1431312Z triton_mm_64 0.0092 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:15:49.1432875Z triton_mm_68 0.0095 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:15:49.1434728Z triton_mm_65 0.0096 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:49.1436283Z triton_mm_70 0.0099 ms 92.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:15:49.1437695Z triton_mm_66 0.0099 ms 92.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:15:49.1438516Z triton_mm_62 0.0100 ms 91.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:15:49.1439344Z triton_mm_69 0.0100 ms 91.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:49.1440370Z triton_mm_67 0.0100 ms 91.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:49.1441193Z 
triton_mm_74 0.0101 ms 90.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:15:49.1442041Z triton_mm_71 0.0105 ms 87.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T09:15:49.1442777Z SingleProcess AUTOTUNE benchmarking takes 0.2326 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:15:49.3611184Z Autotune Choices Stats: 2025-09-07T09:15:49.3612145Z {"num_choices": 19, "num_triton_choices": 18, "best_kernel": "triton_mm_86", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.010080000385642052, "best_triton_pos": 0} 2025-09-07T09:15:49.3648473Z AUTOTUNE mm(25088x128, 128x64) 2025-09-07T09:15:49.3648741Z strides: [128, 1], [1, 128] 2025-09-07T09:15:49.3648987Z dtypes: torch.float16, torch.float16 2025-09-07T09:15:49.3649621Z triton_mm_86 0.0101 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:49.3650578Z triton_mm_82 0.0105 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:15:49.3651754Z triton_mm_87 0.0106 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:15:49.3652720Z triton_mm_84 0.0108 ms 93.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:49.3653671Z triton_mm_89 0.0108 ms 93.5% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:15:49.3654818Z triton_mm_85 0.0108 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:15:49.3655766Z triton_mm_91 0.0108 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:49.3656718Z triton_mm_88 0.0108 ms 92.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:49.3657648Z triton_mm_83 0.0109 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:15:49.3658520Z triton_mm_81 0.0111 ms 90.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:15:49.3659298Z SingleProcess AUTOTUNE benchmarking takes 0.2214 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T09:15:49.4644293Z Autotune Choices Stats: 2025-09-07T09:15:49.4646540Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.016383999958634377, "best_triton_pos": 1, "best_triton_time": 0.018783999606966972, "best_triton_kernel": "triton_convolution2d_98", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8"} 2025-09-07T09:15:49.4681259Z AUTOTUNE convolution(8x64x56x56, 64x64x3x3) 2025-09-07T09:15:49.4681785Z strides: [200704, 1, 3584, 64], [576, 1, 192, 64] 
2025-09-07T09:15:49.4682281Z dtypes: torch.float16, torch.float16 2025-09-07T09:15:49.4682735Z convolution 0.0164 ms 100.0% 2025-09-07T09:15:49.4684167Z triton_convolution2d_98 0.0188 ms 87.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:15:49.4686142Z triton_convolution2d_97 0.0191 ms 85.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:15:49.4688068Z triton_convolution2d_96 0.0198 ms 82.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:15:49.4689282Z triton_convolution2d_93 0.0239 ms 68.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:15:49.4690493Z triton_convolution2d_99 0.0259 ms 63.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:15:49.4691704Z triton_convolution2d_94 0.0315 ms 52.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:15:49.4693080Z triton_convolution2d_95 0.0514 ms 31.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T09:15:49.4694213Z SingleProcess AUTOTUNE benchmarking takes 0.1028 seconds and 0.0002 seconds precompiling for 8 choices 
2025-09-07T09:15:49.6983478Z Autotune Choices Stats: 2025-09-07T09:15:49.6985325Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_130", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.014112000353634357, "best_triton_pos": 0} 2025-09-07T09:15:49.7020597Z AUTOTUNE mm(25088x256, 256x128) 2025-09-07T09:15:49.7021011Z strides: [256, 1], [1, 256] 2025-09-07T09:15:49.7021435Z dtypes: torch.float16, torch.float16 2025-09-07T09:15:49.7022471Z triton_mm_130 0.0141 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:49.7024336Z triton_mm_136 0.0148 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:49.7025333Z mm 0.0150 ms 93.8% 2025-09-07T09:15:49.7026246Z triton_mm_132 0.0158 ms 89.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:49.7027759Z triton_mm_135 0.0158 ms 89.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:49.7028676Z triton_mm_129 0.0159 ms 88.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:15:49.7029575Z triton_mm_133 0.0161 ms 87.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:15:49.7030698Z triton_mm_128 0.0162 ms 87.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, 
BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:49.7031587Z triton_mm_125 0.0163 ms 86.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:15:49.7032481Z triton_mm_126 0.0166 ms 85.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:15:49.7033252Z SingleProcess AUTOTUNE benchmarking takes 0.2312 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:15:49.9316344Z Autotune Choices Stats: 2025-09-07T09:15:49.9318057Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_149", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.008960000239312649, "best_triton_pos": 0} 2025-09-07T09:15:49.9352640Z AUTOTUNE mm(6272x128, 128x256) 2025-09-07T09:15:49.9353106Z strides: [128, 1], [1, 128] 2025-09-07T09:15:49.9353527Z dtypes: torch.float16, torch.float16 2025-09-07T09:15:49.9354839Z triton_mm_149 0.0090 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:49.9356443Z triton_mm_151 0.0090 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:49.9357945Z triton_mm_148 0.0090 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:15:49.9358670Z mm 0.0091 ms 98.2% 2025-09-07T09:15:49.9359184Z triton_mm_147 0.0093 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, 
BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:49.9360025Z triton_mm_150 0.0093 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:15:49.9360873Z triton_mm_146 0.0093 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:15:49.9361722Z triton_mm_156 0.0094 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:15:49.9362565Z triton_mm_145 0.0094 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:15:49.9363396Z triton_mm_152 0.0094 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:15:49.9364308Z SingleProcess AUTOTUNE benchmarking takes 0.2327 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:15:50.1639386Z Autotune Choices Stats: 2025-09-07T09:15:50.1640719Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_168", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.010944000445306301, "best_triton_pos": 0} 2025-09-07T09:15:50.1676655Z AUTOTUNE mm(25088x128, 128x128) 2025-09-07T09:15:50.1677007Z strides: [128, 1], [1, 128] 2025-09-07T09:15:50.1677430Z dtypes: torch.float16, torch.float16 2025-09-07T09:15:50.1678606Z triton_mm_168 0.0109 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:50.1679961Z triton_mm_170 0.0114 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:50.1681277Z triton_mm_171 0.0115 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:15:50.1682602Z triton_mm_166 0.0115 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:50.1684320Z triton_mm_173 0.0116 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:50.1685200Z mm 0.0117 ms 93.2% 2025-09-07T09:15:50.1685971Z triton_mm_163 0.0117 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:15:50.1687268Z triton_mm_174 0.0118 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:50.1688595Z triton_mm_167 0.0120 ms 91.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:15:50.1689889Z triton_mm_169 0.0122 ms 89.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:15:50.1691050Z SingleProcess AUTOTUNE benchmarking takes 0.2318 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:15:50.2792996Z Autotune Choices Stats: 
2025-09-07T09:15:50.2794795Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.015231999568641186, "best_triton_pos": 1, "best_triton_time": 0.0315839983522892, "best_triton_kernel": "triton_convolution2d_180", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T09:15:50.2831212Z AUTOTUNE convolution(8x128x56x56, 128x128x3x3) 2025-09-07T09:15:50.2831656Z strides: [401408, 1, 7168, 128], [1152, 1, 384, 128] 2025-09-07T09:15:50.2832050Z dtypes: torch.float16, torch.float16 2025-09-07T09:15:50.2832386Z convolution 0.0152 ms 100.0% 2025-09-07T09:15:50.2833301Z triton_convolution2d_180 0.0316 ms 48.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:15:50.2835033Z triton_convolution2d_181 0.0362 ms 42.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:15:50.2836529Z triton_convolution2d_179 0.0383 ms 39.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:15:50.2838127Z triton_convolution2d_182 0.0386 ms 39.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:15:50.2839261Z triton_convolution2d_176 0.0484 ms 31.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 
2025-09-07T09:15:50.2840600Z triton_convolution2d_177 0.0524 ms 29.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:15:50.2841753Z triton_convolution2d_178 0.1021 ms 14.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T09:15:50.2842657Z SingleProcess AUTOTUNE benchmarking takes 0.1149 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T09:15:50.5155939Z Autotune Choices Stats: 2025-09-07T09:15:50.5158077Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.009503999724984169, "best_triton_pos": 1, "best_triton_time": 0.009568000212311745, "best_triton_kernel": "triton_mm_214", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4"} 2025-09-07T09:15:50.5193595Z AUTOTUNE mm(6272x256, 256x128) 2025-09-07T09:15:50.5194311Z strides: [256, 1], [1, 256] 2025-09-07T09:15:50.5194748Z dtypes: torch.float16, torch.float16 2025-09-07T09:15:50.5195170Z mm 0.0095 ms 100.0% 2025-09-07T09:15:50.5196118Z triton_mm_214 0.0096 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:15:50.5197787Z triton_mm_209 0.0096 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:15:50.5198631Z triton_mm_212 0.0098 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:15:50.5199680Z 
triton_mm_213 0.0098 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:50.5200524Z triton_mm_216 0.0098 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:15:50.5201356Z triton_mm_220 0.0101 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:15:50.5202200Z triton_mm_203 0.0102 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:15:50.5203033Z triton_mm_215 0.0103 ms 92.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:50.5204040Z triton_mm_210 0.0104 ms 91.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:15:50.5204776Z SingleProcess AUTOTUNE benchmarking takes 0.2333 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:15:50.6305152Z Autotune Choices Stats: 2025-09-07T09:15:50.6307357Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.014976000413298607, "best_triton_pos": 1, "best_triton_time": 0.030719999223947525, "best_triton_kernel": "triton_convolution2d_225", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T09:15:50.6344277Z AUTOTUNE convolution(8x128x28x28, 128x128x3x3) 2025-09-07T09:15:50.6344844Z strides: [100352, 1, 
3584, 128], [1152, 1, 384, 128] 2025-09-07T09:15:50.6345733Z dtypes: torch.float16, torch.float16 2025-09-07T09:15:50.6346176Z convolution 0.0150 ms 100.0% 2025-09-07T09:15:50.6347386Z triton_convolution2d_225 0.0307 ms 48.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:15:50.6349352Z triton_convolution2d_226 0.0340 ms 44.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:15:50.6351328Z triton_convolution2d_224 0.0390 ms 38.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:15:50.6353293Z triton_convolution2d_227 0.0392 ms 38.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:15:50.6355543Z triton_convolution2d_221 0.0475 ms 31.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:15:50.6357677Z triton_convolution2d_222 0.0494 ms 30.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:15:50.6358842Z triton_convolution2d_223 0.0968 ms 15.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T09:15:50.6359697Z SingleProcess AUTOTUNE benchmarking takes 0.1135 seconds and 0.0002 
seconds precompiling for 8 choices 2025-09-07T09:15:50.8681915Z Autotune Choices Stats: 2025-09-07T09:15:50.8684758Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.01196799986064434, "best_triton_pos": 1, "best_triton_time": 0.01206399966031313, "best_triton_kernel": "triton_mm_265", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T09:15:50.8719399Z AUTOTUNE mm(6272x512, 512x256) 2025-09-07T09:15:50.8719642Z strides: [512, 1], [1, 512] 2025-09-07T09:15:50.8719865Z dtypes: torch.float16, torch.float16 2025-09-07T09:15:50.8720112Z mm 0.0120 ms 100.0% 2025-09-07T09:15:50.8720662Z triton_mm_265 0.0121 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:15:50.8721566Z triton_mm_258 0.0123 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:50.8722457Z triton_mm_254 0.0126 ms 95.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:15:50.8723351Z triton_mm_264 0.0128 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:50.8724421Z triton_mm_257 0.0136 ms 88.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:15:50.8725323Z triton_mm_261 0.0136 ms 87.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 
2025-09-07T09:15:50.8726222Z triton_mm_259 0.0139 ms 86.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:15:50.8727345Z triton_mm_256 0.0147 ms 81.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:50.8728305Z triton_mm_260 0.0149 ms 80.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:50.8729144Z SingleProcess AUTOTUNE benchmarking takes 0.2348 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:15:51.1145204Z Autotune Choices Stats: 2025-09-07T09:15:51.1146489Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.012864000163972378, "best_triton_pos": 1, "best_triton_time": 0.013887999579310417, "best_triton_kernel": "triton_mm_374", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T09:15:51.1184673Z AUTOTUNE mm(6272x768, 768x256) 2025-09-07T09:15:51.1185101Z strides: [768, 1], [1, 768] 2025-09-07T09:15:51.1185544Z dtypes: torch.float16, torch.float16 2025-09-07T09:15:51.1185971Z mm 0.0129 ms 100.0% 2025-09-07T09:15:51.1186912Z triton_mm_374 0.0139 ms 92.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:15:51.1188468Z triton_mm_367 0.0146 ms 88.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:51.1189986Z triton_mm_363 0.0149 ms 86.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, 
BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:15:51.1191871Z triton_mm_373 0.0154 ms 83.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:51.1193424Z triton_mm_366 0.0161 ms 80.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:15:51.1195429Z triton_mm_368 0.0164 ms 78.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:15:51.1196953Z triton_mm_370 0.0166 ms 77.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:15:51.1198430Z triton_mm_364 0.0176 ms 73.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:15:51.1199312Z triton_mm_369 0.0184 ms 69.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:51.1200091Z SingleProcess AUTOTUNE benchmarking takes 0.2340 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:15:51.3798597Z Autotune Choices Stats: 2025-09-07T09:15:51.3800564Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.015296000055968761, "best_triton_pos": 1, "best_triton_time": 0.017055999487638474, "best_triton_kernel": "triton_mm_592", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T09:15:51.3837969Z AUTOTUNE 
mm(6272x1152, 1152x256) 2025-09-07T09:15:51.3838448Z strides: [1152, 1], [1, 1152] 2025-09-07T09:15:51.3838859Z dtypes: torch.float16, torch.float16 2025-09-07T09:15:51.3839288Z mm 0.0153 ms 100.0% 2025-09-07T09:15:51.3840684Z triton_mm_592 0.0171 ms 89.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:15:51.3842276Z triton_mm_585 0.0183 ms 83.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:51.3844132Z triton_mm_581 0.0187 ms 81.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:15:51.3845672Z triton_mm_591 0.0197 ms 77.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:51.3847328Z triton_mm_586 0.0198 ms 77.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:15:51.3848756Z triton_mm_584 0.0208 ms 73.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:15:51.3849723Z triton_mm_582 0.0217 ms 70.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:15:51.3850685Z triton_mm_588 0.0218 ms 70.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:15:51.3851654Z triton_mm_583 0.0243 ms 62.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, 
BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:51.3852494Z SingleProcess AUTOTUNE benchmarking takes 0.2395 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:15:51.6135036Z Autotune Choices Stats: 2025-09-07T09:15:51.6136217Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_605", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4", "best_time": 0.008383999578654766, "best_triton_pos": 0} 2025-09-07T09:15:51.6177352Z AUTOTUNE mm(1568x256, 256x512) 2025-09-07T09:15:51.6177763Z strides: [256, 1], [1, 256] 2025-09-07T09:15:51.6178247Z dtypes: torch.float16, torch.float16 2025-09-07T09:15:51.6179377Z triton_mm_605 0.0084 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:15:51.6181047Z triton_mm_604 0.0085 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:51.6182640Z triton_mm_600 0.0085 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:15:51.6183620Z mm 0.0087 ms 96.3% 2025-09-07T09:15:51.6185023Z triton_mm_603 0.0087 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:15:51.6186563Z triton_mm_607 0.0090 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:15:51.6188242Z triton_mm_611 0.0092 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, 
BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:15:51.6189140Z triton_mm_602 0.0094 ms 89.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:51.6190215Z triton_mm_610 0.0094 ms 89.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:51.6191120Z triton_mm_606 0.0095 ms 88.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:51.6191904Z SingleProcess AUTOTUNE benchmarking takes 0.2325 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:15:51.8439616Z Autotune Choices Stats: 2025-09-07T09:15:51.8441186Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_619", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.010080000385642052, "best_triton_pos": 0} 2025-09-07T09:15:51.8479164Z AUTOTUNE mm(6272x256, 256x256) 2025-09-07T09:15:51.8479625Z strides: [256, 1], [1, 256] 2025-09-07T09:15:51.8480042Z dtypes: torch.float16, torch.float16 2025-09-07T09:15:51.8481112Z triton_mm_619 0.0101 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:15:51.8482678Z triton_mm_623 0.0102 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:51.8483655Z mm 0.0104 ms 96.9% 2025-09-07T09:15:51.8484930Z triton_mm_621 0.0105 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, 
BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:51.8486462Z triton_mm_622 0.0106 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:15:51.8488478Z triton_mm_626 0.0107 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:15:51.8489473Z triton_mm_630 0.0107 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:15:51.8490449Z triton_mm_629 0.0109 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:51.8491416Z triton_mm_625 0.0111 ms 90.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:51.8492381Z triton_mm_628 0.0115 ms 88.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:51.8493230Z SingleProcess AUTOTUNE benchmarking takes 0.2296 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:15:51.9939288Z Autotune Choices Stats: 2025-09-07T09:15:51.9941476Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.014816000126302242, "best_triton_pos": 1, "best_triton_time": 0.05443200096487999, "best_triton_kernel": "triton_convolution2d_635", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, 
num_warps=4"} 2025-09-07T09:15:51.9980628Z AUTOTUNE convolution(8x256x28x28, 256x256x3x3) 2025-09-07T09:15:51.9981186Z strides: [200704, 1, 7168, 256], [2304, 1, 768, 256] 2025-09-07T09:15:51.9981712Z dtypes: torch.float16, torch.float16 2025-09-07T09:15:51.9982151Z convolution 0.0148 ms 100.0% 2025-09-07T09:15:51.9983355Z triton_convolution2d_635 0.0544 ms 27.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:15:51.9986133Z triton_convolution2d_637 0.0684 ms 21.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:15:51.9988229Z triton_convolution2d_634 0.0687 ms 21.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:15:51.9989424Z triton_convolution2d_636 0.0765 ms 19.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:15:51.9990565Z triton_convolution2d_632 0.0954 ms 15.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:15:51.9991700Z triton_convolution2d_631 0.1026 ms 14.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:15:51.9992835Z triton_convolution2d_633 0.2036 ms 7.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, 
UNROLL=False, num_stages=1, num_warps=8 2025-09-07T09:15:51.9993856Z SingleProcess AUTOTUNE benchmarking takes 0.1497 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T09:15:52.2299157Z Autotune Choices Stats: 2025-09-07T09:15:52.2300276Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_665", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.009216000325977802, "best_triton_pos": 0} 2025-09-07T09:15:52.2340060Z AUTOTUNE mm(1568x512, 512x256) 2025-09-07T09:15:52.2340492Z strides: [512, 1], [1, 512] 2025-09-07T09:15:52.2340923Z dtypes: torch.float16, torch.float16 2025-09-07T09:15:52.2341971Z triton_mm_665 0.0092 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:15:52.2342958Z mm 0.0097 ms 95.4% 2025-09-07T09:15:52.2344425Z triton_mm_669 0.0100 ms 92.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:15:52.2345993Z triton_mm_664 0.0102 ms 90.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:15:52.2347567Z triton_mm_661 0.0105 ms 87.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:15:52.2348876Z triton_mm_668 0.0106 ms 86.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:52.2349759Z triton_mm_660 0.0108 ms 85.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:15:52.2350653Z triton_mm_675 0.0110 ms 83.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:15:52.2351555Z triton_mm_671 0.0110 ms 83.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:15:52.2352625Z triton_mm_674 0.0111 ms 82.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:52.2353408Z SingleProcess AUTOTUNE benchmarking takes 0.2330 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:15:52.3789861Z Autotune Choices Stats: 2025-09-07T09:15:52.3792095Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.014688000082969666, "best_triton_pos": 1, "best_triton_time": 0.05321599915623665, "best_triton_kernel": "triton_convolution2d_680", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T09:15:52.3830857Z AUTOTUNE convolution(8x256x14x14, 256x256x3x3) 2025-09-07T09:15:52.3831471Z strides: [50176, 1, 3584, 256], [2304, 1, 768, 256] 2025-09-07T09:15:52.3832033Z dtypes: torch.float16, torch.float16 2025-09-07T09:15:52.3832477Z convolution 0.0147 ms 100.0% 2025-09-07T09:15:52.3833671Z triton_convolution2d_680 0.0532 ms 27.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:15:52.3835927Z triton_convolution2d_679 0.0687 ms 21.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, 
KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:15:52.3838145Z triton_convolution2d_682 0.0708 ms 20.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:15:52.3839632Z triton_convolution2d_681 0.0747 ms 19.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:15:52.3840799Z triton_convolution2d_677 0.0894 ms 16.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:15:52.3841951Z triton_convolution2d_676 0.1057 ms 13.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:15:52.3843091Z triton_convolution2d_678 0.1930 ms 7.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T09:15:52.3844144Z SingleProcess AUTOTUNE benchmarking takes 0.1485 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T09:15:52.6189323Z Autotune Choices Stats: 2025-09-07T09:15:52.6191267Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.010879999957978725, "best_triton_pos": 1, "best_triton_time": 0.011359999887645245, "best_triton_kernel": "triton_mm_714", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4"} 2025-09-07T09:15:52.6231646Z AUTOTUNE mm(1568x1024, 
1024x512) 2025-09-07T09:15:52.6232108Z strides: [1024, 1], [1, 1024] 2025-09-07T09:15:52.6232529Z dtypes: torch.float16, torch.float16 2025-09-07T09:15:52.6232950Z mm 0.0109 ms 100.0% 2025-09-07T09:15:52.6234191Z triton_mm_714 0.0114 ms 95.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:15:52.6235806Z triton_mm_720 0.0132 ms 82.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:15:52.6237948Z triton_mm_710 0.0135 ms 80.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:15:52.6239141Z triton_mm_709 0.0137 ms 79.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:15:52.6240030Z triton_mm_713 0.0139 ms 78.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:52.6240922Z triton_mm_719 0.0149 ms 73.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:52.6241836Z triton_mm_716 0.0152 ms 71.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:15:52.6242742Z triton_mm_712 0.0153 ms 71.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:15:52.6243632Z triton_mm_703 0.0169 ms 64.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:15:52.6244563Z SingleProcess AUTOTUNE benchmarking takes 0.2370 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:15:52.8687211Z Autotune Choices Stats: 2025-09-07T09:15:52.8689367Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.012256000190973282, "best_triton_pos": 1, "best_triton_time": 0.012959999963641167, "best_triton_kernel": "triton_mm_823", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4"} 2025-09-07T09:15:52.8728433Z AUTOTUNE mm(1568x1536, 1536x512) 2025-09-07T09:15:52.8728773Z strides: [1536, 1], [1, 1536] 2025-09-07T09:15:52.8729016Z dtypes: torch.float16, torch.float16 2025-09-07T09:15:52.8729272Z mm 0.0123 ms 100.0% 2025-09-07T09:15:52.8729850Z triton_mm_823 0.0130 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:15:52.8730820Z triton_mm_829 0.0156 ms 78.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:15:52.8731793Z triton_mm_819 0.0162 ms 75.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:15:52.8732748Z triton_mm_818 0.0168 ms 72.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:15:52.8733694Z triton_mm_822 0.0172 ms 71.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:52.8735002Z triton_mm_828 
0.0190 ms 64.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:52.8735961Z triton_mm_821 0.0194 ms 63.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:15:52.8736923Z triton_mm_825 0.0196 ms 62.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:15:52.8738087Z triton_mm_815 0.0215 ms 57.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:15:52.8738932Z SingleProcess AUTOTUNE benchmarking takes 0.2380 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:15:53.1379935Z Autotune Choices Stats: 2025-09-07T09:15:53.1381827Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.01484800036996603, "best_triton_pos": 1, "best_triton_time": 0.014911999925971031, "best_triton_kernel": "triton_mm_1041", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4"} 2025-09-07T09:15:53.1422780Z AUTOTUNE mm(1568x2048, 2048x512) 2025-09-07T09:15:53.1423235Z strides: [2048, 1], [1, 2048] 2025-09-07T09:15:53.1423669Z dtypes: torch.float16, torch.float16 2025-09-07T09:15:53.1424422Z mm 0.0148 ms 100.0% 2025-09-07T09:15:53.1425434Z triton_mm_1041 0.0149 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:15:53.1427047Z triton_mm_1047 0.0185 ms 80.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:15:53.1428754Z triton_mm_1037 0.0192 ms 77.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:15:53.1429662Z triton_mm_1040 0.0210 ms 70.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:53.1430757Z triton_mm_1036 0.0212 ms 70.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:15:53.1431683Z triton_mm_1046 0.0226 ms 65.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:53.1432583Z triton_mm_1039 0.0242 ms 61.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:15:53.1433480Z triton_mm_1043 0.0244 ms 60.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:15:53.1434538Z triton_mm_1033 0.0272 ms 54.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:15:53.1435337Z SingleProcess AUTOTUNE benchmarking takes 0.2457 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:15:53.4478763Z Autotune Choices Stats: 2025-09-07T09:15:53.4480774Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.015968000516295433, "best_triton_pos": 1, "best_triton_time": 0.017184000462293625, "best_triton_kernel": "triton_mm_1477", "best_triton_kernel_desc": 
"ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4"} 2025-09-07T09:15:53.4521590Z AUTOTUNE mm(1568x2816, 2816x512) 2025-09-07T09:15:53.4522073Z strides: [2816, 1], [1, 2816] 2025-09-07T09:15:53.4522495Z dtypes: torch.float16, torch.float16 2025-09-07T09:15:53.4522924Z mm 0.0160 ms 100.0% 2025-09-07T09:15:53.4524160Z triton_mm_1477 0.0172 ms 92.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:15:53.4525787Z triton_mm_1483 0.0221 ms 72.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:15:53.4527805Z triton_mm_1473 0.0223 ms 71.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:15:53.4529337Z triton_mm_1476 0.0260 ms 61.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:53.4530297Z triton_mm_1472 0.0266 ms 60.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:15:53.4531264Z triton_mm_1482 0.0284 ms 56.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:53.4532238Z triton_mm_1475 0.0309 ms 51.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:15:53.4533200Z triton_mm_1479 0.0315 ms 50.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:15:53.4534300Z triton_mm_1469 0.0317 ms 50.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:15:53.4535141Z SingleProcess AUTOTUNE benchmarking takes 0.2606 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:15:53.6817731Z Autotune Choices Stats: 2025-09-07T09:15:53.6819558Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_1492", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.008927999995648861, "best_triton_pos": 0} 2025-09-07T09:15:53.6859968Z AUTOTUNE mm(392x512, 512x1024) 2025-09-07T09:15:53.6860404Z strides: [512, 1], [1, 512] 2025-09-07T09:15:53.6860808Z dtypes: torch.float16, torch.float16 2025-09-07T09:15:53.6861860Z triton_mm_1492 0.0089 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:15:53.6862859Z mm 0.0092 ms 96.5% 2025-09-07T09:15:53.6864276Z triton_mm_1496 0.0093 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:15:53.6865854Z triton_mm_1488 0.0102 ms 87.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:15:53.6867401Z triton_mm_1491 0.0102 ms 87.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:15:53.6869028Z triton_mm_1495 0.0105 ms 85.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, 
BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:53.6869920Z triton_mm_1485 0.0107 ms 83.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:15:53.6870806Z triton_mm_1487 0.0107 ms 83.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:15:53.6871706Z triton_mm_1502 0.0108 ms 82.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:15:53.6872803Z triton_mm_1501 0.0111 ms 80.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:53.6873589Z SingleProcess AUTOTUNE benchmarking takes 0.2321 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:15:53.9126418Z Autotune Choices Stats: 2025-09-07T09:15:53.9127346Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_1515", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4", "best_time": 0.009631999768316746, "best_triton_pos": 0} 2025-09-07T09:15:53.9168853Z AUTOTUNE mm(1568x512, 512x512) 2025-09-07T09:15:53.9169253Z strides: [512, 1], [1, 512] 2025-09-07T09:15:53.9169505Z dtypes: torch.float16, torch.float16 2025-09-07T09:15:53.9170159Z triton_mm_1515 0.0096 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:15:53.9170786Z mm 0.0100 ms 95.9% 2025-09-07T09:15:53.9171356Z triton_mm_1510 0.0105 ms 91.5% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:15:53.9172336Z triton_mm_1521 0.0106 ms 91.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:15:53.9173307Z triton_mm_1514 0.0106 ms 90.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:53.9174670Z triton_mm_1513 0.0113 ms 85.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:15:53.9175920Z triton_mm_1520 0.0113 ms 85.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:53.9176915Z triton_mm_1517 0.0114 ms 84.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:15:53.9177890Z triton_mm_1511 0.0119 ms 80.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:15:53.9178862Z triton_mm_1512 0.0126 ms 76.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:53.9179701Z SingleProcess AUTOTUNE benchmarking takes 0.2303 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:15:54.1164692Z Autotune Choices Stats: 2025-09-07T09:15:54.1165874Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.01775999926030636, "best_triton_pos": 1, "best_triton_time": 0.10847999900579453, 
"best_triton_kernel": "triton_convolution2d_1526", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T09:15:54.1207010Z AUTOTUNE convolution(8x512x14x14, 512x512x3x3) 2025-09-07T09:15:54.1207547Z strides: [100352, 1, 7168, 512], [4608, 1, 1536, 512] 2025-09-07T09:15:54.1208120Z dtypes: torch.float16, torch.float16 2025-09-07T09:15:54.1208664Z convolution 0.0178 ms 100.0% 2025-09-07T09:15:54.1209654Z triton_convolution2d_1526 0.1085 ms 16.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:15:54.1210920Z triton_convolution2d_1525 0.1314 ms 13.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:15:54.1212443Z triton_convolution2d_1528 0.1345 ms 13.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:15:54.1213684Z triton_convolution2d_1527 0.1424 ms 12.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:15:54.1215214Z triton_convolution2d_1523 0.1995 ms 8.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:15:54.1216454Z triton_convolution2d_1522 0.2047 ms 8.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, 
UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:15:54.1217695Z triton_convolution2d_1524 0.2784 ms 6.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T09:15:54.1218679Z SingleProcess AUTOTUNE benchmarking takes 0.2033 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T09:15:54.3540142Z Autotune Choices Stats: 2025-09-07T09:15:54.3541755Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_1552", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.008704000152647495, "best_triton_pos": 0} 2025-09-07T09:15:54.3582805Z AUTOTUNE mm(392x1024, 1024x512) 2025-09-07T09:15:54.3583683Z strides: [1024, 1], [1, 1024] 2025-09-07T09:15:54.3584412Z dtypes: torch.float16, torch.float16 2025-09-07T09:15:54.3585473Z triton_mm_1552 0.0087 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:15:54.3587089Z triton_mm_1556 0.0091 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:15:54.3588118Z mm 0.0098 ms 89.2% 2025-09-07T09:15:54.3589143Z triton_mm_1560 0.0106 ms 82.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:15:54.3590042Z triton_mm_1551 0.0118 ms 73.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:15:54.3590962Z triton_mm_1555 0.0120 ms 72.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, 
BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:15:54.3591859Z triton_mm_1559 0.0122 ms 71.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:54.3592751Z triton_mm_1550 0.0123 ms 71.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:15:54.3593659Z triton_mm_1566 0.0126 ms 69.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:15:54.3594703Z triton_mm_1549 0.0132 ms 65.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:15:54.3595648Z SingleProcess AUTOTUNE benchmarking takes 0.2343 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:15:54.5553251Z Autotune Choices Stats: 2025-09-07T09:15:54.5555658Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.01756799966096878, "best_triton_pos": 1, "best_triton_time": 0.10793600231409073, "best_triton_kernel": "triton_convolution2d_1571", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T09:15:54.5597000Z AUTOTUNE convolution(8x512x7x7, 512x512x3x3) 2025-09-07T09:15:54.5597645Z strides: [25088, 1, 3584, 512], [4608, 1, 1536, 512] 2025-09-07T09:15:54.5598145Z dtypes: torch.float16, torch.float16 2025-09-07T09:15:54.5598609Z convolution 0.0176 ms 100.0% 2025-09-07T09:15:54.5599796Z triton_convolution2d_1571 0.1079 ms 16.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, 
GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:15:54.5601787Z triton_convolution2d_1570 0.1342 ms 13.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:15:54.5604019Z triton_convolution2d_1572 0.1404 ms 12.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:15:54.5606014Z triton_convolution2d_1573 0.1426 ms 12.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:15:54.5608293Z triton_convolution2d_1568 0.1900 ms 9.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:15:54.5610041Z triton_convolution2d_1567 0.2036 ms 8.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:15:54.5611273Z triton_convolution2d_1569 0.2296 ms 7.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=512, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T09:15:54.5612236Z SingleProcess AUTOTUNE benchmarking takes 0.2009 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T09:15:54.8128267Z Autotune Choices Stats: 2025-09-07T09:15:54.8130103Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.01369599997997284, "best_triton_pos": 1, "best_triton_time": 
0.014047999866306782, "best_triton_kernel": "triton_mm_1601", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T09:15:54.8160671Z AUTOTUNE mm(392x2560, 2560x1024) 2025-09-07T09:15:54.8161124Z strides: [2560, 1], [1, 2560] 2025-09-07T09:15:54.8161538Z dtypes: torch.float16, torch.float16 2025-09-07T09:15:54.8161981Z mm 0.0137 ms 100.0% 2025-09-07T09:15:54.8162971Z triton_mm_1601 0.0140 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:15:54.8164947Z triton_mm_1605 0.0160 ms 85.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:15:54.8166565Z triton_mm_1597 0.0184 ms 74.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:15:54.8168644Z triton_mm_1611 0.0205 ms 66.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:15:54.8170007Z triton_mm_1600 0.0235 ms 58.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:15:54.8170974Z triton_mm_1604 0.0241 ms 56.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:54.8171952Z triton_mm_1594 0.0245 ms 55.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:15:54.8172920Z triton_mm_1596 0.0249 
ms 55.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:15:54.8174046Z triton_mm_1610 0.0260 ms 52.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:15:54.8174901Z SingleProcess AUTOTUNE benchmarking takes 0.2531 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:16:31.6909862Z W0907 09:16:31.690000 54731 site-packages/torch/_logging/_internal.py:1199] [6/0] Profiler function will be ignored 2025-09-07T09:17:21.1570790Z pass 2025-09-07T09:17:29.1089298Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T09:17:29.1090863Z import pynvml # type: ignore[import] 2025-09-07T09:17:32.1245955Z 2025-09-07T09:17:34.9529285Z loading model: 0it [00:00, ?it/s] 2025-09-07T09:17:34.9529639Z loading model: 0it [00:02, ?it/s] 2025-09-07T09:17:34.9529928Z cuda train dm_nfnet_f0 2025-09-07T09:17:55.7910287Z Autotune Choices Stats: 2025-09-07T09:17:55.7911485Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_14", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8", "best_time": 0.030751999467611313, "best_triton_pos": 0} 2025-09-07T09:17:55.7957181Z AUTOTUNE convolution(8x32x96x96, 64x32x3x3) 2025-09-07T09:17:55.7957519Z strides: [294912, 9216, 96, 1], [288, 9, 3, 1] 2025-09-07T09:17:55.7957835Z dtypes: torch.float16, torch.float16 2025-09-07T09:17:55.7958652Z triton_convolution2d_14 0.0308 ms 100.0% 
ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:17:55.7959946Z triton_convolution2d_15 0.0312 ms 98.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:17:55.7961325Z triton_convolution2d_17 0.0332 ms 92.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:17:55.7962556Z triton_convolution2d_11 0.0335 ms 91.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:17:55.7964164Z triton_convolution2d_16 0.0336 ms 91.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:17:55.7965854Z triton_convolution2d_12 0.0369 ms 83.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:17:55.7966618Z convolution 0.0412 ms 74.6% 2025-09-07T09:17:55.7967362Z triton_convolution2d_13 0.1020 ms 30.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T09:17:55.7968343Z SingleProcess AUTOTUNE benchmarking takes 0.1169 seconds and 0.0004 seconds precompiling for 8 choices 2025-09-07T09:17:56.2394748Z Autotune Choices Stats: 2025-09-07T09:17:56.2396151Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": 
"convolution", "best_time": 0.014175999909639359, "best_triton_pos": 1, "best_triton_time": 0.014751999638974667, "best_triton_kernel": "triton_convolution2d_29", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:17:56.2442446Z AUTOTUNE convolution(8x128x48x48, 256x128x1x1) 2025-09-07T09:17:56.2442767Z strides: [294912, 2304, 48, 1], [128, 1, 1, 1] 2025-09-07T09:17:56.2443067Z dtypes: torch.float16, torch.float16 2025-09-07T09:17:56.2443338Z convolution 0.0142 ms 100.0% 2025-09-07T09:17:56.2444355Z triton_convolution2d_29 0.0148 ms 96.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:17:56.2445849Z triton_convolution2d_26 0.0172 ms 82.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:17:56.2447088Z triton_convolution2d_28 0.0179 ms 79.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:17:56.2448319Z triton_convolution2d_31 0.0181 ms 78.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:17:56.2449562Z triton_convolution2d_25 0.0216 ms 65.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:17:56.2450780Z triton_convolution2d_30 0.0216 ms 65.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, 
BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:17:56.2451999Z triton_convolution2d_27 0.0258 ms 55.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:17:56.2452705Z conv1x1_via_mm 0.0740 ms 19.2% 2025-09-07T09:17:56.2453139Z SingleProcess AUTOTUNE benchmarking takes 0.1418 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:17:56.7549725Z Autotune Choices Stats: 2025-09-07T09:17:56.7551123Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.016095999628305435, "best_triton_pos": 1, "best_triton_time": 0.020896000787615776, "best_triton_kernel": "triton_convolution2d_82", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:17:56.7597331Z AUTOTUNE convolution(8x256x48x48, 256x256x1x1) 2025-09-07T09:17:56.7597672Z strides: [589824, 2304, 48, 1], [256, 1, 1, 1] 2025-09-07T09:17:56.7597980Z dtypes: torch.float16, torch.float16 2025-09-07T09:17:56.7598286Z convolution 0.0161 ms 100.0% 2025-09-07T09:17:56.7599032Z triton_convolution2d_82 0.0209 ms 77.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:17:56.7600272Z triton_convolution2d_81 0.0235 ms 68.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:17:56.7601661Z triton_convolution2d_79 0.0241 ms 66.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, 
KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:17:56.7602889Z triton_convolution2d_84 0.0246 ms 65.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:17:56.7604425Z triton_convolution2d_83 0.0306 ms 52.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:17:56.7605639Z triton_convolution2d_78 0.0321 ms 50.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:17:56.7606855Z triton_convolution2d_80 0.0422 ms 38.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:17:56.7607817Z conv1x1_via_mm 0.0952 ms 16.9% 2025-09-07T09:17:56.7608299Z SingleProcess AUTOTUNE benchmarking takes 0.1380 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:17:57.1590236Z Autotune Choices Stats: 2025-09-07T09:17:57.1591650Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.014751999638974667, "best_triton_pos": 1, "best_triton_time": 0.021215999498963356, "best_triton_kernel": "triton_convolution2d_141", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:17:57.1637528Z AUTOTUNE convolution(8x512x24x24, 768x512x1x1) 2025-09-07T09:17:57.1637873Z strides: [294912, 576, 24, 1], [512, 1, 1, 1] 2025-09-07T09:17:57.1638182Z dtypes: 
torch.float16, torch.float16 2025-09-07T09:17:57.1638470Z convolution 0.0148 ms 100.0% 2025-09-07T09:17:57.1639240Z triton_convolution2d_141 0.0212 ms 69.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:17:57.1640493Z triton_convolution2d_140 0.0236 ms 62.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:17:57.1641881Z triton_convolution2d_143 0.0242 ms 61.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:17:57.1643090Z triton_convolution2d_142 0.0246 ms 59.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:17:57.1644465Z triton_convolution2d_137 0.0322 ms 45.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:17:57.1645958Z triton_convolution2d_138 0.0339 ms 43.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:17:57.1647174Z triton_convolution2d_139 0.0429 ms 34.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:17:57.1647927Z conv1x1_via_mm 0.0714 ms 20.7% 2025-09-07T09:17:57.1648412Z SingleProcess AUTOTUNE benchmarking takes 0.1413 seconds and 0.0002 seconds 
precompiling for 9 choices 2025-09-07T09:17:57.5317927Z Autotune Choices Stats: 2025-09-07T09:17:57.5319092Z {"num_choices": 7, "num_triton_choices": 6, "best_kernel": "triton_convolution2d_5", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4", "best_time": 0.015968000516295433, "best_triton_pos": 0} 2025-09-07T09:17:57.5363517Z AUTOTUNE convolution(8x16x96x96, 32x16x3x3) 2025-09-07T09:17:57.5363969Z strides: [147456, 9216, 96, 1], [144, 9, 3, 1] 2025-09-07T09:17:57.5364288Z dtypes: torch.float16, torch.float16 2025-09-07T09:17:57.5365042Z triton_convolution2d_5 0.0160 ms 100.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:17:57.5366282Z triton_convolution2d_6 0.0163 ms 98.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:17:57.5367802Z triton_convolution2d_8 0.0165 ms 96.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:17:57.5369051Z triton_convolution2d_10 0.0176 ms 90.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:17:57.5370265Z triton_convolution2d_9 0.0177 ms 90.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:17:57.5371010Z convolution 0.0316 ms 50.6% 2025-09-07T09:17:57.5371737Z 
triton_convolution2d_7 0.0400 ms 39.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T09:17:57.5372709Z SingleProcess AUTOTUNE benchmarking takes 0.0947 seconds and 0.0002 seconds precompiling for 7 choices 2025-09-07T09:17:57.9825121Z Autotune Choices Stats: 2025-09-07T09:17:57.9826597Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.040063999593257904, "best_triton_pos": 1, "best_triton_time": 0.042847998440265656, "best_triton_kernel": "triton_convolution2d_18", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T09:17:57.9871763Z AUTOTUNE convolution(8x64x97x97, 128x64x3x3) 2025-09-07T09:17:57.9872126Z strides: [602176, 9409, 97, 1], [576, 9, 3, 1] 2025-09-07T09:17:57.9872435Z dtypes: torch.float16, torch.float16 2025-09-07T09:17:57.9872711Z convolution 0.0401 ms 100.0% 2025-09-07T09:17:57.9873494Z triton_convolution2d_18 0.0428 ms 93.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:17:57.9875579Z triton_convolution2d_23 0.0451 ms 88.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:17:57.9876822Z triton_convolution2d_19 0.0455 ms 88.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:17:57.9878111Z triton_convolution2d_21 0.0481 ms 83.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, 
BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:17:57.9879329Z triton_convolution2d_24 0.0490 ms 81.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:17:57.9880539Z triton_convolution2d_22 0.0522 ms 76.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:17:57.9881939Z triton_convolution2d_20 0.1888 ms 21.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T09:17:57.9882917Z SingleProcess AUTOTUNE benchmarking takes 0.1323 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T09:17:58.4441690Z Autotune Choices Stats: 2025-09-07T09:17:58.4443303Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_37", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8", "best_time": 0.011264000087976456, "best_triton_pos": 0} 2025-09-07T09:17:58.4489777Z AUTOTUNE convolution(8x128x48x48, 128x128x1x1) 2025-09-07T09:17:58.4490099Z strides: [294912, 2304, 48, 1], [128, 1, 1, 1] 2025-09-07T09:17:58.4490387Z dtypes: torch.float16, torch.float16 2025-09-07T09:17:58.4491142Z triton_convolution2d_37 0.0113 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:17:58.4492549Z triton_convolution2d_36 0.0114 ms 99.2% ALLOW_TF32=False, 
BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:17:58.4493341Z convolution 0.0115 ms 98.1% 2025-09-07T09:17:58.4494320Z triton_convolution2d_35 0.0134 ms 83.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:17:58.4495584Z triton_convolution2d_32 0.0135 ms 83.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:17:58.4496817Z triton_convolution2d_38 0.0137 ms 82.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:17:58.4498051Z triton_convolution2d_33 0.0155 ms 72.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:17:58.4499288Z triton_convolution2d_34 0.0181 ms 62.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:17:58.4500344Z conv1x1_via_mm 0.0570 ms 19.8% 2025-09-07T09:17:58.4500835Z SingleProcess AUTOTUNE benchmarking takes 0.1416 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:17:58.9147900Z Autotune Choices Stats: 2025-09-07T09:17:58.9149358Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.047488000243902206, "best_triton_pos": 1, "best_triton_time": 0.07727999985218048, "best_triton_kernel": "triton_convolution2d_45", "best_triton_kernel_desc": "ALLOW_TF32=False, 
BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8"} 2025-09-07T09:17:58.9195656Z AUTOTUNE convolution(8x128x48x48, 128x128x3x3) 2025-09-07T09:17:58.9196000Z strides: [294912, 2304, 48, 1], [1152, 9, 3, 1] 2025-09-07T09:17:58.9196308Z dtypes: torch.float16, torch.float16 2025-09-07T09:17:58.9196624Z convolution 0.0475 ms 100.0% 2025-09-07T09:17:58.9197474Z triton_convolution2d_45 0.0773 ms 61.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:17:58.9198704Z triton_convolution2d_39 0.0802 ms 59.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:17:58.9199911Z triton_convolution2d_40 0.0847 ms 56.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:17:58.9201497Z triton_convolution2d_44 0.0849 ms 55.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:17:58.9202833Z triton_convolution2d_43 0.0864 ms 54.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:17:58.9204604Z triton_convolution2d_42 0.0896 ms 53.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:17:58.9205828Z triton_convolution2d_41 0.2797 ms 17.0% 
ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T09:17:58.9206810Z SingleProcess AUTOTUNE benchmarking takes 0.1631 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T09:17:59.1670080Z Autotune Choices Stats: 2025-09-07T09:17:59.1671522Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.01056000031530857, "best_triton_pos": 1, "best_triton_time": 0.013952000066637993, "best_triton_kernel": "triton_convolution2d_75", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:17:59.1716882Z AUTOTUNE convolution(8x256x24x24, 512x256x1x1) 2025-09-07T09:17:59.1717306Z strides: [147456, 576, 24, 1], [256, 1, 1, 1] 2025-09-07T09:17:59.1717611Z dtypes: torch.float16, torch.float16 2025-09-07T09:17:59.1717898Z convolution 0.0106 ms 100.0% 2025-09-07T09:17:59.1718639Z triton_convolution2d_75 0.0140 ms 75.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:17:59.1719873Z triton_convolution2d_74 0.0155 ms 68.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:17:59.1721783Z triton_convolution2d_77 0.0164 ms 64.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:17:59.1723090Z triton_convolution2d_71 0.0191 ms 55.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, 
PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:17:59.1724508Z triton_convolution2d_76 0.0195 ms 54.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:17:59.1725725Z triton_convolution2d_72 0.0211 ms 50.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:17:59.1726957Z triton_convolution2d_73 0.0257 ms 41.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:17:59.1727704Z conv1x1_via_mm 0.0515 ms 20.5% 2025-09-07T09:17:59.1728174Z SingleProcess AUTOTUNE benchmarking takes 0.1374 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:17:59.6827734Z Autotune Choices Stats: 2025-09-07T09:17:59.6830111Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.011168000288307667, "best_triton_pos": 1, "best_triton_time": 0.01913600042462349, "best_triton_kernel": "triton_convolution2d_134", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:17:59.6876412Z AUTOTUNE convolution(8x512x12x12, 1536x512x1x1) 2025-09-07T09:17:59.6876710Z strides: [73728, 144, 12, 1], [512, 1, 1, 1] 2025-09-07T09:17:59.6876963Z dtypes: torch.float16, torch.float16 2025-09-07T09:17:59.6877294Z convolution 0.0112 ms 100.0% 2025-09-07T09:17:59.6877946Z triton_convolution2d_134 0.0191 ms 58.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, 
STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:17:59.6879025Z triton_convolution2d_133 0.0224 ms 49.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:17:59.6880110Z triton_convolution2d_136 0.0230 ms 48.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:17:59.6881172Z triton_convolution2d_135 0.0230 ms 48.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:17:59.6882322Z triton_convolution2d_130 0.0313 ms 35.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:17:59.6883546Z triton_convolution2d_131 0.0333 ms 33.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:17:59.6884699Z conv1x1_via_mm 0.0459 ms 24.4% 2025-09-07T09:17:59.6885438Z triton_convolution2d_132 0.0495 ms 22.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:17:59.6886733Z SingleProcess AUTOTUNE benchmarking takes 0.1428 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:18:00.1120036Z Autotune Choices Stats: 2025-09-07T09:18:00.1121204Z {"num_choices": 6, "num_triton_choices": 5, "best_kernel": "triton_convolution2d_1", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=16, GROUPS=1, 
KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4", "best_time": 0.01283199992030859, "best_triton_pos": 0} 2025-09-07T09:18:00.1165218Z AUTOTUNE convolution(8x3x193x193, 16x3x3x3) 2025-09-07T09:18:00.1165518Z strides: [111747, 37249, 193, 1], [27, 9, 3, 1] 2025-09-07T09:18:00.1165834Z dtypes: torch.float16, torch.float16 2025-09-07T09:18:00.1166623Z triton_convolution2d_1 0.0128 ms 100.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:18:00.1167866Z triton_convolution2d_4 0.0132 ms 97.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:18:00.1169070Z triton_convolution2d_3 0.0144 ms 89.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:18:00.1170264Z triton_convolution2d_0 0.0147 ms 87.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:18:00.1172011Z triton_convolution2d_2 0.0151 ms 84.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T09:18:00.1172785Z convolution 0.0204 ms 63.1% 2025-09-07T09:18:00.1173224Z SingleProcess AUTOTUNE benchmarking takes 0.0804 seconds and 0.0002 seconds precompiling for 6 choices 2025-09-07T09:18:00.5752607Z Autotune Choices Stats: 2025-09-07T09:18:00.5754394Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 
0.01119999960064888, "best_triton_pos": 1, "best_triton_time": 0.018912000581622124, "best_triton_kernel": "triton_convolution2d_108", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:18:00.5802079Z AUTOTUNE convolution(8x512x24x24, 256x512x1x1) 2025-09-07T09:18:00.5802468Z strides: [294912, 576, 24, 1], [512, 1, 1, 1] 2025-09-07T09:18:00.5802834Z dtypes: torch.float16, torch.float16 2025-09-07T09:18:00.5803138Z convolution 0.0112 ms 100.0% 2025-09-07T09:18:00.5804027Z triton_convolution2d_108 0.0189 ms 59.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:18:00.5805256Z triton_convolution2d_107 0.0225 ms 49.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:18:00.5806481Z triton_convolution2d_110 0.0231 ms 48.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:18:00.5807692Z triton_convolution2d_109 0.0248 ms 45.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:18:00.5809362Z triton_convolution2d_104 0.0316 ms 35.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:18:00.5810577Z triton_convolution2d_105 0.0328 ms 34.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, 
KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:18:00.5811796Z triton_convolution2d_106 0.0379 ms 29.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:18:00.5812552Z conv1x1_via_mm 0.0472 ms 23.7% 2025-09-07T09:18:00.5813015Z SingleProcess AUTOTUNE benchmarking takes 0.1420 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:18:01.0369664Z Autotune Choices Stats: 2025-09-07T09:18:01.0371029Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.013887999579310417, "best_triton_pos": 1, "best_triton_time": 0.042047999799251556, "best_triton_kernel": "triton_convolution2d_167", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:18:01.0416612Z AUTOTUNE convolution(8x1536x12x12, 768x1536x1x1) 2025-09-07T09:18:01.0417003Z strides: [221184, 144, 12, 1], [1536, 1, 1, 1] 2025-09-07T09:18:01.0417307Z dtypes: torch.float16, torch.float16 2025-09-07T09:18:01.0417590Z convolution 0.0139 ms 100.0% 2025-09-07T09:18:01.0418373Z triton_convolution2d_167 0.0420 ms 33.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:18:01.0420255Z triton_convolution2d_166 0.0512 ms 27.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:18:01.0421074Z conv1x1_via_mm 0.0522 ms 26.6% 2025-09-07T09:18:01.0421822Z triton_convolution2d_168 0.0526 ms 26.4% ALLOW_TF32=False, BLOCK_K=32, 
BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:18:01.0423062Z triton_convolution2d_169 0.0534 ms 26.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:18:01.0424691Z triton_convolution2d_163 0.0762 ms 18.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:18:01.0425756Z triton_convolution2d_164 0.0836 ms 16.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:18:01.0426816Z triton_convolution2d_165 0.1104 ms 12.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:18:01.0427656Z SingleProcess AUTOTUNE benchmarking takes 0.1656 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:18:01.8043972Z Autotune Choices Stats: 2025-09-07T09:18:01.8045409Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.02191999927163124, "best_triton_pos": 1, "best_triton_time": 0.0424639992415905, "best_triton_kernel": "triton_convolution2d_297", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:18:01.8090592Z AUTOTUNE convolution(8x1536x6x6, 1536x1536x1x1) 2025-09-07T09:18:01.8090912Z strides: [55296, 36, 6, 1], [1536, 1, 1, 1] 2025-09-07T09:18:01.8091221Z dtypes: torch.float16, torch.float16 
2025-09-07T09:18:01.8091505Z convolution 0.0219 ms 100.0% 2025-09-07T09:18:01.8092249Z triton_convolution2d_297 0.0425 ms 51.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:18:01.8093287Z conv1x1_via_mm 0.0452 ms 48.5% 2025-09-07T09:18:01.8094841Z triton_convolution2d_296 0.0543 ms 40.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:18:01.8096914Z triton_convolution2d_298 0.0544 ms 40.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:18:01.8098935Z triton_convolution2d_299 0.0549 ms 39.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:18:01.8100919Z triton_convolution2d_294 0.0730 ms 30.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:18:01.8102954Z triton_convolution2d_293 0.0785 ms 27.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:18:01.8104746Z triton_convolution2d_295 0.0794 ms 27.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=512, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:18:01.8105682Z SingleProcess AUTOTUNE benchmarking takes 0.1604 seconds and 0.0002 seconds precompiling for 9 choices 
2025-09-07T09:18:02.3238180Z Autotune Choices Stats: 2025-09-07T09:18:02.3240465Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.020800000056624413, "best_triton_pos": 2, "best_triton_time": 0.04227200150489807, "best_triton_kernel": "triton_convolution2d_330", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:18:02.3285255Z AUTOTUNE convolution(8x1536x6x6, 768x1536x1x1) 2025-09-07T09:18:02.3285572Z strides: [55296, 36, 6, 1], [1536, 1, 1, 1] 2025-09-07T09:18:02.3285878Z dtypes: torch.float16, torch.float16 2025-09-07T09:18:02.3286189Z convolution 0.0208 ms 100.0% 2025-09-07T09:18:02.3286434Z conv1x1_via_mm 0.0276 ms 75.5% 2025-09-07T09:18:02.3287168Z triton_convolution2d_330 0.0423 ms 49.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:18:02.3288407Z triton_convolution2d_329 0.0540 ms 38.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:18:02.3289633Z triton_convolution2d_331 0.0541 ms 38.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:18:02.3290850Z triton_convolution2d_332 0.0550 ms 37.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:18:02.3292458Z triton_convolution2d_327 0.0724 ms 28.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, 
PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:18:02.3293850Z triton_convolution2d_326 0.0788 ms 26.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:18:02.3294992Z triton_convolution2d_328 0.0792 ms 26.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=512, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:18:02.3295903Z SingleProcess AUTOTUNE benchmarking takes 0.1621 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:18:02.7763274Z Autotune Choices Stats: 2025-09-07T09:18:02.7765266Z {"num_choices": 8, "num_triton_choices": 6, "best_kernel": "convolution", "best_time": 0.009056000038981438, "best_triton_pos": 2, "best_triton_time": 0.021888000890612602, "best_triton_kernel": "triton_convolution2d_161", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:18:02.7809092Z AUTOTUNE convolution(8x768x1x1, 1536x768x1x1) 2025-09-07T09:18:02.7809402Z strides: [768, 1, 1, 1], [768, 1, 1, 1] 2025-09-07T09:18:02.7809670Z dtypes: torch.float16, torch.float16 2025-09-07T09:18:02.7809938Z convolution 0.0091 ms 100.0% 2025-09-07T09:18:02.7810190Z conv1x1_via_mm 0.0147 ms 61.7% 2025-09-07T09:18:02.7810933Z triton_convolution2d_161 0.0219 ms 41.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:18:02.7812556Z triton_convolution2d_160 0.0238 ms 38.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, 
STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:18:02.7813999Z triton_convolution2d_162 0.0275 ms 33.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:18:02.7815181Z triton_convolution2d_159 0.0275 ms 32.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=1 2025-09-07T09:18:02.7816313Z triton_convolution2d_158 0.0306 ms 29.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:18:02.7817472Z triton_convolution2d_157 0.0390 ms 23.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:18:02.7818369Z SingleProcess AUTOTUNE benchmarking takes 0.1293 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T09:18:03.2363992Z Autotune Choices Stats: 2025-09-07T09:18:03.2365487Z {"num_choices": 8, "num_triton_choices": 6, "best_kernel": "convolution", "best_time": 0.009664000011980534, "best_triton_pos": 2, "best_triton_time": 0.036959998309612274, "best_triton_kernel": "triton_convolution2d_155", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:18:03.2410712Z AUTOTUNE convolution(8x1536x1x1, 768x1536x1x1) 2025-09-07T09:18:03.2411027Z strides: [1536, 1, 1, 1], [1536, 1, 1, 1] 2025-09-07T09:18:03.2411800Z dtypes: torch.float16, torch.float16 2025-09-07T09:18:03.2412063Z convolution 0.0097 ms 100.0% 2025-09-07T09:18:03.2412321Z 
conv1x1_via_mm 0.0127 ms 76.3% 2025-09-07T09:18:03.2413061Z triton_convolution2d_155 0.0370 ms 26.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:18:03.2414739Z triton_convolution2d_154 0.0407 ms 23.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:18:03.2415992Z triton_convolution2d_156 0.0483 ms 20.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:18:03.2417266Z triton_convolution2d_153 0.0529 ms 18.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=1 2025-09-07T09:18:03.2418517Z triton_convolution2d_152 0.0587 ms 16.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:18:03.2419754Z triton_convolution2d_151 0.0694 ms 13.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:18:03.2420742Z SingleProcess AUTOTUNE benchmarking takes 0.1386 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T09:18:03.7005101Z Autotune Choices Stats: 2025-09-07T09:18:03.7006964Z {"num_choices": 8, "num_triton_choices": 6, "best_kernel": "convolution", "best_time": 0.007679999805986881, "best_triton_pos": 2, "best_triton_time": 0.010591999627649784, "best_triton_kernel": "triton_convolution2d_102", "best_triton_kernel_desc": "ALLOW_TF32=False, 
BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:18:03.7052231Z AUTOTUNE convolution(8x256x1x1, 512x256x1x1) 2025-09-07T09:18:03.7052543Z strides: [256, 1, 1, 1], [256, 1, 1, 1] 2025-09-07T09:18:03.7052835Z dtypes: torch.float16, torch.float16 2025-09-07T09:18:03.7053098Z convolution 0.0077 ms 100.0% 2025-09-07T09:18:03.7053364Z conv1x1_via_mm 0.0099 ms 77.9% 2025-09-07T09:18:03.7054652Z triton_convolution2d_102 0.0106 ms 72.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:18:03.7055929Z triton_convolution2d_101 0.0110 ms 69.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:18:03.7057172Z triton_convolution2d_100 0.0123 ms 62.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=1 2025-09-07T09:18:03.7058398Z triton_convolution2d_99 0.0127 ms 60.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:18:03.7059633Z triton_convolution2d_103 0.0132 ms 58.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:18:03.7060854Z triton_convolution2d_98 0.0162 ms 47.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:18:03.7062097Z 
SingleProcess AUTOTUNE benchmarking takes 0.1268 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T09:18:04.0841838Z Autotune Choices Stats: 2025-09-07T09:18:04.0843314Z {"num_choices": 8, "num_triton_choices": 6, "best_kernel": "convolution", "best_time": 0.0071680000983178616, "best_triton_pos": 1, "best_triton_time": 0.00800000037997961, "best_triton_kernel": "triton_convolution2d_69", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:18:04.0889166Z AUTOTUNE convolution(8x128x1x1, 256x128x1x1) 2025-09-07T09:18:04.0889460Z strides: [128, 1, 1, 1], [128, 1, 1, 1] 2025-09-07T09:18:04.0889736Z dtypes: torch.float16, torch.float16 2025-09-07T09:18:04.0890034Z convolution 0.0072 ms 100.0% 2025-09-07T09:18:04.0890785Z triton_convolution2d_69 0.0080 ms 89.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:18:04.0892014Z triton_convolution2d_67 0.0084 ms 85.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=1 2025-09-07T09:18:04.0893241Z triton_convolution2d_68 0.0085 ms 84.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:18:04.0894622Z triton_convolution2d_66 0.0089 ms 80.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:18:04.0896064Z triton_convolution2d_70 0.0093 ms 76.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, GROUPS=1, 
KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:18:04.0896783Z conv1x1_via_mm 0.0100 ms 71.8% 2025-09-07T09:18:04.0897457Z triton_convolution2d_65 0.0110 ms 65.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:18:04.0898353Z SingleProcess AUTOTUNE benchmarking takes 0.1260 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T09:18:04.4789494Z Autotune Choices Stats: 2025-09-07T09:18:04.4790957Z {"num_choices": 8, "num_triton_choices": 6, "best_kernel": "convolution", "best_time": 0.007552000228315592, "best_triton_pos": 2, "best_triton_time": 0.015263999812304974, "best_triton_kernel": "triton_convolution2d_96", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:18:04.4837289Z AUTOTUNE convolution(8x512x1x1, 256x512x1x1) 2025-09-07T09:18:04.4837625Z strides: [512, 1, 1, 1], [512, 1, 1, 1] 2025-09-07T09:18:04.4837944Z dtypes: torch.float16, torch.float16 2025-09-07T09:18:04.4838229Z convolution 0.0076 ms 100.0% 2025-09-07T09:18:04.4838491Z conv1x1_via_mm 0.0117 ms 64.5% 2025-09-07T09:18:04.4839252Z triton_convolution2d_96 0.0153 ms 49.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:18:04.4840487Z triton_convolution2d_95 0.0171 ms 44.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:18:04.4841704Z triton_convolution2d_94 0.0198 ms 38.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, 
BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=1 2025-09-07T09:18:04.4843267Z triton_convolution2d_97 0.0207 ms 36.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:18:04.4844918Z triton_convolution2d_93 0.0209 ms 36.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:18:04.4846123Z triton_convolution2d_92 0.0280 ms 27.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:18:04.4847090Z SingleProcess AUTOTUNE benchmarking takes 0.1261 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T09:18:04.8508247Z Autotune Choices Stats: 2025-09-07T09:18:04.8509638Z {"num_choices": 7, "num_triton_choices": 5, "best_kernel": "convolution", "best_time": 0.007104000076651573, "best_triton_pos": 2, "best_triton_time": 0.01027199998497963, "best_triton_kernel": "triton_convolution2d_64", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:18:04.8555871Z AUTOTUNE convolution(8x256x1x1, 128x256x1x1) 2025-09-07T09:18:04.8556197Z strides: [256, 1, 1, 1], [256, 1, 1, 1] 2025-09-07T09:18:04.8556505Z dtypes: torch.float16, torch.float16 2025-09-07T09:18:04.8556791Z convolution 0.0071 ms 100.0% 2025-09-07T09:18:04.8557159Z conv1x1_via_mm 0.0098 ms 72.8% 2025-09-07T09:18:04.8558244Z triton_convolution2d_64 0.0103 ms 69.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, GROUPS=1, KERNEL_H=1, 
KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:18:04.8559495Z triton_convolution2d_62 0.0108 ms 65.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=1 2025-09-07T09:18:04.8560713Z triton_convolution2d_63 0.0111 ms 64.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:18:04.8561914Z triton_convolution2d_61 0.0122 ms 58.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:18:04.8563110Z triton_convolution2d_60 0.0132 ms 53.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:18:04.8564553Z SingleProcess AUTOTUNE benchmarking takes 0.1121 seconds and 0.0002 seconds precompiling for 7 choices 2025-09-07T09:18:05.7961887Z Autotune Choices Stats: 2025-09-07T09:18:05.7963319Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.013088000006973743, "best_triton_pos": 1, "best_triton_time": 0.025631999596953392, "best_triton_kernel": "triton_convolution2d_148", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:18:05.8012007Z AUTOTUNE convolution(8x768x12x12, 1536x768x1x1) 2025-09-07T09:18:05.8012342Z strides: [110592, 144, 12, 1], [768, 1, 1, 1] 2025-09-07T09:18:05.8012632Z dtypes: torch.float16, torch.float16 2025-09-07T09:18:05.8012894Z convolution 0.0131 ms 
100.0% 2025-09-07T09:18:05.8013653Z triton_convolution2d_148 0.0256 ms 51.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:18:05.8015701Z triton_convolution2d_147 0.0302 ms 43.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:18:05.8016975Z triton_convolution2d_150 0.0310 ms 42.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:18:05.8018256Z triton_convolution2d_149 0.0310 ms 42.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:18:05.8019515Z triton_convolution2d_144 0.0421 ms 31.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:18:05.8020776Z triton_convolution2d_145 0.0458 ms 28.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:18:05.8021568Z conv1x1_via_mm 0.0527 ms 24.8% 2025-09-07T09:18:05.8022323Z triton_convolution2d_146 0.0699 ms 18.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:18:05.8023338Z SingleProcess AUTOTUNE benchmarking takes 0.1433 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:18:05.9562611Z Autotune Choices Stats: 
2025-09-07T09:18:05.9564691Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.019071999937295914, "best_triton_pos": 1, "best_triton_time": 0.024480000138282776, "best_triton_kernel": "triton_convolution2d_311", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:18:05.9610152Z AUTOTUNE convolution(8x768x6x6, 1536x768x1x1) 2025-09-07T09:18:05.9610474Z strides: [27648, 36, 6, 1], [768, 1, 1, 1] 2025-09-07T09:18:05.9610749Z dtypes: torch.float16, torch.float16 2025-09-07T09:18:05.9611015Z convolution 0.0191 ms 100.0% 2025-09-07T09:18:05.9611749Z triton_convolution2d_311 0.0245 ms 77.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:18:05.9612982Z triton_convolution2d_310 0.0306 ms 62.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:18:05.9613903Z conv1x1_via_mm 0.0311 ms 61.3% 2025-09-07T09:18:05.9614643Z triton_convolution2d_312 0.0312 ms 61.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:18:05.9615838Z triton_convolution2d_313 0.0313 ms 60.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:18:05.9616961Z triton_convolution2d_308 0.0398 ms 47.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, 
UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:18:05.9618085Z triton_convolution2d_307 0.0418 ms 45.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:18:05.9619436Z triton_convolution2d_309 0.0425 ms 44.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=512, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:18:05.9620331Z SingleProcess AUTOTUNE benchmarking takes 0.1377 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:18:06.1271896Z Autotune Choices Stats: 2025-09-07T09:18:06.1273214Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.02848000079393387, "best_triton_pos": 1, "best_triton_time": 0.04521600157022476, "best_triton_kernel": "triton_convolution2d_382", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:18:06.1319560Z AUTOTUNE convolution(8x1536x6x6, 3072x1536x1x1) 2025-09-07T09:18:06.1319900Z strides: [55296, 36, 6, 1], [1536, 1, 1, 1] 2025-09-07T09:18:06.1320209Z dtypes: torch.float16, torch.float16 2025-09-07T09:18:06.1320487Z convolution 0.0285 ms 100.0% 2025-09-07T09:18:06.1321219Z triton_convolution2d_382 0.0452 ms 63.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:18:06.1322430Z triton_convolution2d_381 0.0550 ms 51.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:18:06.1324057Z 
triton_convolution2d_383 0.0561 ms 50.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:18:06.1325353Z triton_convolution2d_384 0.0567 ms 50.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:18:06.1326048Z conv1x1_via_mm 0.0721 ms 39.5% 2025-09-07T09:18:06.1326729Z triton_convolution2d_378 0.0788 ms 36.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:18:06.1327857Z triton_convolution2d_380 0.0825 ms 34.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=512, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:18:06.1328985Z triton_convolution2d_379 0.0849 ms 33.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:18:06.1329877Z SingleProcess AUTOTUNE benchmarking takes 0.1601 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:18:06.4020193Z Autotune Choices Stats: 2025-09-07T09:18:06.4021469Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "bias_addmm", "best_time": 0.01414399966597557, "best_triton_pos": 1, "best_triton_time": 0.014944000169634819, "best_triton_kernel": "triton_mm_389", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2"} 2025-09-07T09:18:06.4068471Z AUTOTUNE addmm(8x1000, 8x3072, 3072x1000) 2025-09-07T09:18:06.4068780Z strides: [0, 1], 
[3072, 1], [1, 3072] 2025-09-07T09:18:06.4069087Z dtypes: torch.float16, torch.float16, torch.float16 2025-09-07T09:18:06.4069448Z bias_addmm 0.0141 ms 100.0% 2025-09-07T09:18:06.4070095Z triton_mm_389 0.0149 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:18:06.4071377Z triton_mm_393 0.0158 ms 89.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:18:06.4072004Z addmm 0.0176 ms 80.2% 2025-09-07T09:18:06.4072606Z triton_mm_401 0.0204 ms 69.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:18:06.4073586Z triton_mm_397 0.0218 ms 64.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:18:06.4074820Z triton_mm_388 0.0234 ms 60.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:18:06.4075655Z triton_mm_387 0.0249 ms 56.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:18:06.4076487Z triton_mm_392 0.0253 ms 55.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:18:06.4077417Z triton_mm_386 0.0257 ms 55.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T09:18:06.4078149Z SingleProcess AUTOTUNE benchmarking takes 0.2736 seconds and 0.0002 seconds precompiling 
for 19 choices 2025-09-07T09:18:22.3850966Z Autotune Choices Stats: 2025-09-07T09:18:22.3852569Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_431", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8", "best_time": 0.007807999849319458, "best_triton_pos": 0} 2025-09-07T09:18:22.3900351Z AUTOTUNE mm(1000x8, 8x3072) 2025-09-07T09:18:22.3900689Z strides: [1, 1000], [3072, 1] 2025-09-07T09:18:22.3901003Z dtypes: torch.float16, torch.float16 2025-09-07T09:18:22.3901718Z triton_mm_431 0.0078 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:18:22.3902754Z triton_mm_424 0.0078 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:18:22.3903976Z triton_mm_426 0.0078 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:18:22.3904977Z triton_mm_428 0.0079 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:18:22.3905976Z triton_mm_427 0.0080 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:18:22.3906946Z triton_mm_429 0.0080 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:18:22.3907917Z triton_mm_430 0.0081 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:18:22.3908893Z triton_mm_433 0.0082 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:18:22.3910313Z triton_mm_425 0.0082 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:18:22.3911231Z triton_mm_432 0.0082 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T09:18:22.3912035Z SingleProcess AUTOTUNE benchmarking takes 0.1611 seconds and 0.0003 seconds precompiling for 17 choices 2025-09-07T09:18:22.9829700Z Autotune Choices Stats: 2025-09-07T09:18:22.9831082Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "mm", "best_time": 0.010143999941647053, "best_triton_pos": 1, "best_triton_time": 0.010879999957978725, "best_triton_kernel": "triton_mm_410", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T09:18:22.9877552Z AUTOTUNE mm(8x1000, 1000x3072) 2025-09-07T09:18:22.9877810Z strides: [1000, 1], [3072, 1] 2025-09-07T09:18:22.9878075Z dtypes: torch.float16, torch.float16 2025-09-07T09:18:22.9878339Z mm 0.0101 ms 100.0% 2025-09-07T09:18:22.9878962Z triton_mm_410 0.0109 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:18:22.9879943Z triton_mm_414 0.0111 ms 91.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:18:22.9880913Z triton_mm_406 0.0116 ms 
87.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:18:22.9882193Z triton_mm_418 0.0122 ms 83.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:18:22.9883183Z triton_mm_404 0.0124 ms 81.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:18:22.9884574Z triton_mm_409 0.0131 ms 77.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:18:22.9885555Z triton_mm_405 0.0132 ms 76.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:18:22.9886514Z triton_mm_413 0.0140 ms 72.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:18:22.9887483Z triton_mm_416 0.0142 ms 71.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:18:22.9888323Z SingleProcess AUTOTUNE benchmarking takes 0.1865 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T09:18:29.6177132Z W0907 09:18:29.616000 63359 site-packages/torch/_logging/_internal.py:1199] [6/0] Profiler function will be ignored 2025-09-07T09:19:04.6438790Z pass 2025-09-07T09:19:11.1809101Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. 
If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T09:19:11.1810352Z import pynvml # type: ignore[import] 2025-09-07T09:19:14.1694235Z 2025-09-07T09:19:16.5566235Z loading model: 0it [00:00, ?it/s] 2025-09-07T09:19:16.5566592Z loading model: 0it [00:02, ?it/s] 2025-09-07T09:19:16.5566894Z cuda train dpn107 2025-09-07T09:19:55.3776348Z Autotune Choices Stats: 2025-09-07T09:19:55.3777888Z {"num_choices": 8, "num_triton_choices": 6, "best_kernel": "convolution", "best_time": 0.01142400037497282, "best_triton_pos": 2, "best_triton_time": 0.06092799827456474, "best_triton_kernel": "triton_convolution2d_529", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:19:55.3830379Z AUTOTUNE convolution(8x2688x1x1, 1000x2688x1x1) 2025-09-07T09:19:55.3830739Z strides: [2688, 1, 1, 1], [2688, 1, 1, 1] 2025-09-07T09:19:55.3831046Z dtypes: torch.float16, torch.float16 2025-09-07T09:19:55.3831355Z convolution 0.0114 ms 100.0% 2025-09-07T09:19:55.3831615Z conv1x1_via_mm 0.0156 ms 73.3% 2025-09-07T09:19:55.3832414Z triton_convolution2d_529 0.0609 ms 18.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:55.3833534Z triton_convolution2d_528 0.0682 ms 16.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:55.3835005Z triton_convolution2d_530 0.0780 ms 14.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 
2025-09-07T09:19:55.3836057Z triton_convolution2d_527 0.0878 ms 13.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=1 2025-09-07T09:19:55.3837571Z triton_convolution2d_526 0.0997 ms 11.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:55.3838652Z triton_convolution2d_525 0.1286 ms 8.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:55.3839517Z SingleProcess AUTOTUNE benchmarking takes 0.1805 seconds and 0.0003 seconds precompiling for 8 choices 2025-09-07T09:19:55.8945953Z Autotune Choices Stats: 2025-09-07T09:19:55.8947098Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_6", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=7, KERNEL_W=7, PADDING_H=3, PADDING_W=3, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8", "best_time": 0.09318400174379349, "best_triton_pos": 0} 2025-09-07T09:19:55.8997641Z AUTOTUNE convolution(8x3x224x224, 128x3x7x7) 2025-09-07T09:19:55.8998013Z strides: [150528, 50176, 224, 1], [147, 49, 7, 1] 2025-09-07T09:19:55.8998371Z dtypes: torch.float16, torch.float16 2025-09-07T09:19:55.8999158Z triton_convolution2d_6 0.0932 ms 100.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=7, KERNEL_W=7, PADDING_H=3, PADDING_W=3, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:19:55.9000407Z triton_convolution2d_3 0.0967 ms 96.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=7, KERNEL_W=7, PADDING_H=3, PADDING_W=3, STRIDE_H=2, STRIDE_W=2, UNROLL=False, 
num_stages=2, num_warps=8 2025-09-07T09:19:55.9001634Z triton_convolution2d_1 0.0996 ms 93.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=7, KERNEL_W=7, PADDING_H=3, PADDING_W=3, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:19:55.9003056Z triton_convolution2d_0 0.1112 ms 83.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=7, KERNEL_W=7, PADDING_H=3, PADDING_W=3, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:19:55.9004885Z triton_convolution2d_4 0.1324 ms 70.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=7, KERNEL_W=7, PADDING_H=3, PADDING_W=3, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:19:55.9006100Z triton_convolution2d_5 0.1333 ms 69.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=7, KERNEL_W=7, PADDING_H=3, PADDING_W=3, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:19:55.9006847Z convolution 0.1481 ms 62.9% 2025-09-07T09:19:55.9007568Z triton_convolution2d_2 0.2715 ms 34.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=7, KERNEL_W=7, PADDING_H=3, PADDING_W=3, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T09:19:55.9008549Z SingleProcess AUTOTUNE benchmarking takes 0.1843 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T09:19:56.0273599Z Autotune Choices Stats: 2025-09-07T09:19:56.0275294Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.013567999936640263, "best_triton_pos": 1, "best_triton_time": 0.0180479995906353, "best_triton_kernel": "triton_convolution2d_11", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:19:56.0323161Z AUTOTUNE 
convolution(8x128x56x56, 296x128x1x1) 2025-09-07T09:19:56.0323595Z strides: [401408, 3136, 56, 1], [128, 1, 1, 1] 2025-09-07T09:19:56.0324068Z dtypes: torch.float16, torch.float16 2025-09-07T09:19:56.0324338Z convolution 0.0136 ms 100.0% 2025-09-07T09:19:56.0325065Z triton_convolution2d_11 0.0180 ms 75.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:56.0326664Z triton_convolution2d_13 0.0208 ms 65.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:56.0327913Z triton_convolution2d_7 0.0221 ms 61.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:56.0329136Z triton_convolution2d_10 0.0226 ms 60.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:56.0330381Z triton_convolution2d_8 0.0238 ms 56.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:56.0331612Z triton_convolution2d_9 0.0298 ms 45.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:19:56.0332829Z triton_convolution2d_12 0.0311 ms 43.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:56.0333579Z 
conv1x1_via_mm 0.1057 ms 12.8% 2025-09-07T09:19:56.0334158Z SingleProcess AUTOTUNE benchmarking takes 0.1321 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:19:56.1659023Z Autotune Choices Stats: 2025-09-07T09:19:56.1660442Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.012319999746978283, "best_triton_pos": 1, "best_triton_time": 0.015135999768972397, "best_triton_kernel": "triton_convolution2d_14", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:19:56.1708170Z AUTOTUNE convolution(8x128x56x56, 200x128x1x1) 2025-09-07T09:19:56.1709237Z strides: [401408, 3136, 56, 1], [128, 1, 1, 1] 2025-09-07T09:19:56.1709626Z dtypes: torch.float16, torch.float16 2025-09-07T09:19:56.1709939Z convolution 0.0123 ms 100.0% 2025-09-07T09:19:56.1710746Z triton_convolution2d_14 0.0151 ms 81.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:56.1712012Z triton_convolution2d_18 0.0162 ms 75.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:56.1713294Z triton_convolution2d_15 0.0176 ms 69.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:56.1714739Z triton_convolution2d_17 0.0179 ms 68.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:56.1715822Z triton_convolution2d_20 0.0193 
ms 63.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:56.1716879Z triton_convolution2d_19 0.0216 ms 57.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:56.1718612Z triton_convolution2d_16 0.0260 ms 47.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:19:56.1719301Z conv1x1_via_mm 0.0804 ms 15.3% 2025-09-07T09:19:56.1719723Z SingleProcess AUTOTUNE benchmarking takes 0.1369 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:19:56.3010048Z Autotune Choices Stats: 2025-09-07T09:19:56.3011443Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.01942400075495243, "best_triton_pos": 1, "best_triton_time": 0.026688000187277794, "best_triton_kernel": "triton_convolution2d_27", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8"} 2025-09-07T09:19:56.3058885Z AUTOTUNE convolution(8x200x56x56, 276x200x1x1) 2025-09-07T09:19:56.3059257Z strides: [627200, 3136, 56, 1], [200, 1, 1, 1] 2025-09-07T09:19:56.3059579Z dtypes: torch.float16, torch.float16 2025-09-07T09:19:56.3059899Z convolution 0.0194 ms 100.0% 2025-09-07T09:19:56.3060660Z triton_convolution2d_27 0.0267 ms 72.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:56.3061881Z triton_convolution2d_25 0.0282 ms 68.9% ALLOW_TF32=False, 
BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:56.3063145Z triton_convolution2d_24 0.0294 ms 66.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:56.3064389Z triton_convolution2d_22 0.0325 ms 59.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:56.3065816Z triton_convolution2d_26 0.0352 ms 55.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:56.3066873Z triton_convolution2d_21 0.0391 ms 49.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:56.3067945Z triton_convolution2d_23 0.0411 ms 47.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:19:56.3068604Z conv1x1_via_mm 0.1547 ms 12.6% 2025-09-07T09:19:56.3069013Z SingleProcess AUTOTUNE benchmarking takes 0.1335 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:19:56.4369265Z Autotune Choices Stats: 2025-09-07T09:19:56.4370377Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_32", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4", "best_time": 0.02707199938595295, 
"best_triton_pos": 0} 2025-09-07T09:19:56.4418670Z AUTOTUNE convolution(8x316x56x56, 200x316x1x1) 2025-09-07T09:19:56.4419021Z strides: [990976, 3136, 56, 1], [316, 1, 1, 1] 2025-09-07T09:19:56.4419333Z dtypes: torch.float16, torch.float16 2025-09-07T09:19:56.4420106Z triton_convolution2d_32 0.0271 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:56.4421683Z triton_convolution2d_33 0.0284 ms 95.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:56.4423002Z triton_convolution2d_31 0.0291 ms 93.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:56.4424430Z triton_convolution2d_29 0.0307 ms 88.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:56.4425488Z triton_convolution2d_34 0.0321 ms 84.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:56.4426125Z convolution 0.0346 ms 78.3% 2025-09-07T09:19:56.4426765Z triton_convolution2d_28 0.0427 ms 63.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:56.4427827Z triton_convolution2d_30 0.0516 ms 52.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, 
UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:19:56.4428477Z conv1x1_via_mm 0.1538 ms 17.6% 2025-09-07T09:19:56.4428891Z SingleProcess AUTOTUNE benchmarking takes 0.1346 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:19:56.5735278Z Autotune Choices Stats: 2025-09-07T09:19:56.5736613Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.018144000321626663, "best_triton_pos": 1, "best_triton_time": 0.027936000376939774, "best_triton_kernel": "triton_convolution2d_42", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:19:56.5784180Z AUTOTUNE convolution(8x336x56x56, 200x336x1x1) 2025-09-07T09:19:56.5784553Z strides: [1053696, 3136, 56, 1], [336, 1, 1, 1] 2025-09-07T09:19:56.5784859Z dtypes: torch.float16, torch.float16 2025-09-07T09:19:56.5785142Z convolution 0.0181 ms 100.0% 2025-09-07T09:19:56.5785892Z triton_convolution2d_42 0.0279 ms 64.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:56.5787121Z triton_convolution2d_46 0.0295 ms 61.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:56.5788358Z triton_convolution2d_43 0.0308 ms 59.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:56.5789586Z triton_convolution2d_45 0.0310 ms 58.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, 
num_warps=8 2025-09-07T09:19:56.5790800Z triton_convolution2d_47 0.0320 ms 56.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:56.5792025Z triton_convolution2d_48 0.0348 ms 52.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:56.5793264Z triton_convolution2d_44 0.0542 ms 33.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:19:56.5794273Z conv1x1_via_mm 0.1361 ms 13.3% 2025-09-07T09:19:56.5794705Z SingleProcess AUTOTUNE benchmarking takes 0.1336 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:19:56.7140942Z Autotune Choices Stats: 2025-09-07T09:19:56.7142001Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_60", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4", "best_time": 0.03081599995493889, "best_triton_pos": 0} 2025-09-07T09:19:56.7191122Z AUTOTUNE convolution(8x356x56x56, 200x356x1x1) 2025-09-07T09:19:56.7191486Z strides: [1116416, 3136, 56, 1], [356, 1, 1, 1] 2025-09-07T09:19:56.7191810Z dtypes: torch.float16, torch.float16 2025-09-07T09:19:56.7192597Z triton_convolution2d_60 0.0308 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:56.7193932Z triton_convolution2d_61 0.0314 ms 98.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, 
PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:56.7195006Z triton_convolution2d_57 0.0336 ms 91.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:56.7196054Z triton_convolution2d_59 0.0338 ms 91.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:56.7197208Z triton_convolution2d_62 0.0370 ms 83.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:56.7198158Z convolution 0.0374 ms 82.4% 2025-09-07T09:19:56.7198775Z triton_convolution2d_56 0.0495 ms 62.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:56.7199820Z triton_convolution2d_58 0.0574 ms 53.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:19:56.7200459Z conv1x1_via_mm 0.1661 ms 18.5% 2025-09-07T09:19:56.7200888Z SingleProcess AUTOTUNE benchmarking takes 0.1369 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:19:56.8342240Z Autotune Choices Stats: 2025-09-07T09:19:56.8343418Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_73", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=8", "best_time": 0.030527999624609947, "best_triton_pos": 
0} 2025-09-07T09:19:56.8391835Z AUTOTUNE convolution(8x376x56x56, 640x376x1x1) 2025-09-07T09:19:56.8392165Z strides: [1179136, 3136, 56, 1], [376, 1, 1, 1] 2025-09-07T09:19:56.8392474Z dtypes: torch.float16, torch.float16 2025-09-07T09:19:56.8393276Z triton_convolution2d_73 0.0305 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:56.8394704Z triton_convolution2d_76 0.0347 ms 88.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:56.8396329Z triton_convolution2d_71 0.0364 ms 83.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:56.8397694Z triton_convolution2d_75 0.0372 ms 82.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:56.8398451Z convolution 0.0481 ms 63.4% 2025-09-07T09:19:56.8399168Z triton_convolution2d_70 0.0498 ms 61.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:56.8400398Z triton_convolution2d_74 0.0551 ms 55.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:56.8401644Z triton_convolution2d_72 0.1473 ms 20.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=1, 
num_warps=8 2025-09-07T09:19:56.8402640Z SingleProcess AUTOTUNE benchmarking takes 0.1172 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T09:19:56.9915055Z Autotune Choices Stats: 2025-09-07T09:19:56.9916443Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.02332800067961216, "best_triton_pos": 1, "best_triton_time": 0.05158400163054466, "best_triton_kernel": "triton_convolution2d_80", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8"} 2025-09-07T09:19:56.9964691Z AUTOTUNE convolution(8x376x56x56, 400x376x1x1) 2025-09-07T09:19:56.9965015Z strides: [1179136, 3136, 56, 1], [376, 1, 1, 1] 2025-09-07T09:19:56.9965321Z dtypes: torch.float16, torch.float16 2025-09-07T09:19:56.9965909Z convolution 0.0233 ms 100.0% 2025-09-07T09:19:56.9966648Z triton_convolution2d_80 0.0516 ms 45.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:56.9967888Z triton_convolution2d_83 0.0540 ms 43.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:56.9969097Z triton_convolution2d_82 0.0580 ms 40.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:56.9970307Z triton_convolution2d_78 0.0581 ms 40.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:56.9971519Z triton_convolution2d_81 0.0616 ms 37.9% 
ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:56.9972725Z triton_convolution2d_77 0.0771 ms 30.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:56.9974236Z triton_convolution2d_79 0.0956 ms 24.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:19:56.9974931Z conv1x1_via_mm 0.2007 ms 11.6% 2025-09-07T09:19:56.9975363Z SingleProcess AUTOTUNE benchmarking takes 0.1568 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:19:57.1264776Z Autotune Choices Stats: 2025-09-07T09:19:57.1266470Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.014208000153303146, "best_triton_pos": 1, "best_triton_time": 0.019999999552965164, "best_triton_kernel": "triton_convolution2d_88", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:19:57.1314366Z AUTOTUNE convolution(8x400x28x28, 576x400x1x1) 2025-09-07T09:19:57.1314715Z strides: [313600, 784, 28, 1], [400, 1, 1, 1] 2025-09-07T09:19:57.1315026Z dtypes: torch.float16, torch.float16 2025-09-07T09:19:57.1315305Z convolution 0.0142 ms 100.0% 2025-09-07T09:19:57.1316045Z triton_convolution2d_88 0.0200 ms 71.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:57.1317432Z triton_convolution2d_87 0.0220 ms 64.6% ALLOW_TF32=False, BLOCK_K=32, 
BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:57.1318657Z triton_convolution2d_90 0.0229 ms 62.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:57.1319861Z triton_convolution2d_89 0.0285 ms 49.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:57.1321057Z triton_convolution2d_84 0.0291 ms 48.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:57.1322260Z triton_convolution2d_85 0.0297 ms 47.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:57.1323968Z triton_convolution2d_86 0.0370 ms 38.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:19:57.1324672Z conv1x1_via_mm 0.0908 ms 15.7% 2025-09-07T09:19:57.1325104Z SingleProcess AUTOTUNE benchmarking takes 0.1335 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:19:57.2671903Z Autotune Choices Stats: 2025-09-07T09:19:57.2673288Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.015104000456631184, "best_triton_pos": 1, "best_triton_time": 0.025728000327944756, "best_triton_kernel": "triton_convolution2d_95", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, 
PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:19:57.2725410Z AUTOTUNE convolution(8x704x28x28, 400x704x1x1) 2025-09-07T09:19:57.2725901Z strides: [551936, 784, 28, 1], [704, 1, 1, 1] 2025-09-07T09:19:57.2734226Z dtypes: torch.float16, torch.float16 2025-09-07T09:19:57.2734516Z convolution 0.0151 ms 100.0% 2025-09-07T09:19:57.2735220Z triton_convolution2d_95 0.0257 ms 58.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:57.2736380Z triton_convolution2d_94 0.0294 ms 51.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:57.2737867Z triton_convolution2d_96 0.0295 ms 51.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:57.2739023Z triton_convolution2d_97 0.0308 ms 49.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:57.2740149Z triton_convolution2d_91 0.0394 ms 38.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:57.2741280Z triton_convolution2d_92 0.0446 ms 33.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:57.2742424Z triton_convolution2d_93 0.0591 ms 25.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, 
KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:19:57.2743125Z conv1x1_via_mm 0.0969 ms 15.6% 2025-09-07T09:19:57.2743576Z SingleProcess AUTOTUNE benchmarking takes 0.1396 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:19:57.4109993Z Autotune Choices Stats: 2025-09-07T09:19:57.4111467Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.01539199985563755, "best_triton_pos": 1, "best_triton_time": 0.027583999559283257, "best_triton_kernel": "triton_convolution2d_109", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:19:57.4160918Z AUTOTUNE convolution(8x768x28x28, 400x768x1x1) 2025-09-07T09:19:57.4161780Z strides: [602112, 784, 28, 1], [768, 1, 1, 1] 2025-09-07T09:19:57.4162136Z dtypes: torch.float16, torch.float16 2025-09-07T09:19:57.4162892Z convolution 0.0154 ms 100.0% 2025-09-07T09:19:57.4164093Z triton_convolution2d_109 0.0276 ms 55.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:57.4165373Z triton_convolution2d_108 0.0311 ms 49.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:57.4166617Z triton_convolution2d_110 0.0313 ms 49.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:57.4167848Z triton_convolution2d_111 0.0330 ms 46.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, 
PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:57.4169078Z triton_convolution2d_105 0.0423 ms 36.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:57.4170299Z triton_convolution2d_106 0.0479 ms 32.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:57.4171525Z triton_convolution2d_107 0.0637 ms 24.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:19:57.4172269Z conv1x1_via_mm 0.1009 ms 15.3% 2025-09-07T09:19:57.4172752Z SingleProcess AUTOTUNE benchmarking takes 0.1392 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:19:57.5555979Z Autotune Choices Stats: 2025-09-07T09:19:57.5557696Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.016607999801635742, "best_triton_pos": 1, "best_triton_time": 0.029823999851942062, "best_triton_kernel": "triton_convolution2d_123", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:19:57.5606500Z AUTOTUNE convolution(8x832x28x28, 400x832x1x1) 2025-09-07T09:19:57.5606908Z strides: [652288, 784, 28, 1], [832, 1, 1, 1] 2025-09-07T09:19:57.5607195Z dtypes: torch.float16, torch.float16 2025-09-07T09:19:57.5607463Z convolution 0.0166 ms 100.0% 2025-09-07T09:19:57.5608202Z triton_convolution2d_123 0.0298 ms 55.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, 
STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:57.5609452Z triton_convolution2d_124 0.0335 ms 49.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:57.5610687Z triton_convolution2d_122 0.0336 ms 49.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:57.5611914Z triton_convolution2d_125 0.0356 ms 46.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:57.5613155Z triton_convolution2d_119 0.0458 ms 36.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:57.5614676Z triton_convolution2d_120 0.0515 ms 32.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:57.5616040Z triton_convolution2d_121 0.0685 ms 24.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:19:57.5616736Z conv1x1_via_mm 0.1080 ms 15.4% 2025-09-07T09:19:57.5617177Z SingleProcess AUTOTUNE benchmarking takes 0.1406 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:19:57.7016881Z Autotune Choices Stats: 2025-09-07T09:19:57.7018223Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.016095999628305435, "best_triton_pos": 1, "best_triton_time": 0.03097599931061268, 
"best_triton_kernel": "triton_convolution2d_137", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:19:57.7068478Z AUTOTUNE convolution(8x896x28x28, 400x896x1x1) 2025-09-07T09:19:57.7068863Z strides: [702464, 784, 28, 1], [896, 1, 1, 1] 2025-09-07T09:19:57.7069172Z dtypes: torch.float16, torch.float16 2025-09-07T09:19:57.7069458Z convolution 0.0161 ms 100.0% 2025-09-07T09:19:57.7070227Z triton_convolution2d_137 0.0310 ms 52.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:57.7071471Z triton_convolution2d_138 0.0349 ms 46.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:57.7073095Z triton_convolution2d_136 0.0349 ms 46.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:57.7074602Z triton_convolution2d_139 0.0372 ms 43.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:57.7075654Z triton_convolution2d_133 0.0484 ms 33.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:57.7076710Z triton_convolution2d_134 0.0547 ms 29.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, 
num_stages=2, num_warps=4 2025-09-07T09:19:57.7077859Z triton_convolution2d_135 0.0729 ms 22.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:19:57.7078523Z conv1x1_via_mm 0.1152 ms 14.0% 2025-09-07T09:19:57.7078940Z SingleProcess AUTOTUNE benchmarking takes 0.1423 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:19:57.8498551Z Autotune Choices Stats: 2025-09-07T09:19:57.8499913Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.016127999871969223, "best_triton_pos": 1, "best_triton_time": 0.03283200040459633, "best_triton_kernel": "triton_convolution2d_151", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:19:57.8550717Z AUTOTUNE convolution(8x960x28x28, 400x960x1x1) 2025-09-07T09:19:57.8551096Z strides: [752640, 784, 28, 1], [960, 1, 1, 1] 2025-09-07T09:19:57.8551412Z dtypes: torch.float16, torch.float16 2025-09-07T09:19:57.8552157Z convolution 0.0161 ms 100.0% 2025-09-07T09:19:57.8552931Z triton_convolution2d_151 0.0328 ms 49.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:57.8554421Z triton_convolution2d_152 0.0370 ms 43.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:57.8555473Z triton_convolution2d_150 0.0371 ms 43.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 
2025-09-07T09:19:57.8556498Z triton_convolution2d_153 0.0393 ms 41.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:57.8557606Z triton_convolution2d_147 0.0512 ms 31.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:57.8558606Z triton_convolution2d_148 0.0581 ms 27.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:57.8559621Z triton_convolution2d_149 0.0778 ms 20.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:19:57.8560239Z conv1x1_via_mm 0.1220 ms 13.2% 2025-09-07T09:19:57.8560640Z SingleProcess AUTOTUNE benchmarking takes 0.1443 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:19:57.9999294Z Autotune Choices Stats: 2025-09-07T09:19:58.0000677Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.016896000131964684, "best_triton_pos": 1, "best_triton_time": 0.03440000116825104, "best_triton_kernel": "triton_convolution2d_165", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:19:58.0051257Z AUTOTUNE convolution(8x1024x28x28, 400x1024x1x1) 2025-09-07T09:19:58.0051566Z strides: [802816, 784, 28, 1], [1024, 1, 1, 1] 2025-09-07T09:19:58.0051837Z dtypes: torch.float16, torch.float16 2025-09-07T09:19:58.0052085Z convolution 0.0169 ms 100.0% 
2025-09-07T09:19:58.0052773Z triton_convolution2d_165 0.0344 ms 49.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:58.0054502Z triton_convolution2d_166 0.0390 ms 43.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:58.0055757Z triton_convolution2d_164 0.0395 ms 42.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:58.0057013Z triton_convolution2d_167 0.0411 ms 41.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:58.0058254Z triton_convolution2d_161 0.0561 ms 30.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:58.0059769Z triton_convolution2d_162 0.0620 ms 27.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:58.0061041Z triton_convolution2d_163 0.0822 ms 20.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:19:58.0061800Z conv1x1_via_mm 0.1211 ms 13.9% 2025-09-07T09:19:58.0062293Z SingleProcess AUTOTUNE benchmarking takes 0.1462 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:19:58.1517919Z Autotune Choices Stats: 
2025-09-07T09:19:58.1519288Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.017311999574303627, "best_triton_pos": 1, "best_triton_time": 0.03625600039958954, "best_triton_kernel": "triton_convolution2d_179", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:19:58.1570406Z AUTOTUNE convolution(8x1088x28x28, 400x1088x1x1) 2025-09-07T09:19:58.1570727Z strides: [852992, 784, 28, 1], [1088, 1, 1, 1] 2025-09-07T09:19:58.1571010Z dtypes: torch.float16, torch.float16 2025-09-07T09:19:58.1571284Z convolution 0.0173 ms 100.0% 2025-09-07T09:19:58.1572015Z triton_convolution2d_179 0.0363 ms 47.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:58.1573240Z triton_convolution2d_180 0.0410 ms 42.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:58.1575194Z triton_convolution2d_178 0.0412 ms 42.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:58.1576395Z triton_convolution2d_181 0.0434 ms 39.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:58.1577547Z triton_convolution2d_175 0.0574 ms 30.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 
2025-09-07T09:19:58.1578682Z triton_convolution2d_176 0.0651 ms 26.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:58.1579833Z triton_convolution2d_177 0.0858 ms 20.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:19:58.1580548Z conv1x1_via_mm 0.1317 ms 13.1% 2025-09-07T09:19:58.1580987Z SingleProcess AUTOTUNE benchmarking takes 0.1480 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:19:58.2952151Z Autotune Choices Stats: 2025-09-07T09:19:58.2953228Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_193", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4", "best_time": 0.04211200028657913, "best_triton_pos": 0} 2025-09-07T09:19:58.3006284Z AUTOTUNE convolution(8x1152x28x28, 1152x1152x1x1) 2025-09-07T09:19:58.3006634Z strides: [903168, 784, 28, 1], [1152, 1, 1, 1] 2025-09-07T09:19:58.3006927Z dtypes: torch.float16, torch.float16 2025-09-07T09:19:58.3007700Z triton_convolution2d_193 0.0421 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:58.3009237Z triton_convolution2d_194 0.0466 ms 90.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:58.3010461Z triton_convolution2d_192 0.0489 ms 86.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, 
PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:58.3011205Z convolution 0.0516 ms 81.6% 2025-09-07T09:19:58.3011922Z triton_convolution2d_189 0.0693 ms 60.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:58.3013142Z triton_convolution2d_195 0.0703 ms 59.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:58.3014549Z triton_convolution2d_190 0.0791 ms 53.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:58.3015692Z triton_convolution2d_191 0.2535 ms 16.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:19:58.3016605Z SingleProcess AUTOTUNE benchmarking takes 0.1396 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T09:19:58.4740202Z Autotune Choices Stats: 2025-09-07T09:19:58.4741826Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.031936001032590866, "best_triton_pos": 1, "best_triton_time": 0.0666240006685257, "best_triton_kernel": "triton_convolution2d_200", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:19:58.4792521Z AUTOTUNE convolution(8x1152x28x28, 800x1152x1x1) 2025-09-07T09:19:58.4792900Z strides: [903168, 784, 28, 1], [1152, 1, 1, 1] 2025-09-07T09:19:58.4793212Z dtypes: torch.float16, 
torch.float16 2025-09-07T09:19:58.4793492Z convolution 0.0319 ms 100.0% 2025-09-07T09:19:58.4794432Z triton_convolution2d_200 0.0666 ms 47.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:58.4795732Z triton_convolution2d_197 0.0704 ms 45.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:58.4796990Z triton_convolution2d_199 0.0732 ms 43.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:58.4798436Z triton_convolution2d_201 0.0769 ms 41.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:58.4799702Z triton_convolution2d_202 0.0805 ms 39.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:58.4800920Z triton_convolution2d_196 0.1050 ms 30.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:58.4802551Z triton_convolution2d_198 0.1709 ms 18.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:19:58.4803318Z conv1x1_via_mm 0.1904 ms 16.8% 2025-09-07T09:19:58.4803991Z SingleProcess AUTOTUNE benchmarking takes 0.1782 seconds and 0.0002 seconds precompiling for 9 
choices 2025-09-07T09:19:58.6186137Z Autotune Choices Stats: 2025-09-07T09:19:58.6187230Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_207", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4", "best_time": 0.027904000133275986, "best_triton_pos": 0} 2025-09-07T09:19:58.6239764Z AUTOTUNE convolution(8x800x14x14, 1088x800x1x1) 2025-09-07T09:19:58.6240147Z strides: [156800, 196, 14, 1], [800, 1, 1, 1] 2025-09-07T09:19:58.6240448Z dtypes: torch.float16, torch.float16 2025-09-07T09:19:58.6241234Z triton_convolution2d_207 0.0279 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:58.6242527Z triton_convolution2d_206 0.0321 ms 86.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:58.6244222Z triton_convolution2d_208 0.0326 ms 85.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:58.6245718Z triton_convolution2d_209 0.0329 ms 84.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:58.6246495Z convolution 0.0330 ms 84.5% 2025-09-07T09:19:58.6247216Z triton_convolution2d_204 0.0415 ms 67.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:58.6248448Z 
triton_convolution2d_203 0.0424 ms 65.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:58.6249194Z conv1x1_via_mm 0.0658 ms 42.4% 2025-09-07T09:19:58.6249928Z triton_convolution2d_205 0.0721 ms 38.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:19:58.6250906Z SingleProcess AUTOTUNE benchmarking takes 0.1431 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:19:58.7712332Z Autotune Choices Stats: 2025-09-07T09:19:58.7713665Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.0331839993596077, "best_triton_pos": 1, "best_triton_time": 0.037087999284267426, "best_triton_kernel": "triton_convolution2d_214", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:19:58.7765731Z AUTOTUNE convolution(8x1216x14x14, 800x1216x1x1) 2025-09-07T09:19:58.7766062Z strides: [238336, 196, 14, 1], [1216, 1, 1, 1] 2025-09-07T09:19:58.7766351Z dtypes: torch.float16, torch.float16 2025-09-07T09:19:58.7766617Z convolution 0.0332 ms 100.0% 2025-09-07T09:19:58.7767353Z triton_convolution2d_214 0.0371 ms 89.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:58.7768814Z triton_convolution2d_213 0.0441 ms 75.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:58.7770040Z triton_convolution2d_215 
0.0444 ms 74.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:58.7771253Z triton_convolution2d_216 0.0454 ms 73.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:58.7772467Z triton_convolution2d_210 0.0590 ms 56.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:58.7773691Z triton_convolution2d_211 0.0694 ms 47.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:58.7774612Z conv1x1_via_mm 0.0831 ms 39.9% 2025-09-07T09:19:58.7775300Z triton_convolution2d_212 0.0949 ms 35.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:19:58.7776196Z SingleProcess AUTOTUNE benchmarking takes 0.1521 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:19:58.9266157Z Autotune Choices Stats: 2025-09-07T09:19:58.9267760Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.034304000437259674, "best_triton_pos": 1, "best_triton_time": 0.03843199834227562, "best_triton_kernel": "triton_convolution2d_228", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:19:58.9320575Z AUTOTUNE convolution(8x1280x14x14, 800x1280x1x1) 2025-09-07T09:19:58.9320913Z 
strides: [250880, 196, 14, 1], [1280, 1, 1, 1] 2025-09-07T09:19:58.9321206Z dtypes: torch.float16, torch.float16 2025-09-07T09:19:58.9321475Z convolution 0.0343 ms 100.0% 2025-09-07T09:19:58.9322235Z triton_convolution2d_228 0.0384 ms 89.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:58.9323560Z triton_convolution2d_227 0.0454 ms 75.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:58.9325180Z triton_convolution2d_229 0.0460 ms 74.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:58.9326397Z triton_convolution2d_230 0.0471 ms 72.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:58.9327613Z triton_convolution2d_224 0.0614 ms 55.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:58.9328840Z triton_convolution2d_225 0.0724 ms 47.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:58.9329771Z conv1x1_via_mm 0.0811 ms 42.3% 2025-09-07T09:19:58.9330499Z triton_convolution2d_226 0.0980 ms 35.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:19:58.9331467Z 
SingleProcess AUTOTUNE benchmarking takes 0.1522 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:19:59.0839847Z Autotune Choices Stats: 2025-09-07T09:19:59.0841219Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.03516799956560135, "best_triton_pos": 1, "best_triton_time": 0.039903998374938965, "best_triton_kernel": "triton_convolution2d_242", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:19:59.0893684Z AUTOTUNE convolution(8x1344x14x14, 800x1344x1x1) 2025-09-07T09:19:59.0894419Z strides: [263424, 196, 14, 1], [1344, 1, 1, 1] 2025-09-07T09:19:59.0894791Z dtypes: torch.float16, torch.float16 2025-09-07T09:19:59.0895100Z convolution 0.0352 ms 100.0% 2025-09-07T09:19:59.0895889Z triton_convolution2d_242 0.0399 ms 88.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:59.0897150Z triton_convolution2d_243 0.0470 ms 74.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:59.0898370Z triton_convolution2d_241 0.0471 ms 74.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:59.0899960Z triton_convolution2d_244 0.0491 ms 71.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:59.0901239Z triton_convolution2d_238 0.0639 ms 55.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, 
BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:59.0902472Z triton_convolution2d_239 0.0754 ms 46.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:59.0903222Z conv1x1_via_mm 0.0839 ms 41.9% 2025-09-07T09:19:59.0904182Z triton_convolution2d_240 0.1033 ms 34.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:19:59.0905054Z SingleProcess AUTOTUNE benchmarking takes 0.1542 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:19:59.2428352Z Autotune Choices Stats: 2025-09-07T09:19:59.2429710Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.036896001547575, "best_triton_pos": 1, "best_triton_time": 0.04137599840760231, "best_triton_kernel": "triton_convolution2d_256", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:19:59.2482029Z AUTOTUNE convolution(8x1408x14x14, 800x1408x1x1) 2025-09-07T09:19:59.2482384Z strides: [275968, 196, 14, 1], [1408, 1, 1, 1] 2025-09-07T09:19:59.2482692Z dtypes: torch.float16, torch.float16 2025-09-07T09:19:59.2482975Z convolution 0.0369 ms 100.0% 2025-09-07T09:19:59.2484149Z triton_convolution2d_256 0.0414 ms 89.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:59.2485726Z triton_convolution2d_255 0.0492 ms 75.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, 
KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:59.2486954Z triton_convolution2d_257 0.0497 ms 74.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:59.2488199Z triton_convolution2d_258 0.0510 ms 72.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:59.2489426Z triton_convolution2d_252 0.0663 ms 55.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:59.2490647Z triton_convolution2d_253 0.0781 ms 47.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:59.2491408Z conv1x1_via_mm 0.0863 ms 42.8% 2025-09-07T09:19:59.2492135Z triton_convolution2d_254 0.1071 ms 34.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:19:59.2493108Z SingleProcess AUTOTUNE benchmarking takes 0.1557 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:19:59.4043955Z Autotune Choices Stats: 2025-09-07T09:19:59.4045603Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.03814399987459183, "best_triton_pos": 1, "best_triton_time": 0.04310400038957596, "best_triton_kernel": "triton_convolution2d_270", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, 
STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:19:59.4097981Z AUTOTUNE convolution(8x1472x14x14, 800x1472x1x1) 2025-09-07T09:19:59.4098333Z strides: [288512, 196, 14, 1], [1472, 1, 1, 1] 2025-09-07T09:19:59.4098647Z dtypes: torch.float16, torch.float16 2025-09-07T09:19:59.4098922Z convolution 0.0381 ms 100.0% 2025-09-07T09:19:59.4099691Z triton_convolution2d_270 0.0431 ms 88.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:59.4100938Z triton_convolution2d_269 0.0508 ms 75.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:59.4102173Z triton_convolution2d_271 0.0515 ms 74.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:59.4103377Z triton_convolution2d_272 0.0533 ms 71.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:59.4104764Z triton_convolution2d_266 0.0695 ms 54.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:59.4105820Z triton_convolution2d_267 0.0822 ms 46.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:59.4106682Z conv1x1_via_mm 0.0940 ms 40.6% 2025-09-07T09:19:59.4107319Z triton_convolution2d_268 0.1118 ms 34.1% ALLOW_TF32=False, BLOCK_K=16, 
BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:19:59.4108174Z SingleProcess AUTOTUNE benchmarking takes 0.1584 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:19:59.5672193Z Autotune Choices Stats: 2025-09-07T09:19:59.5673535Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.038784001022577286, "best_triton_pos": 1, "best_triton_time": 0.04495999962091446, "best_triton_kernel": "triton_convolution2d_284", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:19:59.5727908Z AUTOTUNE convolution(8x1536x14x14, 800x1536x1x1) 2025-09-07T09:19:59.5728261Z strides: [301056, 196, 14, 1], [1536, 1, 1, 1] 2025-09-07T09:19:59.5728554Z dtypes: torch.float16, torch.float16 2025-09-07T09:19:59.5728827Z convolution 0.0388 ms 100.0% 2025-09-07T09:19:59.5729571Z triton_convolution2d_284 0.0450 ms 86.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:59.5730828Z triton_convolution2d_283 0.0525 ms 73.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:59.5732064Z triton_convolution2d_285 0.0539 ms 72.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:59.5733597Z triton_convolution2d_286 0.0551 ms 70.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, 
STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:59.5735024Z triton_convolution2d_280 0.0722 ms 53.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:59.5736258Z triton_convolution2d_281 0.0849 ms 45.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:59.5737006Z conv1x1_via_mm 0.0909 ms 42.6% 2025-09-07T09:19:59.5737752Z triton_convolution2d_282 0.1153 ms 33.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:19:59.5738725Z SingleProcess AUTOTUNE benchmarking takes 0.1598 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:19:59.7346099Z Autotune Choices Stats: 2025-09-07T09:19:59.7347515Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.04022400081157684, "best_triton_pos": 1, "best_triton_time": 0.046431999653577805, "best_triton_kernel": "triton_convolution2d_298", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:19:59.7403483Z AUTOTUNE convolution(8x1600x14x14, 800x1600x1x1) 2025-09-07T09:19:59.7404250Z strides: [313600, 196, 14, 1], [1600, 1, 1, 1] 2025-09-07T09:19:59.7404554Z dtypes: torch.float16, torch.float16 2025-09-07T09:19:59.7404829Z convolution 0.0402 ms 100.0% 2025-09-07T09:19:59.7405584Z triton_convolution2d_298 0.0464 ms 86.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, 
UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:59.7407137Z triton_convolution2d_297 0.0546 ms 73.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:59.7408368Z triton_convolution2d_299 0.0556 ms 72.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:59.7409585Z triton_convolution2d_300 0.0569 ms 70.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:59.7410796Z triton_convolution2d_294 0.0750 ms 53.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:59.7412001Z triton_convolution2d_295 0.0881 ms 45.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:59.7412743Z conv1x1_via_mm 0.0972 ms 41.4% 2025-09-07T09:19:59.7413486Z triton_convolution2d_296 0.1207 ms 33.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:19:59.7414626Z SingleProcess AUTOTUNE benchmarking takes 0.1643 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:19:59.9021279Z Autotune Choices Stats: 2025-09-07T09:19:59.9022981Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.04121600091457367, "best_triton_pos": 1, "best_triton_time": 0.048128001391887665, "best_triton_kernel": 
"triton_convolution2d_312", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:19:59.9076914Z AUTOTUNE convolution(8x1664x14x14, 800x1664x1x1) 2025-09-07T09:19:59.9077421Z strides: [326144, 196, 14, 1], [1664, 1, 1, 1] 2025-09-07T09:19:59.9077751Z dtypes: torch.float16, torch.float16 2025-09-07T09:19:59.9078039Z convolution 0.0412 ms 100.0% 2025-09-07T09:19:59.9078819Z triton_convolution2d_312 0.0481 ms 85.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:59.9080086Z triton_convolution2d_311 0.0565 ms 73.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:59.9081318Z triton_convolution2d_313 0.0575 ms 71.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:59.9082542Z triton_convolution2d_314 0.0590 ms 69.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:19:59.9084044Z triton_convolution2d_308 0.0777 ms 53.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:19:59.9085346Z triton_convolution2d_309 0.0913 ms 45.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, 
num_warps=4 2025-09-07T09:19:59.9086423Z conv1x1_via_mm 0.0998 ms 41.3% 2025-09-07T09:19:59.9087163Z triton_convolution2d_310 0.1239 ms 33.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:19:59.9088130Z SingleProcess AUTOTUNE benchmarking takes 0.1639 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:20:00.0711956Z Autotune Choices Stats: 2025-09-07T09:20:00.0713322Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.04217600077390671, "best_triton_pos": 1, "best_triton_time": 0.04956800118088722, "best_triton_kernel": "triton_convolution2d_326", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:20:00.0768374Z AUTOTUNE convolution(8x1728x14x14, 800x1728x1x1) 2025-09-07T09:20:00.0768717Z strides: [338688, 196, 14, 1], [1728, 1, 1, 1] 2025-09-07T09:20:00.0768991Z dtypes: torch.float16, torch.float16 2025-09-07T09:20:00.0769242Z convolution 0.0422 ms 100.0% 2025-09-07T09:20:00.0769929Z triton_convolution2d_326 0.0496 ms 85.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:20:00.0771085Z triton_convolution2d_325 0.0585 ms 72.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:20:00.0772227Z triton_convolution2d_327 0.0593 ms 71.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 
2025-09-07T09:20:00.0773690Z triton_convolution2d_328 0.0604 ms 69.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:20:00.0775670Z triton_convolution2d_322 0.0799 ms 52.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:20:00.0776751Z triton_convolution2d_323 0.0944 ms 44.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:20:00.0777406Z conv1x1_via_mm 0.1025 ms 41.2% 2025-09-07T09:20:00.0778052Z triton_convolution2d_324 0.1279 ms 33.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:20:00.0778896Z SingleProcess AUTOTUNE benchmarking takes 0.1659 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:20:00.2421455Z Autotune Choices Stats: 2025-09-07T09:20:00.2422796Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.04419200122356415, "best_triton_pos": 1, "best_triton_time": 0.051231998950242996, "best_triton_kernel": "triton_convolution2d_340", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:20:00.2477836Z AUTOTUNE convolution(8x1792x14x14, 800x1792x1x1) 2025-09-07T09:20:00.2478220Z strides: [351232, 196, 14, 1], [1792, 1, 1, 1] 2025-09-07T09:20:00.2478535Z dtypes: torch.float16, torch.float16 2025-09-07T09:20:00.2478825Z convolution 0.0442 ms 100.0% 
2025-09-07T09:20:00.2479611Z triton_convolution2d_340 0.0512 ms 86.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:20:00.2481215Z triton_convolution2d_339 0.0607 ms 72.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:20:00.2482444Z triton_convolution2d_341 0.0613 ms 72.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:20:00.2483657Z triton_convolution2d_342 0.0625 ms 70.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:20:00.2486093Z triton_convolution2d_336 0.0830 ms 53.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:20:00.2487242Z triton_convolution2d_337 0.0982 ms 45.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:20:00.2487936Z conv1x1_via_mm 0.1061 ms 41.6% 2025-09-07T09:20:00.2488614Z triton_convolution2d_338 0.1350 ms 32.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:20:00.2489520Z SingleProcess AUTOTUNE benchmarking takes 0.1676 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:20:00.4165563Z Autotune Choices Stats: 
2025-09-07T09:20:00.4167156Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.044863998889923096, "best_triton_pos": 1, "best_triton_time": 0.0533440001308918, "best_triton_kernel": "triton_convolution2d_354", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:20:00.4221889Z AUTOTUNE convolution(8x1856x14x14, 800x1856x1x1) 2025-09-07T09:20:00.4222244Z strides: [363776, 196, 14, 1], [1856, 1, 1, 1] 2025-09-07T09:20:00.4222553Z dtypes: torch.float16, torch.float16 2025-09-07T09:20:00.4222835Z convolution 0.0449 ms 100.0% 2025-09-07T09:20:00.4223586Z triton_convolution2d_354 0.0533 ms 84.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:20:00.4225075Z triton_convolution2d_353 0.0632 ms 71.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:20:00.4226332Z triton_convolution2d_355 0.0637 ms 70.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:20:00.4227552Z triton_convolution2d_356 0.0651 ms 68.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:20:00.4228775Z triton_convolution2d_350 0.0856 ms 52.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 
2025-09-07T09:20:00.4229985Z triton_convolution2d_351 0.1036 ms 43.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:20:00.4230976Z conv1x1_via_mm 0.1072 ms 41.9% 2025-09-07T09:20:00.4231736Z triton_convolution2d_352 0.1420 ms 31.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:20:00.4232723Z SingleProcess AUTOTUNE benchmarking takes 0.1712 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:20:00.5915926Z Autotune Choices Stats: 2025-09-07T09:20:00.5917398Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.046720001846551895, "best_triton_pos": 1, "best_triton_time": 0.05427199974656105, "best_triton_kernel": "triton_convolution2d_368", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:20:00.5972094Z AUTOTUNE convolution(8x1920x14x14, 800x1920x1x1) 2025-09-07T09:20:00.5972430Z strides: [376320, 196, 14, 1], [1920, 1, 1, 1] 2025-09-07T09:20:00.5972699Z dtypes: torch.float16, torch.float16 2025-09-07T09:20:00.5972962Z convolution 0.0467 ms 100.0% 2025-09-07T09:20:00.5973654Z triton_convolution2d_368 0.0543 ms 86.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:20:00.5975254Z triton_convolution2d_367 0.0651 ms 71.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 
2025-09-07T09:20:00.5976941Z triton_convolution2d_369 0.0660 ms 70.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:20:00.5978222Z triton_convolution2d_370 0.0668 ms 69.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:20:00.5979448Z triton_convolution2d_364 0.0892 ms 52.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:20:00.5980689Z triton_convolution2d_365 0.1052 ms 44.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:20:00.5981464Z conv1x1_via_mm 0.1106 ms 42.2% 2025-09-07T09:20:00.5982217Z triton_convolution2d_366 0.1440 ms 32.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:20:00.5983197Z SingleProcess AUTOTUNE benchmarking takes 0.1717 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:20:00.7681466Z Autotune Choices Stats: 2025-09-07T09:20:00.7682836Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.047968000173568726, "best_triton_pos": 1, "best_triton_time": 0.056384000927209854, "best_triton_kernel": "triton_convolution2d_382", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:20:00.7738569Z AUTOTUNE 
convolution(8x1984x14x14, 800x1984x1x1) 2025-09-07T09:20:00.7738946Z strides: [388864, 196, 14, 1], [1984, 1, 1, 1] 2025-09-07T09:20:00.7739272Z dtypes: torch.float16, torch.float16 2025-09-07T09:20:00.7739546Z convolution 0.0480 ms 100.0% 2025-09-07T09:20:00.7740635Z triton_convolution2d_382 0.0564 ms 85.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:20:00.7741890Z triton_convolution2d_381 0.0662 ms 72.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:20:00.7743207Z triton_convolution2d_383 0.0672 ms 71.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:20:00.7744709Z triton_convolution2d_384 0.0690 ms 69.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:20:00.7745780Z triton_convolution2d_378 0.0909 ms 52.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:20:00.7746845Z triton_convolution2d_379 0.1068 ms 44.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:20:00.7747489Z conv1x1_via_mm 0.1153 ms 41.6% 2025-09-07T09:20:00.7748143Z triton_convolution2d_380 0.1471 ms 32.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, 
UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:20:00.7748987Z SingleProcess AUTOTUNE benchmarking takes 0.1733 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:20:00.9472223Z Autotune Choices Stats: 2025-09-07T09:20:00.9473846Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.04835199937224388, "best_triton_pos": 1, "best_triton_time": 0.05734400078654289, "best_triton_kernel": "triton_convolution2d_396", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:20:00.9530267Z AUTOTUNE convolution(8x2048x14x14, 800x2048x1x1) 2025-09-07T09:20:00.9530604Z strides: [401408, 196, 14, 1], [2048, 1, 1, 1] 2025-09-07T09:20:00.9530876Z dtypes: torch.float16, torch.float16 2025-09-07T09:20:00.9531118Z convolution 0.0484 ms 100.0% 2025-09-07T09:20:00.9531807Z triton_convolution2d_396 0.0573 ms 84.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:20:00.9532980Z triton_convolution2d_397 0.0683 ms 70.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:20:00.9534291Z triton_convolution2d_395 0.0684 ms 70.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:20:00.9535439Z triton_convolution2d_398 0.0702 ms 68.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:20:00.9536561Z 
triton_convolution2d_392 0.0965 ms 50.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:20:00.9537723Z triton_convolution2d_393 0.1123 ms 43.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:20:00.9538707Z conv1x1_via_mm 0.1143 ms 42.3% 2025-09-07T09:20:00.9539407Z triton_convolution2d_394 0.1529 ms 31.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:20:00.9540313Z SingleProcess AUTOTUNE benchmarking takes 0.1758 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:20:01.1286494Z Autotune Choices Stats: 2025-09-07T09:20:01.1287762Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.04995200037956238, "best_triton_pos": 1, "best_triton_time": 0.06032000109553337, "best_triton_kernel": "triton_convolution2d_410", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:20:01.1345457Z AUTOTUNE convolution(8x2112x14x14, 800x2112x1x1) 2025-09-07T09:20:01.1345805Z strides: [413952, 196, 14, 1], [2112, 1, 1, 1] 2025-09-07T09:20:01.1346110Z dtypes: torch.float16, torch.float16 2025-09-07T09:20:01.1346398Z convolution 0.0500 ms 100.0% 2025-09-07T09:20:01.1347152Z triton_convolution2d_410 0.0603 ms 82.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:20:01.1348396Z triton_convolution2d_409 
0.0715 ms 69.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:20:01.1349980Z triton_convolution2d_411 0.0724 ms 68.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:20:01.1351231Z triton_convolution2d_412 0.0733 ms 68.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:20:01.1352445Z triton_convolution2d_406 0.0969 ms 51.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:20:01.1353666Z triton_convolution2d_407 0.1160 ms 43.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:20:01.1354848Z conv1x1_via_mm 0.1211 ms 41.3% 2025-09-07T09:20:01.1355508Z triton_convolution2d_408 0.1602 ms 31.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:20:01.1356375Z SingleProcess AUTOTUNE benchmarking takes 0.1781 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:20:01.3119028Z Autotune Choices Stats: 2025-09-07T09:20:01.3120404Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.05132799968123436, "best_triton_pos": 1, "best_triton_time": 0.061216000467538834, "best_triton_kernel": "triton_convolution2d_424", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, 
BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:20:01.3176918Z AUTOTUNE convolution(8x2176x14x14, 800x2176x1x1) 2025-09-07T09:20:01.3177394Z strides: [426496, 196, 14, 1], [2176, 1, 1, 1] 2025-09-07T09:20:01.3177707Z dtypes: torch.float16, torch.float16 2025-09-07T09:20:01.3177997Z convolution 0.0513 ms 100.0% 2025-09-07T09:20:01.3179198Z triton_convolution2d_424 0.0612 ms 83.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:20:01.3180449Z triton_convolution2d_423 0.0728 ms 70.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:20:01.3181685Z triton_convolution2d_425 0.0732 ms 70.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:20:01.3182903Z triton_convolution2d_426 0.0749 ms 68.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:20:01.3184541Z triton_convolution2d_420 0.0995 ms 51.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:20:01.3185678Z triton_convolution2d_421 0.1174 ms 43.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:20:01.3186330Z conv1x1_via_mm 0.1261 ms 40.7% 
2025-09-07T09:20:01.3186973Z triton_convolution2d_422 0.1608 ms 31.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:20:01.3187824Z SingleProcess AUTOTUNE benchmarking takes 0.1798 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:20:01.4976230Z Autotune Choices Stats: 2025-09-07T09:20:01.4978363Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.05212799832224846, "best_triton_pos": 1, "best_triton_time": 0.06345599889755249, "best_triton_kernel": "triton_convolution2d_438", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:20:01.5033112Z AUTOTUNE convolution(8x2240x14x14, 800x2240x1x1) 2025-09-07T09:20:01.5033498Z strides: [439040, 196, 14, 1], [2240, 1, 1, 1] 2025-09-07T09:20:01.5034035Z dtypes: torch.float16, torch.float16 2025-09-07T09:20:01.5034338Z convolution 0.0521 ms 100.0% 2025-09-07T09:20:01.5035102Z triton_convolution2d_438 0.0635 ms 82.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:20:01.5036346Z triton_convolution2d_437 0.0749 ms 69.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:20:01.5037666Z triton_convolution2d_439 0.0753 ms 69.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:20:01.5038877Z triton_convolution2d_440 0.0777 ms 67.1% 
ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:20:01.5040082Z triton_convolution2d_434 0.1016 ms 51.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:20:01.5041310Z triton_convolution2d_435 0.1212 ms 43.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:20:01.5042423Z conv1x1_via_mm 0.1331 ms 39.2% 2025-09-07T09:20:01.5043172Z triton_convolution2d_436 0.1700 ms 30.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:20:01.5044345Z SingleProcess AUTOTUNE benchmarking takes 0.1822 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:20:01.6845951Z Autotune Choices Stats: 2025-09-07T09:20:01.6847399Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.05452800169587135, "best_triton_pos": 1, "best_triton_time": 0.06492800265550613, "best_triton_kernel": "triton_convolution2d_452", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:20:01.6903961Z AUTOTUNE convolution(8x2304x14x14, 800x2304x1x1) 2025-09-07T09:20:01.6904696Z strides: [451584, 196, 14, 1], [2304, 1, 1, 1] 2025-09-07T09:20:01.6905071Z dtypes: torch.float16, torch.float16 2025-09-07T09:20:01.6905362Z convolution 0.0545 ms 100.0% 2025-09-07T09:20:01.6906137Z triton_convolution2d_452 0.0649 ms 84.0% ALLOW_TF32=False, 
BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:20:01.6907416Z triton_convolution2d_451 0.0765 ms 71.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:20:01.6909081Z triton_convolution2d_453 0.0777 ms 70.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:20:01.6910380Z triton_convolution2d_454 0.0803 ms 67.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:20:01.6911600Z triton_convolution2d_448 0.1055 ms 51.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:20:01.6912812Z triton_convolution2d_449 0.1250 ms 43.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:20:01.6913569Z conv1x1_via_mm 0.1302 ms 41.9% 2025-09-07T09:20:01.6914524Z triton_convolution2d_450 0.1722 ms 31.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:20:01.6915441Z SingleProcess AUTOTUNE benchmarking takes 0.1836 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:20:01.8736077Z Autotune Choices Stats: 2025-09-07T09:20:01.8737460Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", 
"best_time": 0.05721599981188774, "best_triton_pos": 1, "best_triton_time": 0.06566400080919266, "best_triton_kernel": "triton_convolution2d_466", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:20:01.8794993Z AUTOTUNE convolution(8x2368x14x14, 800x2368x1x1) 2025-09-07T09:20:01.8795802Z strides: [464128, 196, 14, 1], [2368, 1, 1, 1] 2025-09-07T09:20:01.8796156Z dtypes: torch.float16, torch.float16 2025-09-07T09:20:01.8796449Z convolution 0.0572 ms 100.0% 2025-09-07T09:20:01.8798077Z triton_convolution2d_466 0.0657 ms 87.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:20:01.8799318Z triton_convolution2d_465 0.0773 ms 74.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:20:01.8800549Z triton_convolution2d_467 0.0784 ms 73.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:20:01.8801760Z triton_convolution2d_468 0.0806 ms 71.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:20:01.8802983Z triton_convolution2d_462 0.1083 ms 52.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:20:01.8804825Z triton_convolution2d_463 0.1291 ms 44.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, 
GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:20:01.8805619Z conv1x1_via_mm 0.1328 ms 43.1% 2025-09-07T09:20:01.8806349Z triton_convolution2d_464 0.1786 ms 32.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:20:01.8807316Z SingleProcess AUTOTUNE benchmarking takes 0.1857 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:20:02.0621733Z Autotune Choices Stats: 2025-09-07T09:20:02.0623556Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.04940799996256828, "best_triton_pos": 1, "best_triton_time": 0.08636800199747086, "best_triton_kernel": "triton_convolution2d_481", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=8"} 2025-09-07T09:20:02.0680005Z AUTOTUNE convolution(8x2432x14x14, 2304x2432x1x1) 2025-09-07T09:20:02.0680504Z strides: [476672, 196, 14, 1], [2432, 1, 1, 1] 2025-09-07T09:20:02.0680837Z dtypes: torch.float16, torch.float16 2025-09-07T09:20:02.0681126Z convolution 0.0494 ms 100.0% 2025-09-07T09:20:02.0681912Z triton_convolution2d_481 0.0864 ms 57.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:20:02.0683183Z triton_convolution2d_479 0.0913 ms 54.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:20:02.0684981Z triton_convolution2d_480 0.1351 ms 36.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, 
KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:20:02.0686248Z triton_convolution2d_476 0.1392 ms 35.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:20:02.0687469Z triton_convolution2d_482 0.1587 ms 31.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:20:02.0688697Z triton_convolution2d_477 0.1646 ms 30.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:20:02.0690251Z triton_convolution2d_478 0.4015 ms 12.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:20:02.0691232Z SingleProcess AUTOTUNE benchmarking takes 0.1850 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T09:20:02.2570452Z Autotune Choices Stats: 2025-09-07T09:20:02.2571837Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.0708480030298233, "best_triton_pos": 1, "best_triton_time": 0.07280000299215317, "best_triton_kernel": "triton_convolution2d_487", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:20:02.2628415Z AUTOTUNE convolution(8x2432x14x14, 1600x2432x1x1) 2025-09-07T09:20:02.2628803Z strides: [476672, 196, 14, 1], [2432, 1, 1, 1] 2025-09-07T09:20:02.2629122Z dtypes: torch.float16, torch.float16 2025-09-07T09:20:02.2629407Z convolution 
0.0708 ms 100.0% 2025-09-07T09:20:02.2630186Z triton_convolution2d_487 0.0728 ms 97.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:20:02.2631435Z triton_convolution2d_486 0.0825 ms 85.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:20:02.2633023Z triton_convolution2d_488 0.0833 ms 85.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:20:02.2634518Z triton_convolution2d_489 0.0896 ms 79.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:20:02.2635807Z triton_convolution2d_484 0.1120 ms 63.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:20:02.2636930Z triton_convolution2d_483 0.1125 ms 63.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:20:02.2638160Z triton_convolution2d_485 0.1960 ms 36.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:20:02.2638860Z conv1x1_via_mm 0.2106 ms 33.6% 2025-09-07T09:20:02.2639305Z SingleProcess AUTOTUNE benchmarking takes 0.1932 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:20:02.4346002Z Autotune Choices Stats: 
2025-09-07T09:20:02.4347374Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.03455999866127968, "best_triton_pos": 1, "best_triton_time": 0.05663999915122986, "best_triton_kernel": "triton_convolution2d_493", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8"} 2025-09-07T09:20:02.4405189Z AUTOTUNE convolution(8x1600x7x7, 2176x1600x1x1) 2025-09-07T09:20:02.4405532Z strides: [78400, 49, 7, 1], [1600, 1, 1, 1] 2025-09-07T09:20:02.4405838Z dtypes: torch.float16, torch.float16 2025-09-07T09:20:02.4406129Z convolution 0.0346 ms 100.0% 2025-09-07T09:20:02.4407287Z triton_convolution2d_493 0.0566 ms 61.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:20:02.4408643Z triton_convolution2d_495 0.0567 ms 60.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:20:02.4409962Z triton_convolution2d_494 0.0741 ms 46.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:20:02.4411271Z triton_convolution2d_490 0.0811 ms 42.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:20:02.4412589Z triton_convolution2d_496 0.0834 ms 41.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 
2025-09-07T09:20:02.4414098Z triton_convolution2d_491 0.1016 ms 34.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:20:02.4415417Z triton_convolution2d_492 0.1340 ms 25.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=512, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:20:02.4416173Z conv1x1_via_mm 0.3168 ms 10.9% 2025-09-07T09:20:02.4416566Z SingleProcess AUTOTUNE benchmarking takes 0.1759 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:20:02.6365800Z Autotune Choices Stats: 2025-09-07T09:20:02.6367448Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.032575998455286026, "best_triton_pos": 1, "best_triton_time": 0.07955200225114822, "best_triton_kernel": "triton_convolution2d_500", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8"} 2025-09-07T09:20:02.6425590Z AUTOTUNE convolution(8x2432x7x7, 1600x2432x1x1) 2025-09-07T09:20:02.6425938Z strides: [119168, 49, 7, 1], [2432, 1, 1, 1] 2025-09-07T09:20:02.6426241Z dtypes: torch.float16, torch.float16 2025-09-07T09:20:02.6426533Z convolution 0.0326 ms 100.0% 2025-09-07T09:20:02.6427307Z triton_convolution2d_500 0.0796 ms 40.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:20:02.6428577Z triton_convolution2d_502 0.0809 ms 40.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 
2025-09-07T09:20:02.6429328Z conv1x1_via_mm 0.0860 ms 37.9% 2025-09-07T09:20:02.6430086Z triton_convolution2d_501 0.1082 ms 30.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:20:02.6431324Z triton_convolution2d_497 0.1141 ms 28.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:20:02.6432547Z triton_convolution2d_503 0.1220 ms 26.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:20:02.6434121Z triton_convolution2d_498 0.1487 ms 21.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:20:02.6435622Z triton_convolution2d_499 0.1606 ms 20.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=512, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:20:02.6436470Z SingleProcess AUTOTUNE benchmarking takes 0.2004 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:20:02.8449556Z Autotune Choices Stats: 2025-09-07T09:20:02.8450963Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.034591998904943466, "best_triton_pos": 1, "best_triton_time": 0.08403199911117554, "best_triton_kernel": "triton_convolution2d_514", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8"} 2025-09-07T09:20:02.8509413Z AUTOTUNE 
convolution(8x2560x7x7, 1600x2560x1x1) 2025-09-07T09:20:02.8509807Z strides: [125440, 49, 7, 1], [2560, 1, 1, 1] 2025-09-07T09:20:02.8510117Z dtypes: torch.float16, torch.float16 2025-09-07T09:20:02.8510409Z convolution 0.0346 ms 100.0% 2025-09-07T09:20:02.8511192Z triton_convolution2d_514 0.0840 ms 41.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:20:02.8512456Z triton_convolution2d_516 0.0849 ms 40.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:20:02.8513226Z conv1x1_via_mm 0.0881 ms 39.3% 2025-09-07T09:20:02.8514544Z triton_convolution2d_515 0.1133 ms 30.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:20:02.8515830Z triton_convolution2d_511 0.1190 ms 29.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:20:02.8516967Z triton_convolution2d_517 0.1276 ms 27.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:20:02.8518206Z triton_convolution2d_512 0.1554 ms 22.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:20:02.8519338Z triton_convolution2d_513 0.1699 ms 20.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=512, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, 
UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:20:02.8520240Z SingleProcess AUTOTUNE benchmarking takes 0.2038 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:20:41.5305697Z pass 2025-09-07T09:20:48.5075905Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T09:20:48.5078047Z import pynvml # type: ignore[import] 2025-09-07T09:20:51.5113091Z 2025-09-07T09:20:54.3972922Z loading model: 0it [00:00, ?it/s] 2025-09-07T09:20:54.3973338Z loading model: 0it [00:02, ?it/s] 2025-09-07T09:20:54.3973685Z cuda train eca_botnext26ts_256 2025-09-07T09:21:14.9217454Z Autotune Choices Stats: 2025-09-07T09:21:14.9218656Z {"num_choices": 7, "num_triton_choices": 6, "best_kernel": "triton_convolution2d_1", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4", "best_time": 0.014911999925971031, "best_triton_pos": 0} 2025-09-07T09:21:14.9283092Z AUTOTUNE convolution(8x3x256x256, 24x3x3x3) 2025-09-07T09:21:14.9283436Z strides: [196608, 65536, 256, 1], [27, 9, 3, 1] 2025-09-07T09:21:14.9283902Z dtypes: torch.float16, torch.float16 2025-09-07T09:21:14.9284692Z triton_convolution2d_1 0.0149 ms 100.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:21:14.9285962Z triton_convolution2d_5 0.0156 ms 95.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:21:14.9287226Z triton_convolution2d_3 
0.0168 ms 88.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:21:14.9288418Z triton_convolution2d_0 0.0178 ms 83.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:21:14.9289622Z triton_convolution2d_4 0.0207 ms 72.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:21:14.9290977Z triton_convolution2d_2 0.0230 ms 64.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T09:21:14.9291716Z convolution 0.0249 ms 59.8% 2025-09-07T09:21:14.9292436Z SingleProcess AUTOTUNE benchmarking takes 0.0979 seconds and 0.0003 seconds precompiling for 7 choices 2025-09-07T09:21:15.4657131Z Autotune Choices Stats: 2025-09-07T09:21:15.4658271Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_7", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4", "best_time": 0.03420799970626831, "best_triton_pos": 0} 2025-09-07T09:21:15.4723973Z AUTOTUNE convolution(8x24x128x128, 32x24x3x3) 2025-09-07T09:21:15.4724352Z strides: [393216, 16384, 128, 1], [216, 9, 3, 1] 2025-09-07T09:21:15.4724676Z dtypes: torch.float16, torch.float16 2025-09-07T09:21:15.4725505Z triton_convolution2d_7 0.0342 ms 100.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, 
num_stages=2, num_warps=4 2025-09-07T09:21:15.4726823Z triton_convolution2d_12 0.0350 ms 97.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:21:15.4728069Z triton_convolution2d_9 0.0382 ms 89.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:21:15.4729279Z triton_convolution2d_6 0.0386 ms 88.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:21:15.4730656Z triton_convolution2d_10 0.0428 ms 79.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:21:15.4731733Z convolution 0.0457 ms 74.9% 2025-09-07T09:21:15.4732453Z triton_convolution2d_11 0.0469 ms 72.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:21:15.4733666Z triton_convolution2d_8 0.0663 ms 51.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T09:21:15.4734811Z SingleProcess AUTOTUNE benchmarking takes 0.5436 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T09:21:15.5929450Z Autotune Choices Stats: 2025-09-07T09:21:15.5930743Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_14", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, 
PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4", "best_time": 0.04179200157523155, "best_triton_pos": 0} 2025-09-07T09:21:15.5991443Z AUTOTUNE convolution(8x32x128x128, 64x32x3x3) 2025-09-07T09:21:15.5991817Z strides: [524288, 16384, 128, 1], [288, 9, 3, 1] 2025-09-07T09:21:15.5992138Z dtypes: torch.float16, torch.float16 2025-09-07T09:21:15.5992960Z triton_convolution2d_14 0.0418 ms 100.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:21:15.5994717Z triton_convolution2d_19 0.0430 ms 97.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:21:15.5995995Z triton_convolution2d_16 0.0438 ms 95.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:21:15.5997774Z triton_convolution2d_17 0.0491 ms 85.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:21:15.5999065Z triton_convolution2d_13 0.0524 ms 79.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T09:21:15.6000305Z triton_convolution2d_18 0.0527 ms 79.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T09:21:15.6001077Z convolution 0.0567 ms 73.7% 2025-09-07T09:21:15.6001720Z triton_convolution2d_15 0.1312 ms 31.8% ALLOW_TF32=False, BLOCK_K=16, 
BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T09:21:15.6002568Z SingleProcess AUTOTUNE benchmarking takes 0.1263 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T09:21:15.7328049Z Autotune Choices Stats: 2025-09-07T09:21:15.7329149Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_25", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8", "best_time": 0.009344000369310379, "best_triton_pos": 0} 2025-09-07T09:21:15.7389699Z AUTOTUNE convolution(8x64x64x64, 64x64x1x1) 2025-09-07T09:21:15.7390067Z strides: [262144, 4096, 64, 1], [64, 1, 1, 1] 2025-09-07T09:21:15.7390400Z dtypes: torch.float16, torch.float16 2025-09-07T09:21:15.7391206Z triton_convolution2d_25 0.0093 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:15.7392788Z triton_convolution2d_24 0.0094 ms 99.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:15.7394404Z triton_convolution2d_20 0.0099 ms 94.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:15.7395159Z convolution 0.0100 ms 93.6% 2025-09-07T09:21:15.7395890Z triton_convolution2d_23 0.0100 ms 93.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:15.7397256Z 
triton_convolution2d_26 0.0106 ms 88.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:15.7398508Z triton_convolution2d_21 0.0113 ms 82.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:15.7399722Z triton_convolution2d_22 0.0125 ms 74.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:21:15.7400485Z conv1x1_via_mm 0.0510 ms 18.3% 2025-09-07T09:21:15.7400938Z SingleProcess AUTOTUNE benchmarking takes 0.1394 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:21:15.8709723Z Autotune Choices Stats: 2025-09-07T09:21:15.8711462Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.01321600005030632, "best_triton_pos": 1, "best_triton_time": 0.014976000413298607, "best_triton_kernel": "triton_convolution2d_31", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:21:15.8771108Z AUTOTUNE convolution(8x64x64x64, 256x64x1x1) 2025-09-07T09:21:15.8771404Z strides: [262144, 4096, 64, 1], [64, 1, 1, 1] 2025-09-07T09:21:15.8771667Z dtypes: torch.float16, torch.float16 2025-09-07T09:21:15.8771926Z convolution 0.0132 ms 100.0% 2025-09-07T09:21:15.8772605Z triton_convolution2d_31 0.0150 ms 88.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:15.8774050Z triton_convolution2d_30 0.0159 ms 
83.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:15.8775202Z triton_convolution2d_32 0.0172 ms 76.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:15.8776331Z triton_convolution2d_33 0.0172 ms 76.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:15.8777444Z triton_convolution2d_27 0.0180 ms 73.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:15.8778560Z triton_convolution2d_28 0.0185 ms 71.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:15.8779961Z triton_convolution2d_29 0.0219 ms 60.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:21:15.8780654Z conv1x1_via_mm 0.0971 ms 13.6% 2025-09-07T09:21:15.8781093Z SingleProcess AUTOTUNE benchmarking takes 0.1377 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:21:16.0115448Z Autotune Choices Stats: 2025-09-07T09:21:16.0116795Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.015807999297976494, "best_triton_pos": 1, "best_triton_time": 0.016256000846624374, "best_triton_kernel": "triton_convolution2d_46", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, 
GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8"} 2025-09-07T09:21:16.0176345Z AUTOTUNE convolution(8x256x64x64, 64x256x1x1) 2025-09-07T09:21:16.0176683Z strides: [1048576, 4096, 64, 1], [256, 1, 1, 1] 2025-09-07T09:21:16.0176969Z dtypes: torch.float16, torch.float16 2025-09-07T09:21:16.0177225Z convolution 0.0158 ms 100.0% 2025-09-07T09:21:16.0177914Z triton_convolution2d_46 0.0163 ms 97.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:16.0179072Z triton_convolution2d_45 0.0164 ms 96.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:16.0180237Z triton_convolution2d_44 0.0171 ms 92.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:16.0181648Z triton_convolution2d_47 0.0186 ms 85.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:16.0182801Z triton_convolution2d_41 0.0197 ms 80.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:16.0184274Z triton_convolution2d_42 0.0233 ms 68.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:16.0185416Z triton_convolution2d_43 0.0276 ms 57.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, 
BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:21:16.0186107Z conv1x1_via_mm 0.0980 ms 16.1% 2025-09-07T09:21:16.0186550Z SingleProcess AUTOTUNE benchmarking takes 0.1372 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:21:16.1524325Z Autotune Choices Stats: 2025-09-07T09:21:16.1525722Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.017184000462293625, "best_triton_pos": 1, "best_triton_time": 0.01852799952030182, "best_triton_kernel": "triton_convolution2d_59", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:21:16.1586561Z AUTOTUNE convolution(8x256x64x64, 128x256x1x1) 2025-09-07T09:21:16.1586855Z strides: [1048576, 4096, 64, 1], [256, 1, 1, 1] 2025-09-07T09:21:16.1587206Z dtypes: torch.float16, torch.float16 2025-09-07T09:21:16.1587452Z convolution 0.0172 ms 100.0% 2025-09-07T09:21:16.1588104Z triton_convolution2d_59 0.0185 ms 92.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:16.1589473Z triton_convolution2d_58 0.0190 ms 90.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:16.1590531Z triton_convolution2d_61 0.0198 ms 86.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:16.1591627Z triton_convolution2d_55 0.0207 ms 82.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, 
KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:16.1592679Z triton_convolution2d_60 0.0230 ms 74.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:16.1594062Z triton_convolution2d_56 0.0246 ms 69.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:16.1595134Z triton_convolution2d_57 0.0295 ms 58.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:21:16.1595785Z conv1x1_via_mm 0.1131 ms 15.2% 2025-09-07T09:21:16.1596196Z SingleProcess AUTOTUNE benchmarking takes 0.1378 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:21:16.2912747Z Autotune Choices Stats: 2025-09-07T09:21:16.2914694Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.01142400037497282, "best_triton_pos": 1, "best_triton_time": 0.01235199999064207, "best_triton_kernel": "triton_convolution2d_66", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:21:16.2974456Z AUTOTUNE convolution(8x128x32x32, 512x128x1x1) 2025-09-07T09:21:16.2974819Z strides: [131072, 1024, 32, 1], [128, 1, 1, 1] 2025-09-07T09:21:16.2975131Z dtypes: torch.float16, torch.float16 2025-09-07T09:21:16.2975423Z convolution 0.0114 ms 100.0% 2025-09-07T09:21:16.2976193Z triton_convolution2d_66 0.0124 ms 92.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, 
PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:16.2977500Z triton_convolution2d_65 0.0134 ms 85.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:16.2978783Z triton_convolution2d_68 0.0138 ms 83.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:16.2980028Z triton_convolution2d_62 0.0145 ms 78.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:16.2981223Z triton_convolution2d_63 0.0155 ms 73.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:16.2982276Z triton_convolution2d_67 0.0156 ms 73.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:16.2983340Z triton_convolution2d_64 0.0175 ms 65.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:21:16.2984496Z conv1x1_via_mm 0.0601 ms 19.0% 2025-09-07T09:21:16.2984927Z SingleProcess AUTOTUNE benchmarking takes 0.1379 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:21:16.4040732Z Autotune Choices Stats: 2025-09-07T09:21:16.4041752Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_74", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, 
BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=8", "best_time": 0.018624000251293182, "best_triton_pos": 0} 2025-09-07T09:21:16.4102980Z AUTOTUNE convolution(8x256x64x64, 512x256x1x1) 2025-09-07T09:21:16.4103445Z strides: [1048576, 4096, 64, 1], [256, 1, 1, 1] 2025-09-07T09:21:16.4104068Z dtypes: torch.float16, torch.float16 2025-09-07T09:21:16.4104886Z triton_convolution2d_74 0.0186 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:16.4106142Z triton_convolution2d_75 0.0230 ms 80.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:16.4107367Z triton_convolution2d_73 0.0236 ms 79.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:16.4108612Z triton_convolution2d_70 0.0246 ms 75.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:16.4110115Z triton_convolution2d_69 0.0249 ms 74.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:16.4111377Z triton_convolution2d_72 0.0257 ms 72.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:16.4112029Z convolution 0.0436 ms 42.8% 2025-09-07T09:21:16.4112688Z 
triton_convolution2d_71 0.1078 ms 17.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:21:16.4113534Z SingleProcess AUTOTUNE benchmarking takes 0.1119 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T09:21:16.5438818Z Autotune Choices Stats: 2025-09-07T09:21:16.5440183Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.011744000017642975, "best_triton_pos": 1, "best_triton_time": 0.018719999119639397, "best_triton_kernel": "triton_convolution2d_80", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:21:16.5501641Z AUTOTUNE convolution(8x512x32x32, 128x512x1x1) 2025-09-07T09:21:16.5501960Z strides: [524288, 1024, 32, 1], [512, 1, 1, 1] 2025-09-07T09:21:16.5502233Z dtypes: torch.float16, torch.float16 2025-09-07T09:21:16.5502490Z convolution 0.0117 ms 100.0% 2025-09-07T09:21:16.5503176Z triton_convolution2d_80 0.0187 ms 62.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:16.5504520Z triton_convolution2d_81 0.0197 ms 59.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:16.5505954Z triton_convolution2d_79 0.0228 ms 51.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:16.5507102Z triton_convolution2d_82 0.0234 ms 50.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, 
BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:16.5508263Z triton_convolution2d_76 0.0265 ms 44.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:16.5509386Z triton_convolution2d_77 0.0336 ms 34.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:16.5510520Z triton_convolution2d_78 0.0390 ms 30.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:21:16.5511221Z conv1x1_via_mm 0.0574 ms 20.5% 2025-09-07T09:21:16.5511626Z SingleProcess AUTOTUNE benchmarking takes 0.1394 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:21:16.6858363Z Autotune Choices Stats: 2025-09-07T09:21:16.6860154Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.012671999633312225, "best_triton_pos": 1, "best_triton_time": 0.01958400011062622, "best_triton_kernel": "triton_convolution2d_94", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:21:16.6920586Z AUTOTUNE convolution(8x512x32x32, 256x512x1x1) 2025-09-07T09:21:16.6921061Z strides: [524288, 1024, 32, 1], [512, 1, 1, 1] 2025-09-07T09:21:16.6921341Z dtypes: torch.float16, torch.float16 2025-09-07T09:21:16.6921594Z convolution 0.0127 ms 100.0% 2025-09-07T09:21:16.6922278Z triton_convolution2d_94 0.0196 ms 64.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, 
KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:16.6923429Z triton_convolution2d_93 0.0232 ms 54.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:16.6924889Z triton_convolution2d_96 0.0238 ms 53.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:16.6926022Z triton_convolution2d_95 0.0254 ms 49.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:16.6927139Z triton_convolution2d_90 0.0328 ms 38.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:16.6928257Z triton_convolution2d_91 0.0328 ms 38.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:16.6929383Z triton_convolution2d_92 0.0395 ms 32.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:21:16.6930295Z conv1x1_via_mm 0.0694 ms 18.3% 2025-09-07T09:21:16.6930742Z SingleProcess AUTOTUNE benchmarking takes 0.1386 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:21:16.8246528Z Autotune Choices Stats: 2025-09-07T09:21:16.8247867Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.009664000011980534, "best_triton_pos": 1, 
"best_triton_time": 0.013151999562978745, "best_triton_kernel": "triton_convolution2d_101", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:21:16.8309103Z AUTOTUNE convolution(8x256x16x16, 1024x256x1x1) 2025-09-07T09:21:16.8310264Z strides: [65536, 256, 16, 1], [256, 1, 1, 1] 2025-09-07T09:21:16.8310664Z dtypes: torch.float16, torch.float16 2025-09-07T09:21:16.8310964Z convolution 0.0097 ms 100.0% 2025-09-07T09:21:16.8311809Z triton_convolution2d_101 0.0132 ms 73.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:16.8313074Z triton_convolution2d_102 0.0140 ms 68.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:16.8314841Z triton_convolution2d_100 0.0147 ms 65.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:16.8316077Z triton_convolution2d_103 0.0149 ms 64.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:16.8318089Z triton_convolution2d_97 0.0176 ms 54.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:16.8319340Z triton_convolution2d_98 0.0197 ms 49.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, 
STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:16.8320636Z triton_convolution2d_99 0.0231 ms 41.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:21:16.8321441Z conv1x1_via_mm 0.0414 ms 23.4% 2025-09-07T09:21:16.8321897Z SingleProcess AUTOTUNE benchmarking takes 0.1379 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:21:16.9442121Z Autotune Choices Stats: 2025-09-07T09:21:16.9443336Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_108", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4", "best_time": 0.022175999358296394, "best_triton_pos": 0} 2025-09-07T09:21:16.9504017Z AUTOTUNE convolution(8x512x32x32, 1024x512x1x1) 2025-09-07T09:21:16.9504417Z strides: [524288, 1024, 32, 1], [512, 1, 1, 1] 2025-09-07T09:21:16.9504731Z dtypes: torch.float16, torch.float16 2025-09-07T09:21:16.9505550Z triton_convolution2d_108 0.0222 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:16.9506815Z triton_convolution2d_109 0.0243 ms 91.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:16.9508688Z triton_convolution2d_107 0.0247 ms 89.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:16.9509451Z convolution 0.0308 ms 72.1% 2025-09-07T09:21:16.9510199Z 
triton_convolution2d_110 0.0330 ms 67.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:16.9511468Z triton_convolution2d_104 0.0339 ms 65.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:16.9512533Z triton_convolution2d_105 0.0363 ms 61.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:16.9513606Z triton_convolution2d_106 0.1746 ms 12.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:21:16.9514598Z SingleProcess AUTOTUNE benchmarking takes 0.1185 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T09:21:17.0929908Z Autotune Choices Stats: 2025-09-07T09:21:17.0931500Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.009983999654650688, "best_triton_pos": 1, "best_triton_time": 0.029472000896930695, "best_triton_kernel": "triton_convolution2d_115", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:21:17.0992235Z AUTOTUNE convolution(8x1024x16x16, 256x1024x1x1) 2025-09-07T09:21:17.0993226Z strides: [262144, 256, 16, 1], [1024, 1, 1, 1] 2025-09-07T09:21:17.0993549Z dtypes: torch.float16, torch.float16 2025-09-07T09:21:17.0994008Z convolution 0.0100 ms 100.0% 2025-09-07T09:21:17.0994798Z triton_convolution2d_115 0.0295 ms 33.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, 
BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:17.0996029Z triton_convolution2d_117 0.0361 ms 27.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:17.0997351Z triton_convolution2d_114 0.0363 ms 27.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:17.0998113Z conv1x1_via_mm 0.0382 ms 26.2% 2025-09-07T09:21:17.0998871Z triton_convolution2d_116 0.0393 ms 25.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:17.1000090Z triton_convolution2d_111 0.0545 ms 18.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:17.1001482Z triton_convolution2d_112 0.0564 ms 17.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:17.1002711Z triton_convolution2d_113 0.0728 ms 13.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:21:17.1004018Z SingleProcess AUTOTUNE benchmarking takes 0.1484 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:21:17.2318883Z Autotune Choices Stats: 2025-09-07T09:21:17.2320286Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 
0.008799999952316284, "best_triton_pos": 1, "best_triton_time": 0.012160000391304493, "best_triton_kernel": "triton_convolution2d_122", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:21:17.2381184Z AUTOTUNE convolution(8x256x16x16, 384x256x1x1) 2025-09-07T09:21:17.2381667Z strides: [65536, 256, 16, 1], [256, 1, 1, 1] 2025-09-07T09:21:17.2381985Z dtypes: torch.float16, torch.float16 2025-09-07T09:21:17.2382268Z convolution 0.0088 ms 100.0% 2025-09-07T09:21:17.2383067Z triton_convolution2d_122 0.0122 ms 72.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:17.2384697Z triton_convolution2d_123 0.0134 ms 65.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:17.2385934Z triton_convolution2d_121 0.0145 ms 60.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:17.2387148Z triton_convolution2d_124 0.0146 ms 60.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:17.2388731Z triton_convolution2d_118 0.0164 ms 53.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:17.2389973Z triton_convolution2d_119 0.0195 ms 45.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, 
KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:17.2391249Z triton_convolution2d_120 0.0230 ms 38.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:21:17.2391908Z conv1x1_via_mm 0.0270 ms 32.6% 2025-09-07T09:21:17.2392325Z SingleProcess AUTOTUNE benchmarking takes 0.1379 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:21:17.4363852Z Autotune Choices Stats: 2025-09-07T09:21:17.4364852Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_bmm_135", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4", "best_time": 0.007199999876320362, "best_triton_pos": 0} 2025-09-07T09:21:17.4428226Z AUTOTUNE bmm(32x256x16, 32x16x256) 2025-09-07T09:21:17.4428559Z strides: [4096, 1, 256], [4096, 256, 1] 2025-09-07T09:21:17.4428860Z dtypes: torch.float16, torch.float16 2025-09-07T09:21:17.4429543Z triton_bmm_135 0.0072 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:21:17.4430608Z triton_bmm_130 0.0074 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:21:17.4431626Z triton_bmm_136 0.0074 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:17.4432821Z triton_bmm_131 0.0074 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 
2025-09-07T09:21:17.4433663Z triton_bmm_137 0.0075 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:21:17.4434693Z triton_bmm_129 0.0075 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:21:17.4435548Z triton_bmm_134 0.0075 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:21:17.4436406Z triton_bmm_133 0.0076 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:17.4437370Z triton_bmm_132 0.0076 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:21:17.4438219Z triton_bmm_138 0.0076 ms 94.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T09:21:17.4438966Z SingleProcess AUTOTUNE benchmarking takes 0.2036 seconds and 0.0002 seconds precompiling for 17 choices 2025-09-07T09:21:17.6169383Z Autotune Choices Stats: 2025-09-07T09:21:17.6170330Z {"num_choices": 15, "num_triton_choices": 14, "best_kernel": "triton_mm_142", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.006496000103652477, "best_triton_pos": 0} 2025-09-07T09:21:17.6234133Z AUTOTUNE mm(8192x16, 16x31) 2025-09-07T09:21:17.6234735Z strides: [16, 1], [1, 16] 2025-09-07T09:21:17.6235015Z dtypes: torch.float16, torch.float16 2025-09-07T09:21:17.6235687Z 
triton_mm_142 0.0065 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:21:17.6236669Z triton_mm_141 0.0065 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T09:21:17.6237743Z triton_mm_144 0.0066 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:21:17.6238715Z triton_mm_145 0.0067 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:21:17.6239688Z triton_mm_146 0.0067 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:21:17.6240653Z triton_mm_150 0.0067 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:21:17.6241694Z triton_mm_143 0.0067 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:21:17.6242577Z triton_mm_147 0.0067 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:21:17.6243462Z triton_mm_148 0.0067 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:17.6244690Z triton_mm_149 0.0068 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:21:17.6245474Z SingleProcess AUTOTUNE benchmarking takes 0.1801 seconds and 0.0002 seconds precompiling for 15 choices 2025-09-07T09:21:17.8455328Z Autotune Choices Stats: 2025-09-07T09:21:17.8456318Z {"num_choices": 19, "num_triton_choices": 18, "best_kernel": "triton_bmm_186", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8", "best_time": 0.009600000455975533, "best_triton_pos": 0} 2025-09-07T09:21:17.8521173Z AUTOTUNE bmm(32x256x256, 32x256x64) 2025-09-07T09:21:17.8521534Z strides: [65536, 256, 1], [16384, 1, 256] 2025-09-07T09:21:17.8521818Z dtypes: torch.float16, torch.float16 2025-09-07T09:21:17.8522495Z triton_bmm_186 0.0096 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:21:17.8523498Z triton_bmm_177 0.0098 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:21:17.8524771Z triton_bmm_181 0.0098 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:21:17.8525750Z triton_bmm_176 0.0099 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:21:17.8526718Z triton_bmm_180 0.0099 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:17.8527898Z triton_bmm_170 0.0100 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:21:17.8528896Z triton_bmm_172 0.0100 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:21:17.8529868Z triton_bmm_185 0.0101 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:17.8530835Z triton_bmm_171 0.0103 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:21:17.8531798Z triton_bmm_179 0.0103 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:21:17.8532597Z SingleProcess AUTOTUNE benchmarking takes 0.2263 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T09:21:17.9978261Z Autotune Choices Stats: 2025-09-07T09:21:17.9979646Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.011071999557316303, "best_triton_pos": 1, "best_triton_time": 0.029184000566601753, "best_triton_kernel": "triton_convolution2d_198", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:21:18.0043336Z AUTOTUNE convolution(8x1024x16x16, 512x1024x1x1) 2025-09-07T09:21:18.0043691Z strides: [262144, 256, 16, 1], [1024, 1, 1, 1] 2025-09-07T09:21:18.0044278Z dtypes: torch.float16, torch.float16 2025-09-07T09:21:18.0044548Z convolution 0.0111 ms 100.0% 2025-09-07T09:21:18.0045288Z triton_convolution2d_198 0.0292 ms 37.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, 
STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:18.0046788Z triton_convolution2d_197 0.0363 ms 30.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:18.0048023Z triton_convolution2d_200 0.0364 ms 30.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:18.0049250Z triton_convolution2d_199 0.0382 ms 29.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:18.0049994Z conv1x1_via_mm 0.0502 ms 22.0% 2025-09-07T09:21:18.0050718Z triton_convolution2d_194 0.0542 ms 20.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:18.0051934Z triton_convolution2d_195 0.0560 ms 19.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:18.0053078Z triton_convolution2d_196 0.0714 ms 15.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:21:18.0054142Z SingleProcess AUTOTUNE benchmarking takes 0.1492 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:21:18.1379774Z Autotune Choices Stats: 2025-09-07T09:21:18.1381376Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.010688000358641148, "best_triton_pos": 1, "best_triton_time": 0.018015999346971512, 
"best_triton_kernel": "triton_convolution2d_205", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:21:18.1445925Z AUTOTUNE convolution(8x512x16x16, 640x512x1x1) 2025-09-07T09:21:18.1446284Z strides: [131072, 256, 16, 1], [512, 1, 1, 1] 2025-09-07T09:21:18.1446572Z dtypes: torch.float16, torch.float16 2025-09-07T09:21:18.1446847Z convolution 0.0107 ms 100.0% 2025-09-07T09:21:18.1447586Z triton_convolution2d_205 0.0180 ms 59.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:18.1448839Z triton_convolution2d_204 0.0218 ms 49.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:18.1450105Z triton_convolution2d_207 0.0218 ms 49.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:18.1451322Z triton_convolution2d_206 0.0220 ms 48.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:18.1452546Z triton_convolution2d_201 0.0285 ms 37.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:18.1454009Z triton_convolution2d_202 0.0317 ms 33.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, 
num_stages=2, num_warps=4 2025-09-07T09:21:18.1455554Z triton_convolution2d_203 0.0394 ms 27.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:21:18.1456269Z conv1x1_via_mm 0.0417 ms 25.6% 2025-09-07T09:21:18.1456715Z SingleProcess AUTOTUNE benchmarking takes 0.1392 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:21:18.3830459Z Autotune Choices Stats: 2025-09-07T09:21:18.3831470Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_bmm_264", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4", "best_time": 0.010143999941647053, "best_triton_pos": 0} 2025-09-07T09:21:18.3894066Z AUTOTUNE bmm(32x256x256, 32x256x128) 2025-09-07T09:21:18.3894398Z strides: [65536, 256, 1], [32768, 1, 256] 2025-09-07T09:21:18.3894702Z dtypes: torch.float16, torch.float16 2025-09-07T09:21:18.3895411Z triton_bmm_264 0.0101 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:21:18.3896433Z triton_bmm_263 0.0103 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:18.3897398Z triton_bmm_259 0.0103 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:21:18.3898356Z triton_bmm_266 0.0107 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:21:18.3899596Z triton_bmm_262 0.0108 ms 94.1% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:21:18.3900246Z bmm 0.0108 ms 93.8% 2025-09-07T09:21:18.3900832Z triton_bmm_270 0.0109 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:21:18.3901854Z triton_bmm_269 0.0111 ms 91.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:18.3902688Z triton_bmm_265 0.0113 ms 90.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:18.3903522Z triton_bmm_261 0.0113 ms 89.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:18.3904409Z SingleProcess AUTOTUNE benchmarking takes 0.2380 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:21:18.5221817Z Autotune Choices Stats: 2025-09-07T09:21:18.5223203Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.010208000428974628, "best_triton_pos": 1, "best_triton_time": 0.017791999503970146, "best_triton_kernel": "triton_convolution2d_275", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:21:18.5287134Z AUTOTUNE convolution(8x512x8x8, 2048x512x1x1) 2025-09-07T09:21:18.5287545Z strides: [32768, 64, 8, 1], [512, 1, 1, 1] 2025-09-07T09:21:18.5287833Z dtypes: torch.float16, torch.float16 2025-09-07T09:21:18.5288092Z convolution 0.0102 ms 100.0% 2025-09-07T09:21:18.5288851Z triton_convolution2d_275 0.0178 ms 
57.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:18.5290387Z triton_convolution2d_274 0.0213 ms 48.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:18.5291616Z triton_convolution2d_277 0.0222 ms 45.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:18.5292813Z triton_convolution2d_276 0.0224 ms 45.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:18.5294106Z triton_convolution2d_271 0.0301 ms 33.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:18.5295257Z triton_convolution2d_273 0.0303 ms 33.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=512, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:21:18.5295944Z conv1x1_via_mm 0.0312 ms 32.7% 2025-09-07T09:21:18.5296639Z triton_convolution2d_272 0.0312 ms 32.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:18.5297541Z SingleProcess AUTOTUNE benchmarking takes 0.1389 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:21:18.6580900Z Autotune Choices Stats: 2025-09-07T09:21:18.6582504Z {"num_choices": 8, "num_triton_choices": 7, 
"best_kernel": "convolution", "best_time": 0.027456000447273254, "best_triton_pos": 1, "best_triton_time": 0.03215999901294708, "best_triton_kernel": "triton_convolution2d_282", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:21:18.6646585Z AUTOTUNE convolution(8x1024x16x16, 2048x1024x1x1) 2025-09-07T09:21:18.6647041Z strides: [262144, 256, 16, 1], [1024, 1, 1, 1] 2025-09-07T09:21:18.6647335Z dtypes: torch.float16, torch.float16 2025-09-07T09:21:18.6647627Z convolution 0.0275 ms 100.0% 2025-09-07T09:21:18.6648377Z triton_convolution2d_282 0.0322 ms 85.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:18.6649608Z triton_convolution2d_283 0.0391 ms 70.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:18.6650841Z triton_convolution2d_281 0.0400 ms 68.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:18.6652055Z triton_convolution2d_284 0.0568 ms 48.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:18.6653197Z triton_convolution2d_278 0.0600 ms 45.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:18.6654475Z triton_convolution2d_279 0.0674 ms 40.7% ALLOW_TF32=False, 
BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:18.6655611Z triton_convolution2d_280 0.2156 ms 12.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:21:18.6656847Z SingleProcess AUTOTUNE benchmarking takes 0.1348 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T09:21:18.8324368Z Autotune Choices Stats: 2025-09-07T09:21:18.8325736Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.011648000217974186, "best_triton_pos": 2, "best_triton_time": 0.05084799975156784, "best_triton_kernel": "triton_convolution2d_289", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:21:18.8389785Z AUTOTUNE convolution(8x2048x8x8, 512x2048x1x1) 2025-09-07T09:21:18.8390166Z strides: [131072, 64, 8, 1], [2048, 1, 1, 1] 2025-09-07T09:21:18.8390505Z dtypes: torch.float16, torch.float16 2025-09-07T09:21:18.8390795Z convolution 0.0116 ms 100.0% 2025-09-07T09:21:18.8391065Z conv1x1_via_mm 0.0324 ms 36.0% 2025-09-07T09:21:18.8391857Z triton_convolution2d_289 0.0508 ms 22.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:18.8393126Z triton_convolution2d_288 0.0632 ms 18.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:18.8394532Z triton_convolution2d_290 0.0653 ms 17.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, 
BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:18.8396110Z triton_convolution2d_291 0.0675 ms 17.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:18.8397477Z triton_convolution2d_285 0.1014 ms 11.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:18.8398721Z triton_convolution2d_287 0.1025 ms 11.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=512, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:21:18.8399961Z triton_convolution2d_286 0.1085 ms 10.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:18.8400938Z SingleProcess AUTOTUNE benchmarking takes 0.1733 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:21:18.9714416Z Autotune Choices Stats: 2025-09-07T09:21:18.9715787Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.008511999621987343, "best_triton_pos": 1, "best_triton_time": 0.01744000054895878, "best_triton_kernel": "triton_convolution2d_296", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:21:18.9780820Z AUTOTUNE convolution(8x512x8x8, 640x512x1x1) 2025-09-07T09:21:18.9781167Z strides: [32768, 64, 8, 1], [512, 1, 1, 1] 2025-09-07T09:21:18.9781467Z dtypes: torch.float16, torch.float16 
2025-09-07T09:21:18.9781749Z convolution 0.0085 ms 100.0% 2025-09-07T09:21:18.9782516Z triton_convolution2d_296 0.0174 ms 48.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:18.9784248Z triton_convolution2d_295 0.0205 ms 41.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:18.9785521Z triton_convolution2d_297 0.0215 ms 39.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:18.9786765Z triton_convolution2d_298 0.0217 ms 39.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:21:18.9787525Z conv1x1_via_mm 0.0252 ms 33.8% 2025-09-07T09:21:18.9788273Z triton_convolution2d_292 0.0285 ms 29.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:18.9789510Z triton_convolution2d_294 0.0298 ms 28.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=512, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:21:18.9790743Z triton_convolution2d_293 0.0312 ms 27.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:21:18.9791734Z SingleProcess AUTOTUNE benchmarking takes 0.1380 seconds and 0.0002 seconds precompiling for 9 choices 
2025-09-07T09:21:19.1424940Z Autotune Choices Stats: 2025-09-07T09:21:19.1426216Z {"num_choices": 14, "num_triton_choices": 13, "best_kernel": "triton_bmm_306", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-09-07T09:21:19.1490637Z AUTOTUNE bmm(32x64x16, 32x16x64) 2025-09-07T09:21:19.1490937Z strides: [1024, 1, 64], [1024, 64, 1] 2025-09-07T09:21:19.1491216Z dtypes: torch.float16, torch.float16 2025-09-07T09:21:19.1491993Z triton_bmm_306 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:21:19.1493171Z triton_bmm_301 0.0061 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:21:19.1494525Z triton_bmm_302 0.0061 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:21:19.1495575Z triton_bmm_300 0.0062 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:21:19.1496569Z triton_bmm_304 0.0062 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:21:19.1497553Z triton_bmm_309 0.0063 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:21:19.1498517Z triton_bmm_299 0.0063 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T09:21:19.1499475Z triton_bmm_310 0.0063 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T09:21:19.1500430Z triton_bmm_305 0.0064 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:21:19.1501776Z triton_bmm_307 0.0064 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:19.1502535Z SingleProcess AUTOTUNE benchmarking takes 0.1699 seconds and 0.0002 seconds precompiling for 14 choices 2025-09-07T09:21:19.2903183Z Autotune Choices Stats: 2025-09-07T09:21:19.2904471Z {"num_choices": 12, "num_triton_choices": 11, "best_kernel": "triton_mm_312", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-09-07T09:21:19.2969938Z AUTOTUNE mm(2048x16, 16x15) 2025-09-07T09:21:19.2970197Z strides: [16, 1], [1, 16] 2025-09-07T09:21:19.2970523Z dtypes: torch.float16, torch.float16 2025-09-07T09:21:19.2971199Z triton_mm_312 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T09:21:19.2972383Z triton_mm_313 0.0061 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T09:21:19.2973471Z triton_mm_314 0.0062 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, 
num_warps=2 2025-09-07T09:21:19.2974694Z triton_mm_317 0.0062 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:19.2975680Z triton_mm_315 0.0062 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:21:19.2976942Z triton_mm_316 0.0062 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:21:19.2977911Z triton_mm_318 0.0066 ms 92.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:21:19.2978875Z triton_mm_322 0.0066 ms 91.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:21:19.2979833Z triton_mm_319 0.0068 ms 89.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:19.2980802Z triton_mm_320 0.0068 ms 89.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:21:19.2981649Z SingleProcess AUTOTUNE benchmarking takes 0.1474 seconds and 0.0002 seconds precompiling for 12 choices 2025-09-07T09:21:19.5082471Z Autotune Choices Stats: 2025-09-07T09:21:19.5083433Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_bmm_341", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.00684799998998642, "best_triton_pos": 0} 
2025-09-07T09:21:19.5151219Z AUTOTUNE bmm(32x64x64, 32x64x128) 2025-09-07T09:21:19.5151511Z strides: [4096, 64, 1], [8192, 1, 64] 2025-09-07T09:21:19.5151801Z dtypes: torch.float16, torch.float16 2025-09-07T09:21:19.5152473Z triton_bmm_341 0.0068 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:21:19.5153468Z triton_bmm_337 0.0069 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:21:19.5155006Z triton_bmm_335 0.0069 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:21:19.5155960Z triton_bmm_348 0.0070 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:21:19.5156918Z triton_bmm_347 0.0071 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:19.5158035Z triton_bmm_345 0.0071 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:19.5159034Z triton_bmm_350 0.0071 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:21:19.5159991Z triton_bmm_342 0.0072 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:21:19.5160946Z triton_bmm_336 0.0072 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, 
BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:21:19.5161963Z triton_bmm_346 0.0072 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:21:19.5162732Z SingleProcess AUTOTUNE benchmarking takes 0.2156 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T09:21:19.7696181Z Autotune Choices Stats: 2025-09-07T09:21:19.7697497Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_362", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.010847999714314938, "best_triton_pos": 0} 2025-09-07T09:21:19.7768317Z AUTOTUNE addmm(8x1000, 8x2048, 2048x1000) 2025-09-07T09:21:19.7768602Z strides: [0, 1], [2048, 1], [1, 2048] 2025-09-07T09:21:19.7768899Z dtypes: torch.float16, torch.float16, torch.float16 2025-09-07T09:21:19.7769573Z triton_mm_362 0.0108 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:21:19.7770543Z triton_mm_366 0.0115 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:21:19.7771156Z bias_addmm 0.0117 ms 92.4% 2025-09-07T09:21:19.7771751Z triton_mm_370 0.0141 ms 77.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:21:19.7772355Z addmm 0.0147 ms 74.0% 2025-09-07T09:21:19.7772919Z triton_mm_374 0.0154 ms 70.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 
2025-09-07T09:21:19.7774124Z triton_mm_361 0.0172 ms 63.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:21:19.7775001Z triton_mm_360 0.0183 ms 59.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:21:19.7775877Z triton_mm_365 0.0184 ms 59.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:19.7776963Z triton_mm_359 0.0189 ms 57.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T09:21:19.7777749Z SingleProcess AUTOTUNE benchmarking takes 0.2586 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T09:21:38.2461195Z Autotune Choices Stats: 2025-09-07T09:21:38.2462297Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_404", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8", "best_time": 0.006912000011652708, "best_triton_pos": 0} 2025-09-07T09:21:38.2531851Z AUTOTUNE mm(1000x8, 8x2048) 2025-09-07T09:21:38.2532161Z strides: [1, 1000], [2048, 1] 2025-09-07T09:21:38.2532425Z dtypes: torch.float16, torch.float16 2025-09-07T09:21:38.2533146Z triton_mm_404 0.0069 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:21:38.2534381Z triton_mm_399 0.0069 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:21:38.2535364Z 
triton_mm_401 0.0069 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:21:38.2536342Z triton_mm_397 0.0070 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:21:38.2537303Z triton_mm_398 0.0070 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:21:38.2538979Z triton_mm_403 0.0070 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:38.2539962Z triton_mm_402 0.0071 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:21:38.2540866Z triton_mm_400 0.0071 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:38.2541782Z triton_mm_405 0.0072 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T09:21:38.2542678Z triton_mm_396 0.0073 ms 95.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:21:38.2543463Z SingleProcess AUTOTUNE benchmarking takes 0.1627 seconds and 0.0003 seconds precompiling for 17 choices 2025-09-07T09:21:38.9538224Z Autotune Choices Stats: 2025-09-07T09:21:38.9539824Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "mm", "best_time": 0.009631999768316746, "best_triton_pos": 1, 
"best_triton_time": 0.010015999898314476, "best_triton_kernel": "triton_mm_383", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T09:21:38.9606909Z AUTOTUNE mm(8x1000, 1000x2048) 2025-09-07T09:21:38.9607215Z strides: [1000, 1], [2048, 1] 2025-09-07T09:21:38.9607499Z dtypes: torch.float16, torch.float16 2025-09-07T09:21:38.9607788Z mm 0.0096 ms 100.0% 2025-09-07T09:21:38.9608534Z triton_mm_383 0.0100 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:21:38.9610084Z triton_mm_387 0.0105 ms 91.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:21:38.9611070Z triton_mm_379 0.0106 ms 91.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:21:38.9612042Z triton_mm_391 0.0117 ms 82.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:21:38.9613007Z triton_mm_377 0.0118 ms 81.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:21:38.9614330Z triton_mm_378 0.0121 ms 79.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:21:38.9615304Z triton_mm_382 0.0123 ms 78.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:38.9616264Z 
triton_mm_389 0.0133 ms 72.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:21:38.9617246Z triton_mm_386 0.0134 ms 72.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:38.9618104Z SingleProcess AUTOTUNE benchmarking takes 0.1864 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T09:21:39.1251406Z Autotune Choices Stats: 2025-09-07T09:21:39.1254330Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_bmm_415", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.0066559999249875546, "best_triton_pos": 0} 2025-09-07T09:21:39.1323581Z AUTOTUNE bmm(32x64x64, 32x64x128) 2025-09-07T09:21:39.1324207Z strides: [4096, 1, 64], [8192, 1, 64] 2025-09-07T09:21:39.1324566Z dtypes: torch.float16, torch.float16 2025-09-07T09:21:39.1325439Z triton_bmm_415 0.0067 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:21:39.1326961Z triton_bmm_416 0.0067 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:21:39.1328474Z triton_bmm_411 0.0068 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:21:39.1329929Z triton_bmm_421 0.0068 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:39.1331314Z triton_bmm_410 0.0068 
ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:21:39.1332714Z triton_bmm_422 0.0068 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:21:39.1334392Z triton_bmm_424 0.0069 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:21:39.1335817Z triton_bmm_419 0.0069 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:39.1337997Z triton_bmm_417 0.0070 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:39.1339433Z triton_bmm_420 0.0070 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:21:39.1340633Z SingleProcess AUTOTUNE benchmarking takes 0.1706 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T09:21:39.2906289Z Autotune Choices Stats: 2025-09-07T09:21:39.2907320Z {"num_choices": 16, "num_triton_choices": 15, "best_kernel": "triton_bmm_436", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.006976000033318996, "best_triton_pos": 0} 2025-09-07T09:21:39.2980043Z AUTOTUNE bmm(32x64x128, 32x128x64) 2025-09-07T09:21:39.2980547Z strides: [8192, 1, 64], [8192, 64, 1] 2025-09-07T09:21:39.2980946Z dtypes: torch.float16, torch.float16 2025-09-07T09:21:39.2981848Z triton_bmm_436 0.0070 ms 100.0% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:39.2983207Z triton_bmm_428 0.0071 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:21:39.2985187Z triton_bmm_432 0.0071 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:21:39.2986578Z triton_bmm_439 0.0071 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:21:39.2988602Z triton_bmm_435 0.0072 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:21:39.2990073Z triton_bmm_427 0.0073 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:21:39.2991495Z triton_bmm_434 0.0073 ms 95.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:39.2992322Z bmm 0.0076 ms 92.4% 2025-09-07T09:21:39.2993070Z triton_bmm_437 0.0076 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:21:39.2994718Z triton_bmm_433 0.0076 ms 91.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:21:39.2995502Z SingleProcess AUTOTUNE benchmarking takes 0.1649 seconds and 0.0003 seconds precompiling for 16 
choices 2025-09-07T09:21:40.0763280Z Autotune Choices Stats: 2025-09-07T09:21:40.0765042Z {"num_choices": 16, "num_triton_choices": 11, "best_kernel": "decompose_k_mm_16_split_3", "best_kernel_desc": "k_split=16", "best_time": 0.01071999967098236, "best_triton_pos": 5, "best_triton_time": 0.03206399828195572, "best_triton_kernel": "triton_mm_441", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=1"} 2025-09-07T09:21:40.0846196Z AUTOTUNE mm(15x2048, 2048x16) 2025-09-07T09:21:40.0846472Z strides: [1, 15], [16, 1] 2025-09-07T09:21:40.0846709Z dtypes: torch.float16, torch.float16 2025-09-07T09:21:40.0847011Z decompose_k_mm_16_split_3 0.0107 ms 100.0% k_split=16 2025-09-07T09:21:40.0847718Z mm 0.0109 ms 98.0% 2025-09-07T09:21:40.0847954Z decompose_k_mm_8_split_2 0.0119 ms 90.1% k_split=8 2025-09-07T09:21:40.0848263Z decompose_k_mm_4_split_1 0.0145 ms 73.8% k_split=4 2025-09-07T09:21:40.0848574Z decompose_k_mm_2_split_0 0.0187 ms 57.3% k_split=2 2025-09-07T09:21:40.0849228Z triton_mm_441 0.0321 ms 33.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=1 2025-09-07T09:21:40.0850193Z triton_mm_450 0.0341 ms 31.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=1 2025-09-07T09:21:40.0851154Z triton_mm_446 0.0348 ms 30.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=1 2025-09-07T09:21:40.0852118Z triton_mm_445 0.0403 ms 26.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=1 2025-09-07T09:21:40.0853075Z triton_mm_448 0.0672 ms 15.9% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=1 2025-09-07T09:21:40.0854175Z SingleProcess AUTOTUNE benchmarking takes 0.7861 seconds and 0.0003 seconds precompiling for 16 choices 2025-09-07T09:21:40.2635240Z Autotune Choices Stats: 2025-09-07T09:21:40.2636784Z {"num_choices": 12, "num_triton_choices": 11, "best_kernel": "triton_mm_454", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.005919999908655882, "best_triton_pos": 0} 2025-09-07T09:21:40.2705998Z AUTOTUNE mm(2048x15, 15x16) 2025-09-07T09:21:40.2706385Z strides: [15, 1], [16, 1] 2025-09-07T09:21:40.2706776Z dtypes: torch.float16, torch.float16 2025-09-07T09:21:40.2708314Z triton_mm_454 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:21:40.2709687Z triton_mm_452 0.0060 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T09:21:40.2710662Z triton_mm_453 0.0060 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:21:40.2711619Z triton_mm_455 0.0060 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:21:40.2712582Z triton_mm_456 0.0060 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:40.2713548Z triton_mm_457 0.0060 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, 
BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:21:40.2714700Z triton_mm_451 0.0060 ms 98.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T09:21:40.2715670Z triton_mm_459 0.0060 ms 98.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:21:40.2716643Z triton_mm_460 0.0060 ms 98.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T09:21:40.2717714Z triton_mm_461 0.0060 ms 98.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:21:40.2718867Z SingleProcess AUTOTUNE benchmarking takes 0.1771 seconds and 0.0002 seconds precompiling for 12 choices 2025-09-07T09:21:40.4045188Z Autotune Choices Stats: 2025-09-07T09:21:40.4046666Z {"num_choices": 13, "num_triton_choices": 12, "best_kernel": "triton_bmm_486", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.006304000038653612, "best_triton_pos": 0} 2025-09-07T09:21:40.4115671Z AUTOTUNE bmm(32x16x64, 32x64x64) 2025-09-07T09:21:40.4116056Z strides: [1024, 64, 1], [4096, 64, 1] 2025-09-07T09:21:40.4116449Z dtypes: torch.float16, torch.float16 2025-09-07T09:21:40.4117473Z triton_bmm_486 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:21:40.4118931Z triton_bmm_487 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:21:40.4120296Z triton_bmm_493 0.0063 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:40.4121639Z triton_bmm_488 0.0064 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:21:40.4122983Z triton_bmm_494 0.0064 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:21:40.4124528Z triton_bmm_485 0.0064 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T09:21:40.4126354Z triton_bmm_492 0.0066 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:21:40.4127772Z triton_bmm_491 0.0066 ms 95.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:40.4129169Z triton_bmm_495 0.0067 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:21:40.4130566Z triton_bmm_490 0.0068 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:21:40.4131783Z SingleProcess AUTOTUNE benchmarking takes 0.1276 seconds and 0.0002 seconds precompiling for 13 choices 2025-09-07T09:21:40.5439906Z Autotune Choices Stats: 2025-09-07T09:21:40.5440840Z {"num_choices": 12, 
"num_triton_choices": 11, "best_kernel": "triton_bmm_498", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.006271999794989824, "best_triton_pos": 0} 2025-09-07T09:21:40.5512578Z AUTOTUNE bmm(32x64x64, 32x64x16) 2025-09-07T09:21:40.5512855Z strides: [4096, 64, 1], [1024, 1, 64] 2025-09-07T09:21:40.5513108Z dtypes: torch.float16, torch.float16 2025-09-07T09:21:40.5514140Z triton_bmm_498 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:21:40.5515093Z triton_bmm_505 0.0063 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:21:40.5516033Z triton_bmm_504 0.0064 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:40.5517510Z triton_bmm_499 0.0064 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:21:40.5518406Z triton_bmm_500 0.0066 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:21:40.5519304Z triton_bmm_506 0.0067 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:21:40.5520197Z triton_bmm_503 0.0067 ms 93.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:40.5521104Z triton_bmm_502 0.0068 
ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:21:40.5521670Z bmm 0.0069 ms 91.2% 2025-09-07T09:21:40.5522193Z triton_bmm_497 0.0069 ms 90.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T09:21:40.5522969Z SingleProcess AUTOTUNE benchmarking takes 0.1391 seconds and 0.0002 seconds precompiling for 12 choices 2025-09-07T09:21:40.7419564Z Autotune Choices Stats: 2025-09-07T09:21:40.7420743Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_bmm_518", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.009952000342309475, "best_triton_pos": 0} 2025-09-07T09:21:40.7493290Z AUTOTUNE bmm(32x256x256, 32x256x128) 2025-09-07T09:21:40.7494201Z strides: [65536, 1, 256], [32768, 1, 256] 2025-09-07T09:21:40.7494540Z dtypes: torch.float16, torch.float16 2025-09-07T09:21:40.7495256Z triton_bmm_518 0.0100 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:40.7495934Z bmm 0.0100 ms 99.7% 2025-09-07T09:21:40.7496556Z triton_bmm_514 0.0100 ms 99.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:21:40.7497625Z triton_bmm_519 0.0100 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:21:40.7498698Z triton_bmm_517 0.0102 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:21:40.7499758Z triton_bmm_521 0.0102 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:21:40.7500833Z triton_bmm_525 0.0105 ms 94.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:21:40.7501887Z triton_bmm_516 0.0108 ms 92.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:40.7502929Z triton_bmm_520 0.0108 ms 92.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:40.7504179Z triton_bmm_524 0.0108 ms 92.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:40.7505348Z SingleProcess AUTOTUNE benchmarking takes 0.1973 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T09:21:40.9456946Z Autotune Choices Stats: 2025-09-07T09:21:40.9458092Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_bmm_537", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.009344000369310379, "best_triton_pos": 0} 2025-09-07T09:21:40.9530924Z AUTOTUNE bmm(32x256x128, 32x128x256) 2025-09-07T09:21:40.9531268Z strides: [32768, 1, 256], [32768, 256, 1] 2025-09-07T09:21:40.9531580Z dtypes: torch.float16, torch.float16 2025-09-07T09:21:40.9532305Z triton_bmm_537 0.0093 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:40.9533439Z triton_bmm_536 0.0095 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:21:40.9534773Z triton_bmm_540 0.0095 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:21:40.9535833Z triton_bmm_535 0.0096 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:40.9536884Z triton_bmm_539 0.0096 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:40.9538029Z triton_bmm_538 0.0097 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:21:40.9539503Z triton_bmm_542 0.0097 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:40.9540548Z triton_bmm_543 0.0097 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:40.9541159Z bmm 0.0098 ms 95.4% 2025-09-07T09:21:40.9541725Z triton_bmm_533 0.0098 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:21:40.9542581Z SingleProcess AUTOTUNE benchmarking takes 0.2030 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T09:21:42.1695920Z Autotune Choices Stats: 2025-09-07T09:21:42.1697936Z 
{"num_choices": 18, "num_triton_choices": 11, "best_kernel": "decompose_k_mm_64_split_13", "best_kernel_desc": "k_split=64", "best_time": 0.01104000024497509, "best_triton_pos": 7, "best_triton_time": 0.08259200304746628, "best_triton_kernel": "triton_mm_554", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=2"} 2025-09-07T09:21:42.1785842Z AUTOTUNE mm(31x8192, 8192x16) 2025-09-07T09:21:42.1786250Z strides: [1, 31], [16, 1] 2025-09-07T09:21:42.1786605Z dtypes: torch.float16, torch.float16 2025-09-07T09:21:42.1787049Z decompose_k_mm_64_split_13 0.0110 ms 100.0% k_split=64 2025-09-07T09:21:42.1787530Z decompose_k_mm_32_split_12 0.0122 ms 90.3% k_split=32 2025-09-07T09:21:42.1787945Z mm 0.0132 ms 83.9% 2025-09-07T09:21:42.1788312Z decompose_k_mm_16_split_11 0.0146 ms 75.5% k_split=16 2025-09-07T09:21:42.1788819Z decompose_k_mm_8_split_10 0.0194 ms 56.9% k_split=8 2025-09-07T09:21:42.1789356Z decompose_k_mm_4_split_9 0.0281 ms 39.3% k_split=4 2025-09-07T09:21:42.1790500Z decompose_k_mm_2_split_8 0.0472 ms 23.4% k_split=2 2025-09-07T09:21:42.1791530Z triton_mm_554 0.0826 ms 13.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=2 2025-09-07T09:21:42.1793000Z triton_mm_548 0.0841 ms 13.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:21:42.1794660Z triton_mm_551 0.1433 ms 7.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=2 2025-09-07T09:21:42.1795922Z SingleProcess AUTOTUNE benchmarking takes 1.2248 seconds and 0.0003 seconds precompiling for 18 choices 2025-09-07T09:21:42.3068360Z Autotune Choices Stats: 2025-09-07T09:21:42.3069566Z 
{"num_choices": 13, "num_triton_choices": 12, "best_kernel": "triton_mm_557", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2", "best_time": 0.006463999859988689, "best_triton_pos": 0} 2025-09-07T09:21:42.3145311Z AUTOTUNE mm(8192x31, 31x16) 2025-09-07T09:21:42.3145665Z strides: [31, 1], [16, 1] 2025-09-07T09:21:42.3145907Z dtypes: torch.float16, torch.float16 2025-09-07T09:21:42.3146526Z triton_mm_557 0.0065 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T09:21:42.3147448Z triton_mm_559 0.0065 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:21:42.3148351Z triton_mm_558 0.0065 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:21:42.3149681Z triton_mm_561 0.0065 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:21:42.3150624Z triton_mm_562 0.0065 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:42.3151516Z triton_mm_563 0.0065 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:21:42.3152408Z triton_mm_556 0.0067 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T09:21:42.3153289Z triton_mm_560 
0.0067 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:21:42.3154576Z triton_mm_564 0.0067 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:42.3155516Z triton_mm_565 0.0070 ms 92.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:21:42.3156308Z SingleProcess AUTOTUNE benchmarking takes 0.1264 seconds and 0.0002 seconds precompiling for 13 choices 2025-09-07T09:21:42.4966117Z Autotune Choices Stats: 2025-09-07T09:21:42.4967677Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_bmm_594", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.008736000396311283, "best_triton_pos": 0} 2025-09-07T09:21:42.5045784Z AUTOTUNE bmm(32x16x256, 32x256x256) 2025-09-07T09:21:42.5046825Z strides: [4096, 256, 1], [65536, 256, 1] 2025-09-07T09:21:42.5047261Z dtypes: torch.float16, torch.float16 2025-09-07T09:21:42.5048187Z triton_bmm_594 0.0087 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:21:42.5049679Z triton_bmm_599 0.0087 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:21:42.5051208Z triton_bmm_593 0.0088 ms 98.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:21:42.5052708Z triton_bmm_598 0.0089 ms 98.6% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:42.5054771Z triton_bmm_595 0.0089 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:21:42.5056291Z triton_bmm_607 0.0090 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:21:42.5057238Z bmm 0.0091 ms 96.5% 2025-09-07T09:21:42.5058130Z triton_bmm_602 0.0091 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:42.5059612Z triton_bmm_605 0.0092 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:21:42.5061482Z triton_bmm_592 0.0092 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T09:21:42.5062806Z SingleProcess AUTOTUNE benchmarking takes 0.1759 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T09:21:42.6673633Z Autotune Choices Stats: 2025-09-07T09:21:42.6675436Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_bmm_618", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4", "best_time": 0.008767999708652496, "best_triton_pos": 0} 2025-09-07T09:21:42.6752847Z AUTOTUNE bmm(32x256x256, 32x256x16) 2025-09-07T09:21:42.6753334Z strides: [65536, 256, 1], [4096, 1, 256] 2025-09-07T09:21:42.6753983Z dtypes: torch.float16, torch.float16 
2025-09-07T09:21:42.6754953Z triton_bmm_618 0.0088 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:21:42.6756482Z triton_bmm_612 0.0088 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:21:42.6758094Z triton_bmm_610 0.0090 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:21:42.6759523Z triton_bmm_611 0.0091 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:21:42.6760949Z triton_bmm_615 0.0091 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:42.6762428Z triton_bmm_623 0.0091 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:21:42.6764253Z bmm 0.0092 ms 94.8% 2025-09-07T09:21:42.6765146Z triton_bmm_622 0.0093 ms 94.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:42.6766634Z triton_bmm_609 0.0094 ms 92.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T09:21:42.6768111Z triton_bmm_617 0.0095 ms 92.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:21:42.6769392Z SingleProcess AUTOTUNE 
benchmarking takes 0.1699 seconds and 0.0003 seconds precompiling for 17 choices 2025-09-07T09:21:42.8566186Z Autotune Choices Stats: 2025-09-07T09:21:42.8567120Z {"num_choices": 19, "num_triton_choices": 18, "best_kernel": "triton_bmm_632", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.008991999551653862, "best_triton_pos": 0} 2025-09-07T09:21:42.8645536Z AUTOTUNE bmm(32x256x256, 32x256x64) 2025-09-07T09:21:42.8645851Z strides: [65536, 1, 256], [16384, 1, 256] 2025-09-07T09:21:42.8646097Z dtypes: torch.float16, torch.float16 2025-09-07T09:21:42.8646654Z triton_bmm_632 0.0090 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:21:42.8647465Z triton_bmm_636 0.0091 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:21:42.8648257Z triton_bmm_631 0.0093 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:21:42.8649448Z triton_bmm_635 0.0093 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:42.8649972Z bmm 0.0095 ms 94.6% 2025-09-07T09:21:42.8650444Z triton_bmm_641 0.0095 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:21:42.8651220Z triton_bmm_634 0.0096 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 
2025-09-07T09:21:42.8651991Z triton_bmm_626 0.0097 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:21:42.8652761Z triton_bmm_627 0.0097 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:21:42.8653546Z triton_bmm_640 0.0098 ms 91.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:42.8654708Z SingleProcess AUTOTUNE benchmarking takes 0.1886 seconds and 0.0003 seconds precompiling for 19 choices 2025-09-07T09:21:43.0488727Z Autotune Choices Stats: 2025-09-07T09:21:43.0489683Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_bmm_649", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.008224000222980976, "best_triton_pos": 0} 2025-09-07T09:21:43.0563515Z AUTOTUNE bmm(32x256x64, 32x64x256) 2025-09-07T09:21:43.0564213Z strides: [16384, 1, 256], [16384, 256, 1] 2025-09-07T09:21:43.0564521Z dtypes: torch.float16, torch.float16 2025-09-07T09:21:43.0565584Z triton_bmm_649 0.0082 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:21:43.0566585Z triton_bmm_651 0.0083 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:43.0567583Z triton_bmm_648 0.0083 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 
2025-09-07T09:21:43.0568541Z triton_bmm_650 0.0083 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:21:43.0569520Z triton_bmm_653 0.0084 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:43.0570596Z triton_bmm_654 0.0084 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:21:43.0571564Z triton_bmm_652 0.0084 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:21:43.0572526Z triton_bmm_655 0.0084 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:43.0573488Z triton_bmm_656 0.0084 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:21:43.0574825Z triton_bmm_659 0.0085 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:21:43.0575696Z SingleProcess AUTOTUNE benchmarking takes 0.1911 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T09:21:52.5571848Z W0907 09:21:52.556000 76163 site-packages/torch/_logging/_internal.py:1199] [6/0] Profiler function will be ignored 2025-09-07T09:22:07.0366934Z pass 2025-09-07T09:22:12.5843313Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. 
If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T09:22:12.5845576Z import pynvml # type: ignore[import] 2025-09-07T09:22:15.5917505Z 2025-09-07T09:22:16.9029277Z loading model: 0it [00:00, ?it/s] 2025-09-07T09:22:16.9029691Z loading model: 0it [00:01, ?it/s] 2025-09-07T09:22:16.9029995Z cuda train eca_halonext26ts 2025-09-07T09:22:38.5139412Z Autotune Choices Stats: 2025-09-07T09:22:38.5141275Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.00800000037997961, "best_triton_pos": 1, "best_triton_time": 0.012095999903976917, "best_triton_kernel": "triton_convolution2d_122", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:22:38.5215904Z AUTOTUNE convolution(8x256x16x16, 128x256x1x1) 2025-09-07T09:22:38.5216260Z strides: [65536, 256, 16, 1], [256, 1, 1, 1] 2025-09-07T09:22:38.5216567Z dtypes: torch.float16, torch.float16 2025-09-07T09:22:38.5216854Z convolution 0.0080 ms 100.0% 2025-09-07T09:22:38.5217649Z triton_convolution2d_122 0.0121 ms 66.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:22:38.5219493Z triton_convolution2d_123 0.0123 ms 65.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:22:38.5220848Z triton_convolution2d_121 0.0145 ms 55.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:22:38.5222178Z triton_convolution2d_124 0.0145 ms 
55.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:22:38.5223499Z triton_convolution2d_118 0.0154 ms 52.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:22:38.5225032Z triton_convolution2d_119 0.0199 ms 40.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:22:38.5226359Z triton_convolution2d_120 0.0232 ms 34.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:22:38.5227171Z conv1x1_via_mm 0.0252 ms 31.7% 2025-09-07T09:22:38.5227688Z SingleProcess AUTOTUNE benchmarking takes 0.1426 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:22:38.7050042Z Autotune Choices Stats: 2025-09-07T09:22:38.7051297Z {"num_choices": 16, "num_triton_choices": 15, "best_kernel": "triton_bmm_137", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.0080960001796484, "best_triton_pos": 0} 2025-09-07T09:22:38.7122494Z AUTOTUNE bmm(256x64x16, 256x16x144) 2025-09-07T09:22:38.7122778Z strides: [1024, 16, 1], [2304, 144, 1] 2025-09-07T09:22:38.7123068Z dtypes: torch.float16, torch.float16 2025-09-07T09:22:38.7123944Z triton_bmm_137 0.0081 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:22:38.7124963Z triton_bmm_139 0.0083 ms 97.7% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:22:38.7125957Z triton_bmm_143 0.0084 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:22:38.7126937Z triton_bmm_146 0.0085 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:22:38.7127933Z triton_bmm_136 0.0085 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:22:38.7128940Z triton_bmm_145 0.0086 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T09:22:38.7130047Z triton_bmm_141 0.0087 ms 93.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:22:38.7131015Z triton_bmm_140 0.0087 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:22:38.7131989Z triton_bmm_132 0.0088 ms 92.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T09:22:38.7133243Z triton_bmm_142 0.0088 ms 92.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:22:38.7134281Z SingleProcess AUTOTUNE benchmarking takes 0.1871 seconds and 0.0002 seconds precompiling for 16 choices 2025-09-07T09:22:38.8812625Z Autotune 
Choices Stats: 2025-09-07T09:22:38.8813589Z {"num_choices": 15, "num_triton_choices": 14, "best_kernel": "triton_mm_147", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2", "best_time": 0.006816000211983919, "best_triton_pos": 0} 2025-09-07T09:22:38.8885006Z AUTOTUNE mm(16384x16, 16x23) 2025-09-07T09:22:38.8885257Z strides: [16, 1], [1, 16] 2025-09-07T09:22:38.8885530Z dtypes: torch.float16, torch.float16 2025-09-07T09:22:38.8886222Z triton_mm_147 0.0068 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T09:22:38.8887221Z triton_mm_155 0.0070 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:22:38.8888182Z triton_mm_151 0.0070 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:22:38.8889180Z triton_mm_148 0.0070 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:22:38.8890552Z triton_mm_149 0.0070 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:22:38.8891523Z triton_mm_159 0.0071 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T09:22:38.8892487Z triton_mm_150 0.0072 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 
2025-09-07T09:22:38.8893435Z triton_mm_152 0.0072 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:22:38.8894548Z triton_mm_153 0.0072 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:22:38.8895504Z triton_mm_156 0.0072 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:22:38.8896334Z SingleProcess AUTOTUNE benchmarking takes 0.1758 seconds and 0.0002 seconds precompiling for 15 choices 2025-09-07T09:22:39.0605633Z Autotune Choices Stats: 2025-09-07T09:22:39.0606652Z {"num_choices": 15, "num_triton_choices": 14, "best_kernel": "triton_bmm_184", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8", "best_time": 0.01056000031530857, "best_triton_pos": 0} 2025-09-07T09:22:39.0678816Z AUTOTUNE bmm(256x64x144, 256x144x32) 2025-09-07T09:22:39.0679101Z strides: [9216, 144, 1], [4608, 32, 1] 2025-09-07T09:22:39.0679392Z dtypes: torch.float16, torch.float16 2025-09-07T09:22:39.0680056Z triton_bmm_184 0.0106 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:22:39.0681352Z triton_bmm_178 0.0106 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:22:39.0682346Z triton_bmm_177 0.0107 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 
2025-09-07T09:22:39.0683343Z triton_bmm_183 0.0108 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:22:39.0684695Z triton_bmm_188 0.0108 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:22:39.0685676Z triton_bmm_182 0.0108 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:22:39.0686668Z triton_bmm_186 0.0109 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:22:39.0687661Z triton_bmm_185 0.0110 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:22:39.0688282Z bmm 0.0110 ms 95.7% 2025-09-07T09:22:39.0688865Z triton_bmm_176 0.0113 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:22:39.0689820Z SingleProcess AUTOTUNE benchmarking takes 0.1768 seconds and 0.0002 seconds precompiling for 15 choices 2025-09-07T09:22:39.1785659Z Autotune Choices Stats: 2025-09-07T09:22:39.1787077Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_207", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4", "best_time": 0.01913600042462349, "best_triton_pos": 0} 2025-09-07T09:22:39.1856791Z AUTOTUNE convolution(8x512x16x16, 128x512x1x1) 2025-09-07T09:22:39.1857100Z strides: [131072, 256, 
16, 1], [512, 1, 1, 1] 2025-09-07T09:22:39.1857389Z dtypes: torch.float16, torch.float16 2025-09-07T09:22:39.1858147Z triton_convolution2d_207 0.0191 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:22:39.1858906Z convolution 0.0197 ms 96.9% 2025-09-07T09:22:39.1859744Z triton_convolution2d_206 0.0242 ms 79.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:22:39.1861175Z triton_convolution2d_203 0.0305 ms 62.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:22:39.1862441Z triton_convolution2d_208 0.0320 ms 59.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:22:39.1863681Z triton_convolution2d_209 0.0332 ms 57.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:22:39.1872891Z triton_convolution2d_204 0.0378 ms 50.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:22:39.1874246Z triton_convolution2d_205 0.1122 ms 17.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:22:39.1875293Z SingleProcess AUTOTUNE benchmarking takes 0.1121 seconds and 0.0002 seconds 
precompiling for 8 choices 2025-09-07T09:22:39.3367481Z Autotune Choices Stats: 2025-09-07T09:22:39.3368496Z {"num_choices": 13, "num_triton_choices": 12, "best_kernel": "triton_bmm_219", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.00723200011998415, "best_triton_pos": 0} 2025-09-07T09:22:39.3440904Z AUTOTUNE bmm(256x16x16, 256x16x144) 2025-09-07T09:22:39.3441222Z strides: [256, 16, 1], [2304, 144, 1] 2025-09-07T09:22:39.3441520Z dtypes: torch.float16, torch.float16 2025-09-07T09:22:39.3442198Z triton_bmm_219 0.0072 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:22:39.3443185Z triton_bmm_226 0.0072 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:22:39.3444480Z triton_bmm_222 0.0073 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:22:39.3445449Z triton_bmm_218 0.0073 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T09:22:39.3446470Z triton_bmm_220 0.0074 ms 98.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:22:39.3447713Z triton_bmm_223 0.0074 ms 98.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:22:39.3448736Z triton_bmm_225 0.0074 ms 98.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, 
BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:22:39.3449868Z triton_bmm_227 0.0074 ms 98.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T09:22:39.3450838Z triton_bmm_217 0.0074 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T09:22:39.3451806Z triton_bmm_221 0.0074 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:22:39.3452664Z SingleProcess AUTOTUNE benchmarking takes 0.1553 seconds and 0.0002 seconds precompiling for 13 choices 2025-09-07T09:22:39.5145193Z Autotune Choices Stats: 2025-09-07T09:22:39.5146216Z {"num_choices": 15, "num_triton_choices": 14, "best_kernel": "triton_mm_230", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.006591999903321266, "best_triton_pos": 0} 2025-09-07T09:22:39.5218813Z AUTOTUNE mm(4096x16, 16x23) 2025-09-07T09:22:39.5219071Z strides: [16, 1], [1, 16] 2025-09-07T09:22:39.5219331Z dtypes: torch.float16, torch.float16 2025-09-07T09:22:39.5220127Z triton_mm_230 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:22:39.5221302Z triton_mm_231 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:22:39.5222572Z triton_mm_229 0.0067 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T09:22:39.5223563Z triton_mm_232 0.0067 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:22:39.5224966Z triton_mm_233 0.0068 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:22:39.5225956Z triton_mm_236 0.0069 ms 95.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:22:39.5226945Z triton_mm_235 0.0069 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:22:39.5227944Z triton_mm_234 0.0069 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:22:39.5228925Z triton_mm_237 0.0069 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:22:39.5229917Z triton_mm_238 0.0070 ms 94.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:22:39.5230649Z SingleProcess AUTOTUNE benchmarking takes 0.1774 seconds and 0.0002 seconds precompiling for 15 choices 2025-09-07T09:22:39.6832588Z Autotune Choices Stats: 2025-09-07T09:22:39.6834188Z {"num_choices": 14, "num_triton_choices": 13, "best_kernel": "triton_bmm_259", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 
0.009568000212311745, "best_triton_pos": 0} 2025-09-07T09:22:39.6906841Z AUTOTUNE bmm(256x16x144, 256x144x64) 2025-09-07T09:22:39.6907148Z strides: [2304, 144, 1], [9216, 64, 1] 2025-09-07T09:22:39.6907436Z dtypes: torch.float16, torch.float16 2025-09-07T09:22:39.6908107Z triton_bmm_259 0.0096 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:22:39.6909114Z triton_bmm_269 0.0096 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:22:39.6910112Z triton_bmm_260 0.0096 ms 99.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:22:39.6911137Z triton_bmm_267 0.0096 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:22:39.6912121Z triton_bmm_264 0.0098 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:22:39.6913099Z triton_bmm_266 0.0100 ms 95.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:22:39.6914248Z triton_bmm_261 0.0101 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:22:39.6915237Z triton_bmm_265 0.0101 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:22:39.6916442Z triton_bmm_268 0.0101 ms 94.6% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:22:39.6917143Z bmm 0.0102 ms 93.4% 2025-09-07T09:22:39.6917612Z SingleProcess AUTOTUNE benchmarking takes 0.1661 seconds and 0.0002 seconds precompiling for 14 choices 2025-09-07T09:22:39.8288517Z Autotune Choices Stats: 2025-09-07T09:22:39.8290040Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.008031999692320824, "best_triton_pos": 1, "best_triton_time": 0.01772800087928772, "best_triton_kernel": "triton_convolution2d_295", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T09:22:39.8363567Z AUTOTUNE convolution(8x512x8x8, 128x512x1x1) 2025-09-07T09:22:39.8364104Z strides: [32768, 64, 8, 1], [512, 1, 1, 1] 2025-09-07T09:22:39.8364407Z dtypes: torch.float16, torch.float16 2025-09-07T09:22:39.8364695Z convolution 0.0080 ms 100.0% 2025-09-07T09:22:39.8365439Z triton_convolution2d_295 0.0177 ms 45.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:22:39.8366696Z triton_convolution2d_296 0.0179 ms 45.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:22:39.8367445Z conv1x1_via_mm 0.0218 ms 36.9% 2025-09-07T09:22:39.8368196Z triton_convolution2d_294 0.0218 ms 36.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:22:39.8369707Z triton_convolution2d_297 0.0223 ms 36.1% 
ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T09:22:39.8371012Z triton_convolution2d_291 0.0261 ms 30.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:22:39.8372228Z triton_convolution2d_293 0.0319 ms 25.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=512, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T09:22:39.8373482Z triton_convolution2d_292 0.0340 ms 23.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T09:22:39.8374635Z SingleProcess AUTOTUNE benchmarking takes 0.1372 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T09:22:40.0207563Z Autotune Choices Stats: 2025-09-07T09:22:40.0208568Z {"num_choices": 16, "num_triton_choices": 15, "best_kernel": "triton_bmm_307", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8", "best_time": 0.007040000054985285, "best_triton_pos": 0} 2025-09-07T09:22:40.0281604Z AUTOTUNE bmm(64x64x16, 64x16x144) 2025-09-07T09:22:40.0281900Z strides: [1024, 1, 64], [2304, 144, 1] 2025-09-07T09:22:40.0282193Z dtypes: torch.float16, torch.float16 2025-09-07T09:22:40.0282876Z triton_bmm_307 0.0070 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:22:40.0284076Z triton_bmm_305 0.0072 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T09:22:40.0285324Z triton_bmm_310 0.0073 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:22:40.0286302Z triton_bmm_311 0.0073 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:22:40.0287282Z triton_bmm_312 0.0073 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:22:40.0288264Z triton_bmm_313 0.0074 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:22:40.0289240Z triton_bmm_316 0.0074 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:22:40.0290372Z triton_bmm_317 0.0074 ms 95.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:22:40.0291332Z triton_bmm_308 0.0075 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:22:40.0292288Z triton_bmm_309 0.0075 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:22:40.0293133Z SingleProcess AUTOTUNE benchmarking takes 0.1881 seconds and 0.0002 seconds precompiling for 16 choices 2025-09-07T09:22:40.2136603Z Autotune Choices Stats: 2025-09-07T09:22:40.2137848Z {"num_choices": 16, 
"num_triton_choices": 15, "best_kernel": "triton_bmm_360", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4", "best_time": 0.008191999979317188, "best_triton_pos": 0} 2025-09-07T09:22:40.2211219Z AUTOTUNE bmm(64x64x144, 64x144x64) 2025-09-07T09:22:40.2211482Z strides: [9216, 144, 1], [9216, 1, 144] 2025-09-07T09:22:40.2211762Z dtypes: torch.float16, torch.float16 2025-09-07T09:22:40.2212427Z triton_bmm_360 0.0082 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:22:40.2213424Z triton_bmm_362 0.0082 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:22:40.2214722Z triton_bmm_355 0.0082 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:22:40.2215706Z triton_bmm_350 0.0083 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:22:40.2216676Z triton_bmm_351 0.0083 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:22:40.2217651Z triton_bmm_356 0.0083 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:22:40.2218625Z triton_bmm_359 0.0083 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:22:40.2219617Z 
triton_bmm_358 0.0084 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:22:40.2220798Z triton_bmm_352 0.0085 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:22:40.2221406Z bmm 0.0086 ms 95.2% 2025-09-07T09:22:40.2221820Z SingleProcess AUTOTUNE benchmarking takes 0.1881 seconds and 0.0002 seconds precompiling for 16 choices 2025-09-07T09:23:01.7646375Z Autotune Choices Stats: 2025-09-07T09:23:01.7647578Z {"num_choices": 19, "num_triton_choices": 18, "best_kernel": "triton_bmm_432", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4", "best_time": 0.0077760000713169575, "best_triton_pos": 0} 2025-09-07T09:23:01.7727131Z AUTOTUNE bmm(64x144x64, 64x64x64) 2025-09-07T09:23:01.7727575Z strides: [9216, 1, 144], [4096, 1, 64] 2025-09-07T09:23:01.7727898Z dtypes: torch.float16, torch.float16 2025-09-07T09:23:01.7728546Z triton_bmm_432 0.0078 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:23:01.7729548Z triton_bmm_437 0.0078 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:23:01.7730529Z triton_bmm_427 0.0080 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:23:01.7731495Z triton_bmm_428 0.0080 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, 
num_warps=4 2025-09-07T09:23:01.7732903Z triton_bmm_429 0.0080 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:23:01.7734358Z triton_bmm_436 0.0080 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:23:01.7735380Z triton_bmm_430 0.0080 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:23:01.7736434Z triton_bmm_431 0.0081 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:23:01.7737397Z triton_bmm_433 0.0081 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:23:01.7738364Z triton_bmm_422 0.0083 ms 93.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:23:01.7739091Z SingleProcess AUTOTUNE benchmarking takes 0.1794 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T09:23:01.9338314Z Autotune Choices Stats: 2025-09-07T09:23:01.9339350Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_bmm_442", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007840000092983246, "best_triton_pos": 0} 2025-09-07T09:23:01.9415706Z AUTOTUNE bmm(64x64x64, 64x64x144) 2025-09-07T09:23:01.9416088Z strides: [4096, 1, 64], [9216, 144, 1] 2025-09-07T09:23:01.9416383Z dtypes: torch.float16, torch.float16 
2025-09-07T09:23:01.9417049Z triton_bmm_442 0.0078 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:23:01.9418664Z triton_bmm_454 0.0079 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:23:01.9419632Z triton_bmm_446 0.0080 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:23:01.9420583Z triton_bmm_447 0.0080 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:23:01.9421556Z triton_bmm_445 0.0081 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:23:01.9422602Z triton_bmm_449 0.0081 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:23:01.9423598Z triton_bmm_452 0.0081 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:23:01.9424833Z triton_bmm_448 0.0082 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:23:01.9425803Z triton_bmm_439 0.0082 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:23:01.9426764Z triton_bmm_441 0.0082 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, 
BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:23:01.9427829Z SingleProcess AUTOTUNE benchmarking takes 0.1684 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T09:23:02.8046516Z Autotune Choices Stats: 2025-09-07T09:23:02.8048113Z {"num_choices": 17, "num_triton_choices": 11, "best_kernel": "decompose_k_mm_32_split_4", "best_kernel_desc": "k_split=32", "best_time": 0.011264000087976456, "best_triton_pos": 6, "best_triton_time": 0.04294399917125702, "best_triton_kernel": "triton_mm_464", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=2"} 2025-09-07T09:23:02.8138851Z AUTOTUNE mm(23x4096, 4096x16) 2025-09-07T09:23:02.8139256Z strides: [1, 23], [16, 1] 2025-09-07T09:23:02.8139539Z dtypes: torch.float16, torch.float16 2025-09-07T09:23:02.8139891Z decompose_k_mm_32_split_4 0.0113 ms 100.0% k_split=32 2025-09-07T09:23:02.8140288Z decompose_k_mm_16_split_3 0.0129 ms 87.1% k_split=16 2025-09-07T09:23:02.8140654Z decompose_k_mm_8_split_2 0.0140 ms 80.5% k_split=8 2025-09-07T09:23:02.8141023Z mm 0.0183 ms 61.6% 2025-09-07T09:23:02.8141293Z decompose_k_mm_4_split_1 0.0186 ms 60.6% k_split=4 2025-09-07T09:23:02.8141640Z decompose_k_mm_2_split_0 0.0277 ms 40.7% k_split=2 2025-09-07T09:23:02.8142365Z triton_mm_464 0.0429 ms 26.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=2 2025-09-07T09:23:02.8143434Z triton_mm_458 0.0437 ms 25.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:23:02.8144853Z triton_mm_465 0.0723 ms 15.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, 
num_warps=2 2025-09-07T09:23:02.8145876Z triton_mm_461 0.0731 ms 15.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=2 2025-09-07T09:23:02.8147269Z SingleProcess AUTOTUNE benchmarking takes 0.8718 seconds and 0.0002 seconds precompiling for 17 choices 2025-09-07T09:23:02.9397778Z Autotune Choices Stats: 2025-09-07T09:23:02.9398826Z {"num_choices": 13, "num_triton_choices": 12, "best_kernel": "triton_mm_472", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.006560000125318766, "best_triton_pos": 0} 2025-09-07T09:23:02.9475729Z AUTOTUNE mm(4096x23, 23x16) 2025-09-07T09:23:02.9476009Z strides: [23, 1], [16, 1] 2025-09-07T09:23:02.9476273Z dtypes: torch.float16, torch.float16 2025-09-07T09:23:02.9476961Z triton_mm_472 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:23:02.9478112Z triton_mm_473 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:23:02.9478964Z triton_mm_468 0.0066 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:23:02.9479802Z triton_mm_469 0.0066 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:23:02.9480622Z triton_mm_471 0.0067 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:23:02.9481452Z 
triton_mm_467 0.0068 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T09:23:02.9482537Z triton_mm_470 0.0068 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:23:02.9483387Z triton_mm_466 0.0068 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T09:23:02.9484422Z triton_mm_474 0.0068 ms 95.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:23:02.9485272Z triton_mm_475 0.0071 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:23:02.9486017Z SingleProcess AUTOTUNE benchmarking takes 0.1256 seconds and 0.0002 seconds precompiling for 13 choices 2025-09-07T09:23:03.1185178Z Autotune Choices Stats: 2025-09-07T09:23:03.1186190Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_bmm_504", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.007104000076651573, "best_triton_pos": 0} 2025-09-07T09:23:03.1264264Z AUTOTUNE bmm(64x16x64, 64x64x144) 2025-09-07T09:23:03.1264595Z strides: [1024, 64, 1], [9216, 144, 1] 2025-09-07T09:23:03.1264897Z dtypes: torch.float16, torch.float16 2025-09-07T09:23:03.1265593Z triton_bmm_504 0.0071 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:23:03.1266581Z triton_bmm_509 0.0071 
ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:23:03.1267576Z triton_bmm_508 0.0071 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:23:03.1268824Z triton_bmm_505 0.0072 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:23:03.1269668Z triton_bmm_515 0.0072 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:23:03.1270494Z triton_bmm_502 0.0073 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T09:23:03.1271325Z triton_bmm_514 0.0073 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:23:03.1272166Z triton_bmm_517 0.0073 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:23:03.1273000Z triton_bmm_503 0.0073 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:23:03.1273990Z triton_bmm_510 0.0074 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:23:03.1274728Z SingleProcess AUTOTUNE benchmarking takes 0.1665 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T09:23:03.2444911Z 
Autotune Choices Stats: 2025-09-07T09:23:03.2446291Z {"num_choices": 13, "num_triton_choices": 12, "best_kernel": "triton_bmm_520", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.007296000141650438, "best_triton_pos": 0} 2025-09-07T09:23:03.2522538Z AUTOTUNE bmm(64x64x144, 64x144x16) 2025-09-07T09:23:03.2522922Z strides: [9216, 144, 1], [2304, 1, 144] 2025-09-07T09:23:03.2523230Z dtypes: torch.float16, torch.float16 2025-09-07T09:23:03.2524209Z triton_bmm_520 0.0073 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:23:03.2525225Z triton_bmm_525 0.0074 ms 98.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:23:03.2526194Z triton_bmm_521 0.0075 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:23:03.2527175Z triton_bmm_529 0.0075 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:23:03.2528333Z triton_bmm_522 0.0076 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:23:03.2529316Z triton_bmm_528 0.0077 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:23:03.2530290Z triton_bmm_527 0.0077 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:23:03.2531258Z triton_bmm_526 0.0078 ms 93.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:23:03.2532235Z triton_bmm_519 0.0080 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T09:23:03.2533134Z bmm 0.0080 ms 90.8% 2025-09-07T09:23:03.2533581Z SingleProcess AUTOTUNE benchmarking takes 0.1254 seconds and 0.0002 seconds precompiling for 13 choices 2025-09-07T09:23:03.3961853Z Autotune Choices Stats: 2025-09-07T09:23:03.3962833Z {"num_choices": 16, "num_triton_choices": 15, "best_kernel": "triton_bmm_537", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.008128000423312187, "best_triton_pos": 0} 2025-09-07T09:23:03.4043244Z AUTOTUNE bmm(256x144x16, 256x16x64) 2025-09-07T09:23:03.4043854Z strides: [2304, 1, 144], [1024, 64, 1] 2025-09-07T09:23:03.4044176Z dtypes: torch.float16, torch.float16 2025-09-07T09:23:03.4044884Z triton_bmm_537 0.0081 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:23:03.4045931Z triton_bmm_538 0.0082 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:23:03.4046914Z triton_bmm_535 0.0082 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:23:03.4047994Z triton_bmm_540 0.0083 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, 
BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:23:03.4049069Z triton_bmm_544 0.0085 ms 95.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:23:03.4050395Z triton_bmm_542 0.0085 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:23:03.4051411Z triton_bmm_541 0.0086 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:23:03.4052393Z triton_bmm_543 0.0086 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T09:23:03.4053373Z triton_bmm_530 0.0086 ms 94.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T09:23:03.4054500Z triton_bmm_532 0.0087 ms 93.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:23:03.4055350Z SingleProcess AUTOTUNE benchmarking takes 0.1515 seconds and 0.0002 seconds precompiling for 16 choices 2025-09-07T09:23:03.5664362Z Autotune Choices Stats: 2025-09-07T09:23:03.5665374Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_bmm_553", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.00902399979531765, "best_triton_pos": 0} 2025-09-07T09:23:03.5743014Z AUTOTUNE bmm(256x16x64, 256x64x144) 2025-09-07T09:23:03.5743348Z strides: [1024, 64, 1], [9216, 1, 64] 
2025-09-07T09:23:03.5743648Z dtypes: torch.float16, torch.float16 2025-09-07T09:23:03.5744508Z triton_bmm_553 0.0090 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:23:03.5745560Z triton_bmm_552 0.0091 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:23:03.5747001Z triton_bmm_546 0.0091 ms 98.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T09:23:03.5748015Z triton_bmm_549 0.0092 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:23:03.5748884Z triton_bmm_548 0.0093 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:23:03.5749713Z triton_bmm_547 0.0093 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:23:03.5750541Z triton_bmm_551 0.0093 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:23:03.5751367Z triton_bmm_558 0.0093 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:23:03.5752206Z triton_bmm_559 0.0093 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:23:03.5753046Z triton_bmm_556 0.0094 
ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:23:03.5753933Z SingleProcess AUTOTUNE benchmarking takes 0.1694 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T09:23:03.7147275Z Autotune Choices Stats: 2025-09-07T09:23:03.7148640Z {"num_choices": 13, "num_triton_choices": 12, "best_kernel": "triton_bmm_612", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.007040000054985285, "best_triton_pos": 0} 2025-09-07T09:23:03.7225587Z AUTOTUNE bmm(256x16x16, 256x16x144) 2025-09-07T09:23:03.7225883Z strides: [256, 1, 16], [2304, 144, 1] 2025-09-07T09:23:03.7226176Z dtypes: torch.float16, torch.float16 2025-09-07T09:23:03.7226837Z triton_bmm_612 0.0070 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:23:03.7227829Z triton_bmm_610 0.0071 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:23:03.7228723Z triton_bmm_613 0.0071 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:23:03.7229567Z triton_bmm_614 0.0072 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:23:03.7230402Z triton_bmm_611 0.0072 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:23:03.7231260Z triton_bmm_615 0.0072 ms 97.3% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:23:03.7232117Z triton_bmm_617 0.0072 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:23:03.7232959Z triton_bmm_618 0.0073 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T09:23:03.7234374Z triton_bmm_616 0.0073 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:23:03.7235233Z triton_bmm_619 0.0073 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:23:03.7235964Z SingleProcess AUTOTUNE benchmarking takes 0.1238 seconds and 0.0002 seconds precompiling for 13 choices 2025-09-07T09:23:03.8317593Z Autotune Choices Stats: 2025-09-07T09:23:03.8318672Z {"num_choices": 12, "num_triton_choices": 11, "best_kernel": "triton_bmm_622", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=1", "best_time": 0.007712000049650669, "best_triton_pos": 0} 2025-09-07T09:23:03.8395270Z AUTOTUNE bmm(256x16x144, 256x144x16) 2025-09-07T09:23:03.8395590Z strides: [2304, 144, 1], [2304, 1, 144] 2025-09-07T09:23:03.8395890Z dtypes: torch.float16, torch.float16 2025-09-07T09:23:03.8396563Z triton_bmm_622 0.0077 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=1 2025-09-07T09:23:03.8397687Z triton_bmm_630 0.0077 ms 100.0% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=1 2025-09-07T09:23:03.8398710Z triton_bmm_626 0.0077 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=1 2025-09-07T09:23:03.8399560Z triton_bmm_623 0.0078 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=1 2025-09-07T09:23:03.8400733Z triton_bmm_629 0.0079 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=1 2025-09-07T09:23:03.8401610Z triton_bmm_627 0.0079 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=1 2025-09-07T09:23:03.8402456Z triton_bmm_628 0.0080 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=1 2025-09-07T09:23:03.8403301Z triton_bmm_621 0.0081 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=1 2025-09-07T09:23:03.8404158Z bmm 0.0087 ms 88.9% 2025-09-07T09:23:03.8404668Z triton_bmm_625 0.0091 ms 85.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=1 2025-09-07T09:23:03.8405427Z SingleProcess AUTOTUNE benchmarking takes 0.1165 seconds and 0.0002 seconds precompiling for 12 choices 2025-09-07T09:23:04.0018114Z Autotune Choices Stats: 2025-09-07T09:23:04.0019392Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_bmm_641", "best_kernel_desc": 
"ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.009631999768316746, "best_triton_pos": 0} 2025-09-07T09:23:04.0098718Z AUTOTUNE bmm(256x144x64, 256x64x32) 2025-09-07T09:23:04.0099246Z strides: [9216, 1, 144], [2048, 32, 1] 2025-09-07T09:23:04.0099566Z dtypes: torch.float16, torch.float16 2025-09-07T09:23:04.0100276Z triton_bmm_641 0.0096 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:23:04.0101760Z triton_bmm_635 0.0097 ms 99.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:23:04.0102750Z triton_bmm_642 0.0097 ms 99.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:23:04.0104169Z triton_bmm_637 0.0097 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:23:04.0105253Z triton_bmm_639 0.0097 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:23:04.0106220Z triton_bmm_633 0.0098 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:23:04.0107177Z triton_bmm_638 0.0099 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T09:23:04.0108275Z triton_bmm_647 0.0099 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, 
BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:23:04.0109311Z triton_bmm_640 0.0100 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:23:04.0110266Z triton_bmm_636 0.0100 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:23:04.0111108Z SingleProcess AUTOTUNE benchmarking takes 0.1698 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T09:23:04.1630456Z Autotune Choices Stats: 2025-09-07T09:23:04.1631516Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_bmm_654", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.008927999995648861, "best_triton_pos": 0} 2025-09-07T09:23:04.1711221Z AUTOTUNE bmm(256x64x32, 256x32x144) 2025-09-07T09:23:04.1711528Z strides: [2048, 32, 1], [4608, 1, 32] 2025-09-07T09:23:04.1711828Z dtypes: torch.float16, torch.float16 2025-09-07T09:23:04.1712527Z triton_bmm_654 0.0089 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:23:04.1713545Z triton_bmm_660 0.0090 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:23:04.1714929Z triton_bmm_656 0.0090 ms 98.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:23:04.1715919Z triton_bmm_657 0.0092 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:23:04.1716892Z triton_bmm_663 0.0092 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:23:04.1717980Z triton_bmm_658 0.0092 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:23:04.1718927Z triton_bmm_662 0.0093 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T09:23:04.1720015Z triton_bmm_659 0.0094 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:23:04.1720557Z bmm 0.0094 ms 94.6% 2025-09-07T09:23:04.1721041Z triton_bmm_653 0.0095 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:23:04.1721771Z SingleProcess AUTOTUNE benchmarking takes 0.1608 seconds and 0.0002 seconds precompiling for 17 choices 2025-09-07T09:23:05.0538035Z Autotune Choices Stats: 2025-09-07T09:23:05.0539743Z {"num_choices": 19, "num_triton_choices": 11, "best_kernel": "decompose_k_mm_128_split_26", "best_kernel_desc": "k_split=128", "best_time": 0.011648000217974186, "best_triton_pos": 8, "best_triton_time": 0.15807999670505524, "best_triton_kernel": "triton_mm_673", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=2"} 2025-09-07T09:23:05.0638548Z AUTOTUNE mm(23x16384, 16384x16) 2025-09-07T09:23:05.0638826Z strides: [1, 23], [16, 1] 2025-09-07T09:23:05.0639086Z dtypes: torch.float16, 
torch.float16 2025-09-07T09:23:05.0639424Z decompose_k_mm_128_split_26 0.0116 ms 100.0% k_split=128 2025-09-07T09:23:05.0639811Z decompose_k_mm_64_split_25 0.0127 ms 91.5% k_split=64 2025-09-07T09:23:05.0640167Z decompose_k_mm_32_split_24 0.0151 ms 77.1% k_split=32 2025-09-07T09:23:05.0640473Z mm 0.0154 ms 75.8% 2025-09-07T09:23:05.0640732Z decompose_k_mm_16_split_23 0.0193 ms 60.4% k_split=16 2025-09-07T09:23:05.0641070Z decompose_k_mm_8_split_22 0.0281 ms 41.5% k_split=8 2025-09-07T09:23:05.0641400Z decompose_k_mm_4_split_21 0.0452 ms 25.7% k_split=4 2025-09-07T09:23:05.0641725Z decompose_k_mm_2_split_20 0.0831 ms 14.0% k_split=2 2025-09-07T09:23:05.0642817Z triton_mm_673 0.1581 ms 7.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=2 2025-09-07T09:23:05.0644413Z triton_mm_667 0.1585 ms 7.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:23:05.0645289Z SingleProcess AUTOTUNE benchmarking takes 0.8922 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T09:23:05.1916318Z Autotune Choices Stats: 2025-09-07T09:23:05.1917327Z {"num_choices": 13, "num_triton_choices": 12, "best_kernel": "triton_mm_680", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.006527999881654978, "best_triton_pos": 0} 2025-09-07T09:23:05.1998655Z AUTOTUNE mm(16384x24, 24x16) 2025-09-07T09:23:05.1998906Z strides: [24, 1], [16, 1] 2025-09-07T09:23:05.1999124Z dtypes: torch.float16, torch.float16 2025-09-07T09:23:05.1999717Z triton_mm_680 0.0065 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:23:05.2000578Z 
triton_mm_681 0.0066 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:23:05.2001440Z triton_mm_682 0.0066 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:23:05.2002288Z triton_mm_684 0.0067 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T09:23:05.2003135Z triton_mm_677 0.0067 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:23:05.2004615Z triton_mm_678 0.0067 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:23:05.2005516Z triton_mm_686 0.0067 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T09:23:05.2006362Z triton_mm_676 0.0068 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T09:23:05.2007206Z triton_mm_685 0.0068 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T09:23:05.2008054Z triton_mm_675 0.0069 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T09:23:05.2008804Z SingleProcess AUTOTUNE benchmarking takes 0.1272 seconds and 0.0002 seconds precompiling for 13 choices 
2025-09-07T09:23:05.3744324Z Autotune Choices Stats: 2025-09-07T09:23:05.3745239Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_bmm_714", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.009247999638319016, "best_triton_pos": 0} 2025-09-07T09:23:05.3826840Z AUTOTUNE bmm(256x16x64, 256x64x144) 2025-09-07T09:23:05.3827200Z strides: [1024, 1, 16], [9216, 144, 1] 2025-09-07T09:23:05.3827508Z dtypes: torch.float16, torch.float16 2025-09-07T09:23:05.3828199Z triton_bmm_714 0.0092 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:23:05.3829502Z triton_bmm_716 0.0092 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T09:23:05.3830362Z triton_bmm_724 0.0092 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:23:05.3831199Z triton_bmm_713 0.0093 ms 99.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:23:05.3832041Z triton_bmm_717 0.0093 ms 99.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:23:05.3832893Z triton_bmm_723 0.0093 ms 99.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:23:05.3834055Z triton_bmm_711 0.0093 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T09:23:05.3834928Z triton_bmm_712 0.0093 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:23:05.3835806Z triton_bmm_718 0.0093 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:23:05.3836646Z triton_bmm_719 0.0094 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:23:05.3837483Z SingleProcess AUTOTUNE benchmarking takes 0.1695 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T09:23:05.4738138Z Autotune Choices Stats: 2025-09-07T09:23:05.4739302Z {"num_choices": 13, "num_triton_choices": 12, "best_kernel": "triton_bmm_734", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.009696000255644321, "best_triton_pos": 0} 2025-09-07T09:23:05.4814188Z AUTOTUNE bmm(256x64x144, 256x144x16) 2025-09-07T09:23:05.4814500Z strides: [9216, 144, 1], [2304, 1, 144] 2025-09-07T09:23:05.4814778Z dtypes: torch.float16, torch.float16 2025-09-07T09:23:05.4815426Z triton_bmm_734 0.0097 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:23:05.4816418Z triton_bmm_737 0.0097 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:23:05.4817411Z triton_bmm_738 0.0097 ms 99.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:23:05.4818383Z triton_bmm_731 0.0098 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:23:05.4819344Z triton_bmm_729 0.0098 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T09:23:05.4820273Z triton_bmm_730 0.0098 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T09:23:05.4821182Z triton_bmm_736 0.0098 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T09:23:05.4822368Z triton_bmm_735 0.0099 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T09:23:05.4823276Z triton_bmm_728 0.0101 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=16, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T09:23:05.4824018Z bmm 0.0102 ms 95.3% 2025-09-07T09:23:05.4824434Z SingleProcess AUTOTUNE benchmarking takes 0.0983 seconds and 0.0002 seconds precompiling for 13 choices 2025-09-07T09:23:12.9716738Z skipping cudagraphs due to disabling cudagraphs due to incompatible op aten.index_put_.default Found from File "/var/lib/jenkins/workspace/benchmarks/dynamo/timm_models.py", line 442, in torch_dynamo_resume_in_forward_and_backward_pass_at_440 2025-09-07T09:23:12.9717917Z pred = mod(*cloned_inputs) 2025-09-07T09:23:12.9718468Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/byobnet.py", line 1433, in forward 2025-09-07T09:23:12.9719008Z x = 
self.forward_features(x) 2025-09-07T09:23:12.9719536Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/byobnet.py", line 1425, in forward_features 2025-09-07T09:23:12.9720103Z x = self.stages(x) 2025-09-07T09:23:12.9720578Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/models/byobnet.py", line 894, in forward 2025-09-07T09:23:12.9721126Z x = self.self_attn(x) 2025-09-07T09:23:12.9721673Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/timm/layers/halo_attn.py", line 191, in forward 2025-09-07T09:23:12.9722341Z kv = kv.unfold(2, self.win_size, self.block_size).unfold(3, self.win_size, self.block_size).reshape( 2025-09-07T09:23:12.9722689Z 2025-09-07T09:23:12.9722693Z 2025-09-07T09:23:15.5046437Z W0907 09:23:15.503000 86115 site-packages/torch/_logging/_internal.py:1199] [6/0] Profiler function will be ignored 2025-09-07T09:23:30.8152620Z pass 2025-09-07T09:23:35.2990445Z accuracy pass_rate=100.00% 2025-09-07T09:23:35.2992455Z calls_captured gmean=1214.80x mean=1314.000x 2025-09-07T09:23:35.2996364Z unique_graphs gmean=2.85x mean=2.875x 2025-09-07T09:23:35.2999820Z graph_breaks gmean=6.87x mean=6.875x 2025-09-07T09:23:35.3003228Z unique_graph_breaks gmean=5.00x mean=5.000x 2025-09-07T09:23:35.3007002Z autograd_captures gmean=0.00x mean=0.000x 2025-09-07T09:23:35.3010430Z autograd_compiles gmean=0.00x mean=0.000x 2025-09-07T09:23:35.3013962Z cudagraph_skips gmean=0.00x mean=0.125x 2025-09-07T09:23:35.3015417Z compilation_latency mean=85.151 seconds 2025-09-07T09:23:36.3496166Z + [[ training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *cudagraphs_low_precision-true* ]] 2025-09-07T09:23:36.3497438Z + [[ training == \i\n\f\e\r\e\n\c\e ]] 2025-09-07T09:23:36.3497707Z + for target in "${targets[@]}" 2025-09-07T09:23:36.3497941Z + target_flag=('--performance') 
2025-09-07T09:23:36.3498173Z + local target_flag 2025-09-07T09:23:36.3498389Z + [[ performance == \p\e\r\f\o\r\m\a\n\c\e ]] 2025-09-07T09:23:36.3498669Z + target_flag+=(--cold-start-latency) 2025-09-07T09:23:36.3499742Z + [[ training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *freezing-true* ]] 2025-09-07T09:23:36.3501601Z + [[ training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *default-true* ]] 2025-09-07T09:23:36.3504622Z + python benchmarks/dynamo/timm_models.py --performance --cold-start-latency --training --amp --backend inductor --disable-cudagraphs --device cuda --total-partitions 7 --partition-id 1 --output /var/lib/jenkins/workspace/test/test-reports/inductor_no_cudagraphs_timm_models_amp_training_cuda_h100_performance.csv 2025-09-07T09:23:37.3135522Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T09:23:37.3136700Z import pynvml # type: ignore[import] 2025-09-07T09:23:41.7671846Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T09:23:41.7672907Z import pynvml # type: ignore[import] 2025-09-07T09:23:44.7576254Z 2025-09-07T09:23:46.2350358Z loading model: 0it [00:00, ?it/s] 2025-09-07T09:23:46.2350723Z loading model: 0it [00:01, ?it/s] 2025-09-07T09:23:46.2351017Z cuda train crossvit_9_240 2025-09-07T09:24:28.7140585Z W0907 09:24:28.713000 94097 site-packages/torch/_logging/_internal.py:1199] [6/0] Profiler function will be ignored 2025-09-07T09:25:07.5826258Z 2025-09-07T09:25:07.7066999Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T09:26:41.0598460Z 2025-09-07T09:26:41.1756508Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T09:27:37.5156225Z 2025-09-07T09:27:37.6724307Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T09:29:44.9698167Z 2025-09-07T09:29:45.1489015Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T09:31:17.9586599Z 2025-09-07T09:31:18.1176319Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T09:34:09.8074495Z 2025-09-07T09:34:09.9794903Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T09:35:37.2137337Z 2025-09-07T09:35:37.3835325Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T09:37:28.3836051Z 2025-09-07T09:37:28.6005055Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T09:39:07.1736293Z 2025-09-07T09:39:07.3443317Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T09:40:05.7887399Z 2025-09-07T09:40:05.9448126Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T09:42:19.5008812Z 2025-09-07T09:42:19.7797115Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T09:43:57.9591051Z 2025-09-07T09:43:58.2007251Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T09:47:02.4167467Z 2025-09-07T09:47:02.6253991Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T09:48:34.8675416Z ERROR:common:Backend dynamo failed in warmup() 2025-09-07T09:48:34.8675925Z Traceback (most recent call last): 2025-09-07T09:48:34.8676398Z File 
"/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2648, in warmup 2025-09-07T09:48:34.8676865Z fn(model, example_inputs) 2025-09-07T09:48:34.8677506Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 832, in compile_wrapper 2025-09-07T09:48:34.8678058Z return fn(*args, **kwargs) 2025-09-07T09:48:34.8678578Z File "/var/lib/jenkins/workspace/benchmarks/dynamo/timm_models.py", line 439, in forward_and_backward_pass 2025-09-07T09:48:34.8679127Z cloned_inputs = clone_inputs(inputs) 2025-09-07T09:48:34.8679777Z File "/var/lib/jenkins/workspace/benchmarks/dynamo/timm_models.py", line 440, in torch_dynamo_resume_in_forward_and_backward_pass_at_439 2025-09-07T09:48:34.8680428Z self.optimizer_zero_grad(mod) 2025-09-07T09:48:34.8681071Z File "/var/lib/jenkins/workspace/benchmarks/dynamo/timm_models.py", line 440, in torch_dynamo_resume_in_forward_and_backward_pass_at_440 2025-09-07T09:48:34.8681801Z self.optimizer_zero_grad(mod) 2025-09-07T09:48:34.8693029Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T09:48:34.8693565Z return fn(*args, **kwargs) 2025-09-07T09:48:34.8694249Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1130, in forward 2025-09-07T09:48:34.8694729Z return compiled_fn(full_args) 2025-09-07T09:48:34.8695264Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 339, in runtime_wrapper 2025-09-07T09:48:34.8695794Z all_outs = call_func_at_runtime_with_args( 2025-09-07T09:48:34.8696343Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 129, in call_func_at_runtime_with_args 2025-09-07T09:48:34.8697233Z out = normalize_as_list(f(args)) 2025-09-07T09:48:34.8697684Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 
103, in g 2025-09-07T09:48:34.8698111Z return f(*args) 2025-09-07T09:48:34.8698508Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/function.py", line 581, in apply 2025-09-07T09:48:34.8698976Z return super().apply(*args, **kwargs) # type: ignore[misc] 2025-09-07T09:48:34.8699529Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 2118, in forward 2025-09-07T09:48:34.8700051Z fw_outs = call_func_at_runtime_with_args( 2025-09-07T09:48:34.8700580Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 129, in call_func_at_runtime_with_args 2025-09-07T09:48:34.8701088Z out = normalize_as_list(f(args)) 2025-09-07T09:48:34.8701570Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 526, in wrapper 2025-09-07T09:48:34.8702072Z return compiled_fn(runtime_args) 2025-09-07T09:48:34.8702550Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 724, in inner_fn 2025-09-07T09:48:34.8703029Z outs = compiled_fn(args) 2025-09-07T09:48:34.8703427Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/output_code.py", line 613, in __call__ 2025-09-07T09:48:34.8704036Z return self.current_callable(inputs) 2025-09-07T09:48:34.8704499Z File "/tmp/tmpf5fgvepo/xd/cxdh3lvla76sfmajufxz4tlwxsixyu4w4mujbudlca3ppc7dnxeb.py", line 9409, in call 2025-09-07T09:48:34.8710282Z (buf0, buf1, buf2, buf6, buf9, buf11, buf12, buf13, buf17, buf20, buf22, buf23, buf24, buf28, buf31, buf33, buf34, buf35, buf36, buf37, buf41, buf44, buf46, buf47, buf48, buf52, buf55, buf58, buf59, buf60, buf61, buf62, buf63, buf67, buf70, buf71, buf72, buf76, buf79, buf81, buf82, buf83, buf87, buf90, buf92, buf93, buf94, buf98, buf101, buf104, buf105, buf106, buf107, buf108, buf109, buf113, buf116, buf118, buf394, buf119, buf120, 
buf124, buf127, buf129, buf130, buf131, buf135, buf138, buf141, buf142, buf143, buf144, buf145, buf146, buf147, buf150, buf151, buf152, buf153, buf156, buf158, buf159, buf160, buf164, buf167, buf169, buf170, buf171, buf175, buf178, buf181, buf182, buf183, buf184, buf185, buf186, buf187, buf190, buf192, buf393, buf193, buf194, buf198, buf201, buf203, buf204, buf205, buf209, buf212, buf215, buf216, buf217, buf218, buf219, buf220, buf221, buf224, buf225, buf226, buf227, buf230, buf232, buf233, buf234, buf238, buf241, buf243, buf244, buf246, buf248, buf251, buf249, buf250, buf255, buf256, buf260, buf261, buf262, buf263, buf264, buf265, buf272, buf275, buf277, buf278, buf279, buf280, buf283, buf285, buf390, buf286, buf287, buf288, buf291, buf293, buf294, buf296, buf298, buf302, buf299, buf300, buf301, buf304, buf305, buf306, buf308, buf309, buf310, buf311, buf312, buf313, buf317, buf320, buf322, buf323, buf324, buf325, buf328, buf329, buf330, buf331, buf334, buf336, buf337, buf338, buf339, buf342, buf344, buf345, buf347, buf389, buf351, buf352, buf353, buf355, buf356, buf357, buf359, buf360, buf361, buf362, buf387, buf364, buf368, buf371, buf373, buf374, buf375, buf376, buf379, buf386, buf382, buf383, buf385, buf388, buf391, buf392, primals_6, primals_7, primals_12, primals_13, primals_18, primals_19, primals_24, primals_25, primals_30, primals_31, primals_37, primals_38, primals_43, primals_44, primals_49, primals_50, primals_55, primals_56, primals_62, primals_68, primals_69, primals_74, primals_75, primals_81, primals_82, primals_87, primals_88, primals_93, primals_94, primals_99, primals_100, primals_106, primals_112, primals_113, primals_118, primals_119, primals_125, primals_126, primals_131, primals_132, primals_137, primals_138, primals_146, primals_147, primals_152, primals_158, primals_159, primals_167, primals_168, primals_173, primals_174, primals_179, primals_180, primals_185, primals_186, primals_194, primals_195, primals_200) = 
self.partitions[0](partition0_args) 2025-09-07T09:48:34.8716340Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1772, in run 2025-09-07T09:48:34.8716818Z return compiled_fn(new_inputs) # type: ignore[arg-type] 2025-09-07T09:48:34.8717428Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/cudagraph_trees.py", line 388, in deferred_cudagraphify 2025-09-07T09:48:34.8717902Z return fn(inputs) 2025-09-07T09:48:34.8718268Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2958, in run 2025-09-07T09:48:34.8718661Z out = model(new_inputs) 2025-09-07T09:48:34.8719063Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/cudagraph_trees.py", line 2012, in run 2025-09-07T09:48:34.8719499Z out = self._run(new_inputs, function_id) 2025-09-07T09:48:34.8719950Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/cudagraph_trees.py", line 2182, in _run 2025-09-07T09:48:34.8720429Z return self.record_function(new_inputs, function_id) 2025-09-07T09:48:34.8720938Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/cudagraph_trees.py", line 2238, in record_function 2025-09-07T09:48:34.8721409Z torch.cuda.synchronize() 2025-09-07T09:48:34.8721800Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py", line 1083, in synchronize 2025-09-07T09:48:34.8722236Z return torch._C._cuda_synchronize() 2025-09-07T09:48:34.8722568Z torch.AcceleratorError: CUDA error: an illegal memory access was encountered 2025-09-07T09:48:34.8723195Z Search for `cudaErrorIllegalAddress' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. 2025-09-07T09:48:34.8724233Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. 
2025-09-07T09:48:34.8724734Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1 2025-09-07T09:48:34.8725097Z Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions. 2025-09-07T09:48:34.8725333Z 2025-09-07T09:48:34.8725402Z warmup_failed 2025-09-07T09:48:39.3790498Z Run failed with return code: 255 2025-09-07T09:48:39.3790891Z Output: None 2025-09-07T09:48:39.3791107Z Error: None 2025-09-07T09:48:39.3858230Z speedup gmean=0.00x mean=1.685x 2025-09-07T09:48:39.3861661Z abs_latency gmean=0.00x mean=32.570x 2025-09-07T09:48:39.3863392Z compilation_latency mean=70.602 seconds 2025-09-07T09:48:39.3864275Z compression_ratio mean=0.988x 2025-09-07T09:48:39.3867394Z eager_peak_mem gmean=0.00x mean=7.224x 2025-09-07T09:48:39.3870982Z dynamo_peak_mem gmean=0.00x mean=6.458x 2025-09-07T09:48:39.3874912Z calls_captured gmean=0.00x mean=1203.250x 2025-09-07T09:48:39.3878482Z unique_graphs gmean=0.00x mean=2.500x 2025-09-07T09:48:39.3881823Z graph_breaks gmean=0.00x mean=6.875x 2025-09-07T09:48:39.3885756Z unique_graph_breaks gmean=0.00x mean=5.250x 2025-09-07T09:48:39.3888977Z autograd_captures gmean=0.00x mean=0.000x 2025-09-07T09:48:39.3892274Z autograd_compiles gmean=0.00x mean=0.000x 2025-09-07T09:48:39.3895907Z cudagraph_skips gmean=0.00x mean=0.000x 2025-09-07T09:48:41.9529640Z + [[ training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *dynamic-true* ]] 2025-09-07T09:48:41.9531888Z + python benchmarks/dynamo/timm_models.py --performance --cold-start-latency --training --amp --backend inductor --dynamic-shapes --dynamic-batch-only --device cuda --total-partitions 7 --partition-id 1 --output /var/lib/jenkins/workspace/test/test-reports/inductor_dynamic_timm_models_amp_training_cuda_h100_performance.csv 2025-09-07T09:48:42.9868712Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T09:48:42.9870712Z import pynvml # type: ignore[import] 2025-09-07T09:48:47.3765387Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T09:48:47.3766806Z import pynvml # type: ignore[import] 2025-09-07T09:48:50.3500619Z 2025-09-07T09:48:53.9004604Z loading model: 0it [00:00, ?it/s] 2025-09-07T09:48:53.9004977Z loading model: 0it [00:03, ?it/s] 2025-09-07T09:48:53.9005306Z cuda train crossvit_9_240 2025-09-07T09:49:51.8639074Z W0907 09:49:51.862000 163043 site-packages/torch/_logging/_internal.py:1199] [6/0] Profiler function will be ignored 2025-09-07T09:50:32.9551978Z 2025-09-07T09:50:33.2227540Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T09:52:10.1320717Z 2025-09-07T09:52:10.4221734Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T09:53:09.1618208Z 2025-09-07T09:53:09.3405367Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T09:55:22.4350306Z 2025-09-07T09:55:22.9083079Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T09:56:59.9999533Z 2025-09-07T09:57:00.4291155Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T09:59:56.4634062Z 2025-09-07T09:59:56.7974446Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T10:01:23.1088755Z ERROR:common:Backend dynamo failed in warmup() 2025-09-07T10:01:23.1089431Z Traceback (most recent call last): 2025-09-07T10:01:23.1090306Z File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2648, in warmup 
2025-09-07T10:01:23.1090844Z fn(model, example_inputs) 2025-09-07T10:01:23.1091412Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 832, in compile_wrapper 2025-09-07T10:01:23.1091952Z return fn(*args, **kwargs) 2025-09-07T10:01:23.1092461Z File "/var/lib/jenkins/workspace/benchmarks/dynamo/timm_models.py", line 439, in forward_and_backward_pass 2025-09-07T10:01:23.1093001Z cloned_inputs = clone_inputs(inputs) 2025-09-07T10:01:23.1093626Z File "/var/lib/jenkins/workspace/benchmarks/dynamo/timm_models.py", line 440, in torch_dynamo_resume_in_forward_and_backward_pass_at_439 2025-09-07T10:01:23.1094757Z self.optimizer_zero_grad(mod) 2025-09-07T10:01:23.1095386Z File "/var/lib/jenkins/workspace/benchmarks/dynamo/timm_models.py", line 440, in torch_dynamo_resume_in_forward_and_backward_pass_at_440 2025-09-07T10:01:23.1096013Z self.optimizer_zero_grad(mod) 2025-09-07T10:01:23.1096503Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T10:01:23.1097440Z return fn(*args, **kwargs) 2025-09-07T10:01:23.1097956Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1130, in forward 2025-09-07T10:01:23.1098488Z return compiled_fn(full_args) 2025-09-07T10:01:23.1099083Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 339, in runtime_wrapper 2025-09-07T10:01:23.1099727Z all_outs = call_func_at_runtime_with_args( 2025-09-07T10:01:23.1100355Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 129, in call_func_at_runtime_with_args 2025-09-07T10:01:23.1100878Z out = normalize_as_list(f(args)) 2025-09-07T10:01:23.1101313Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 103, in g 2025-09-07T10:01:23.1101742Z return f(*args) 
2025-09-07T10:01:23.1102124Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/function.py", line 581, in apply 2025-09-07T10:01:23.1102602Z return super().apply(*args, **kwargs) # type: ignore[misc] 2025-09-07T10:01:23.1103147Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 2118, in forward 2025-09-07T10:01:23.1103652Z fw_outs = call_func_at_runtime_with_args( 2025-09-07T10:01:23.1104330Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 129, in call_func_at_runtime_with_args 2025-09-07T10:01:23.1104874Z out = normalize_as_list(f(args)) 2025-09-07T10:01:23.1105361Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 526, in wrapper 2025-09-07T10:01:23.1105854Z return compiled_fn(runtime_args) 2025-09-07T10:01:23.1106342Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 724, in inner_fn 2025-09-07T10:01:23.1107056Z outs = compiled_fn(args) 2025-09-07T10:01:23.1107463Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/output_code.py", line 613, in __call__ 2025-09-07T10:01:23.1107907Z return self.current_callable(inputs) 2025-09-07T10:01:23.1108369Z File "/tmp/tmpn2k7t5l9/ee/ceeniv7prnou2yllbdposw6mgygbct733pwnyqrn4l6hwmqurtfr.py", line 9260, in call 2025-09-07T10:01:23.1114326Z (buf0, buf1, buf2, buf6, buf9, buf11, buf12, buf13, buf17, buf20, buf22, buf23, buf24, buf28, buf31, buf33, buf34, buf35, buf36, buf37, buf41, buf44, buf46, buf47, buf48, buf52, buf55, buf58, buf59, buf60, buf61, buf62, buf63, buf67, buf70, buf71, buf72, buf76, buf79, buf81, buf82, buf83, buf87, buf90, buf92, buf93, buf94, buf98, buf101, buf104, buf105, buf106, buf107, buf108, buf109, buf113, buf116, buf118, buf390, buf119, buf120, buf124, buf127, buf129, buf130, buf131, buf135, buf138, 
buf141, buf142, buf143, buf144, buf145, buf146, buf147, buf150, buf151, buf152, buf153, buf156, buf158, buf159, buf160, buf164, buf167, buf169, buf170, buf171, buf175, buf178, buf181, buf182, buf183, buf184, buf185, buf186, buf187, buf190, buf192, buf389, buf193, buf194, buf198, buf201, buf203, buf204, buf205, buf209, buf212, buf215, buf216, buf217, buf218, buf219, buf220, buf221, buf224, buf225, buf226, buf227, buf230, buf232, buf233, buf234, buf238, buf241, buf243, buf244, buf246, buf248, buf252, buf249, buf250, buf251, buf254, buf255, buf256, buf258, buf259, buf260, buf261, buf262, buf263, buf270, buf273, buf275, buf276, buf277, buf278, buf281, buf283, buf388, buf284, buf285, buf286, buf289, buf291, buf292, buf294, buf296, buf300, buf297, buf298, buf299, buf302, buf303, buf304, buf306, buf307, buf308, buf309, buf310, buf311, buf315, buf318, buf320, buf321, buf322, buf323, buf326, buf327, buf328, buf329, buf332, buf334, buf335, buf336, buf337, buf340, buf342, buf343, buf345, buf387, buf349, buf350, buf351, buf353, buf354, buf355, buf357, buf358, buf359, buf360, buf385, buf362, buf366, buf369, buf371, buf372, buf373, buf374, buf377, buf384, buf380, buf381, buf383, buf386, primals_6, primals_7, primals_12, primals_13, primals_18, primals_19, primals_24, primals_25, primals_30, primals_31, primals_37, primals_38, primals_43, primals_44, primals_49, primals_50, primals_55, primals_56, primals_62, primals_68, primals_69, primals_74, primals_75, primals_81, primals_82, primals_87, primals_88, primals_93, primals_94, primals_99, primals_100, primals_106, primals_112, primals_113, primals_118, primals_119, primals_125, primals_126, primals_131, primals_132, primals_137, primals_138, primals_146, primals_147, primals_152, primals_158, primals_159, primals_167, primals_168, primals_173, primals_174, primals_179, primals_180, primals_185, primals_186, primals_194, primals_195, primals_200) = self.partitions[0](partition0_args) 2025-09-07T10:01:23.1120257Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1772, in run 2025-09-07T10:01:23.1120734Z return compiled_fn(new_inputs) # type: ignore[arg-type] 2025-09-07T10:01:23.1121268Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/cudagraph_trees.py", line 388, in deferred_cudagraphify 2025-09-07T10:01:23.1121748Z return fn(inputs) 2025-09-07T10:01:23.1122103Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2958, in run 2025-09-07T10:01:23.1122496Z out = model(new_inputs) 2025-09-07T10:01:23.1122889Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/cudagraph_trees.py", line 2012, in run 2025-09-07T10:01:23.1123330Z out = self._run(new_inputs, function_id) 2025-09-07T10:01:23.1123891Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/cudagraph_trees.py", line 2182, in _run 2025-09-07T10:01:23.1124362Z return self.record_function(new_inputs, function_id) 2025-09-07T10:01:23.1124862Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/cudagraph_trees.py", line 2238, in record_function 2025-09-07T10:01:23.1125526Z torch.cuda.synchronize() 2025-09-07T10:01:23.1125922Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py", line 1083, in synchronize 2025-09-07T10:01:23.1126342Z return torch._C._cuda_synchronize() 2025-09-07T10:01:23.1126671Z torch.AcceleratorError: CUDA error: an illegal memory access was encountered 2025-09-07T10:01:23.1127271Z Search for `cudaErrorIllegalAddress' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. 2025-09-07T10:01:23.1127981Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. 
2025-09-07T10:01:23.1128457Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1 2025-09-07T10:01:23.1128810Z Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions. 2025-09-07T10:01:23.1129033Z 2025-09-07T10:01:23.1129103Z warmup_failed 2025-09-07T10:01:27.0750226Z Run failed with return code: 255 2025-09-07T10:01:27.0750715Z Output: None 2025-09-07T10:01:27.0751036Z Error: None 2025-09-07T10:01:27.0818755Z speedup gmean=0.00x mean=1.768x 2025-09-07T10:01:27.0822444Z abs_latency gmean=0.00x mean=32.611x 2025-09-07T10:01:27.0824134Z compilation_latency mean=70.797 seconds 2025-09-07T10:01:27.0824552Z compression_ratio mean=0.988x 2025-09-07T10:01:27.0828602Z eager_peak_mem gmean=0.00x mean=7.224x 2025-09-07T10:01:27.0832353Z dynamo_peak_mem gmean=0.00x mean=6.458x 2025-09-07T10:01:27.0836263Z calls_captured gmean=0.00x mean=1203.250x 2025-09-07T10:01:27.0839864Z unique_graphs gmean=0.00x mean=2.500x 2025-09-07T10:01:27.0843341Z graph_breaks gmean=0.00x mean=6.875x 2025-09-07T10:01:27.0847067Z unique_graph_breaks gmean=0.00x mean=5.250x 2025-09-07T10:01:27.0850490Z autograd_captures gmean=0.00x mean=0.000x 2025-09-07T10:01:27.0853917Z autograd_compiles gmean=0.00x mean=0.000x 2025-09-07T10:01:27.0858028Z cudagraph_skips gmean=0.00x mean=0.000x 2025-09-07T10:01:29.7902946Z + [[ training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *cppwrapper-true* ]] 2025-09-07T10:01:29.7904619Z + TORCHINDUCTOR_CPP_WRAPPER=1 2025-09-07T10:01:29.7905905Z + python benchmarks/dynamo/timm_models.py --performance --cold-start-latency --training --amp --backend inductor --disable-cudagraphs --device cuda --total-partitions 7 --partition-id 1 --output /var/lib/jenkins/workspace/test/test-reports/inductor_cpp_wrapper_timm_models_amp_training_cuda_h100_performance.csv 2025-09-07T10:01:30.8166191Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:01:30.8167550Z import pynvml # type: ignore[import] 2025-09-07T10:01:35.1045922Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:01:35.1047154Z import pynvml # type: ignore[import] 2025-09-07T10:01:38.0925103Z 2025-09-07T10:01:39.0396479Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:01:39.0396858Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:01:39.0397228Z cuda train crossvit_9_240 2025-09-07T10:03:17.3580947Z W0907 10:03:17.357000 198433 site-packages/torch/_logging/_internal.py:1199] [6/0] Profiler function will be ignored 2025-09-07T10:04:19.5472189Z 2025-09-07T10:04:19.6870443Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T10:06:55.2783263Z 2025-09-07T10:06:55.4009013Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T10:08:22.4841762Z 2025-09-07T10:08:22.6007714Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T10:11:58.1805028Z 2025-09-07T10:11:58.3774852Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T10:14:23.0887255Z 2025-09-07T10:14:23.2531681Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T10:18:49.5542290Z 2025-09-07T10:18:49.7257161Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T10:20:42.8453682Z 2025-09-07T10:20:43.0176971Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T10:23:32.7642652Z 2025-09-07T10:23:32.9799821Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T10:25:32.4226038Z 
2025-09-07T10:25:32.5932429Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T10:26:52.8359380Z 2025-09-07T10:26:52.9883324Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T10:29:34.5245367Z 2025-09-07T10:29:34.7972718Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T10:31:35.8626980Z 2025-09-07T10:31:36.1029100Z running benchmark: 0% 0/30 [00:00 will be ignored 2025-09-07T10:35:36.5478807Z 2025-09-07T10:35:36.7596420Z running benchmark: 0% 0/30 [00:00 2025-09-07T10:36:25.0769729Z launcher: self.bench(launcher, *args, **kwargs) 2025-09-07T10:36:25.0770228Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 883, in bench 2025-09-07T10:36:25.0770732Z return benchmarker.benchmark_gpu(kernel_call, rep=40) 2025-09-07T10:36:25.0771223Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/benchmarking.py", line 39, in wrapper 2025-09-07T10:36:25.0771671Z return fn(self, *args, **kwargs) 2025-09-07T10:36:25.0772135Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/benchmarking.py", line 251, in benchmark_gpu 2025-09-07T10:36:25.0772621Z torch.cuda.synchronize() 2025-09-07T10:36:25.0773020Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py", line 1083, in synchronize 2025-09-07T10:36:25.0773447Z return torch._C._cuda_synchronize() 2025-09-07T10:36:25.0773910Z torch.AcceleratorError: CUDA error: an illegal memory access was encountered 2025-09-07T10:36:25.0774532Z Search for `cudaErrorIllegalAddress' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. 2025-09-07T10:36:25.0775233Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. 
2025-09-07T10:36:25.0775720Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1 2025-09-07T10:36:25.0776055Z Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions. 2025-09-07T10:36:25.0776276Z 2025-09-07T10:36:25.0776341Z warmup_failed 2025-09-07T10:36:28.4642782Z Run failed with return code: 255 2025-09-07T10:36:28.4643400Z Output: None 2025-09-07T10:36:28.4644131Z Error: None 2025-09-07T10:36:28.4702115Z speedup gmean=0.00x mean=1.726x 2025-09-07T10:36:28.4706122Z abs_latency gmean=0.00x mean=31.805x 2025-09-07T10:36:28.4706850Z compilation_latency mean=96.589 seconds 2025-09-07T10:36:28.4707830Z compression_ratio mean=0.985x 2025-09-07T10:36:28.4711784Z eager_peak_mem gmean=0.00x mean=7.224x 2025-09-07T10:36:28.4715699Z dynamo_peak_mem gmean=0.00x mean=6.481x 2025-09-07T10:36:28.4719136Z calls_captured gmean=0.00x mean=1203.250x 2025-09-07T10:36:28.4722551Z unique_graphs gmean=0.00x mean=2.500x 2025-09-07T10:36:28.4726632Z graph_breaks gmean=0.00x mean=6.875x 2025-09-07T10:36:28.4729934Z unique_graph_breaks gmean=0.00x mean=5.250x 2025-09-07T10:36:28.4733232Z autograd_captures gmean=0.00x mean=0.000x 2025-09-07T10:36:28.4737130Z autograd_compiles gmean=0.00x mean=0.000x 2025-09-07T10:36:28.4740290Z cudagraph_skips gmean=0.00x mean=0.000x 2025-09-07T10:36:33.9130713Z + [[ training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *cudagraphs_low_precision-true* ]] 2025-09-07T10:36:33.9131946Z + [[ training == \i\n\f\e\r\e\n\c\e ]] 2025-09-07T10:36:33.9132199Z + for mode in "${modes[@]}" 2025-09-07T10:36:33.9132482Z + [[ inference == \i\n\f\e\r\e\n\c\e ]] 2025-09-07T10:36:33.9132725Z + [[ cuda_h100 == \c\p\u\_\x\8\6 ]] 2025-09-07T10:36:33.9132958Z + dtype=bfloat16 2025-09-07T10:36:33.9133152Z + for target in "${targets[@]}" 2025-09-07T10:36:33.9133379Z + target_flag=('--accuracy') 
2025-09-07T10:36:33.9133604Z + local target_flag 2025-09-07T10:36:33.9134232Z + [[ accuracy == \p\e\r\f\o\r\m\a\n\c\e ]] 2025-09-07T10:36:33.9134492Z + [[ accuracy == \a\c\c\u\r\a\c\y ]] 2025-09-07T10:36:33.9134766Z + target_flag+=(--no-translation-validation) 2025-09-07T10:36:33.9135854Z + [[ training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *freezing-true* ]] 2025-09-07T10:36:33.9138350Z + [[ training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *default-true* ]] 2025-09-07T10:36:33.9140308Z + python benchmarks/dynamo/timm_models.py --accuracy --no-translation-validation --inference --bfloat16 --backend inductor --disable-cudagraphs --device cuda --total-partitions 7 --partition-id 1 --output /var/lib/jenkins/workspace/test/test-reports/inductor_no_cudagraphs_timm_models_bfloat16_inference_cuda_h100_accuracy.csv 2025-09-07T10:36:34.9451176Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:36:34.9452369Z import pynvml # type: ignore[import] 2025-09-07T10:36:39.2401259Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T10:36:39.2402551Z import pynvml # type: ignore[import] 2025-09-07T10:36:42.2325589Z 2025-09-07T10:36:43.0143425Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:36:43.0143964Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:36:43.0226717Z cuda eval crossvit_9_240 2025-09-07T10:37:00.2101244Z pass 2025-09-07T10:37:04.1877419Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:37:04.1878686Z import pynvml # type: ignore[import] 2025-09-07T10:37:07.1825451Z 2025-09-07T10:37:08.7666818Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:37:08.7667380Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:37:08.7755967Z cuda eval cspdarknet53 2025-09-07T10:37:22.6723441Z pass 2025-09-07T10:37:26.4856589Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:37:26.4858417Z import pynvml # type: ignore[import] 2025-09-07T10:37:29.4680745Z 2025-09-07T10:37:31.1831492Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:37:31.1831859Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:37:31.1880867Z cuda eval deit_base_distilled_patch16_224 2025-09-07T10:37:39.8998434Z pass 2025-09-07T10:37:43.4586587Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T10:37:43.4587801Z import pynvml # type: ignore[import] 2025-09-07T10:37:46.5079807Z 2025-09-07T10:37:47.7557253Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:37:47.7557653Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:37:47.7685533Z cuda eval dla102 2025-09-07T10:38:04.4006727Z pass 2025-09-07T10:38:08.3524649Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:38:08.3525922Z import pynvml # type: ignore[import] 2025-09-07T10:38:11.3451686Z 2025-09-07T10:38:13.0512228Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:38:13.0512615Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:38:13.0581590Z cuda eval dm_nfnet_f0 2025-09-07T10:38:27.2690615Z pass 2025-09-07T10:38:31.1722537Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:38:31.1723982Z import pynvml # type: ignore[import] 2025-09-07T10:38:34.1789580Z 2025-09-07T10:38:35.5403099Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:38:35.5403466Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:38:35.5546075Z cuda eval dpn107 2025-09-07T10:38:58.3550263Z pass 2025-09-07T10:39:02.5808161Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T10:39:02.5809319Z import pynvml # type: ignore[import] 2025-09-07T10:39:05.5656953Z 2025-09-07T10:39:06.8837522Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:39:06.8838093Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:39:06.8884536Z cuda eval eca_botnext26ts_256 2025-09-07T10:39:20.5134837Z pass 2025-09-07T10:39:24.3405317Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:39:24.3406678Z import pynvml # type: ignore[import] 2025-09-07T10:39:27.3305483Z 2025-09-07T10:39:28.4507404Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:39:28.4507785Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:39:28.4554401Z cuda eval eca_halonext26ts 2025-09-07T10:39:47.6775749Z pass 2025-09-07T10:39:50.5680140Z accuracy pass_rate=100.00% 2025-09-07T10:39:50.5684708Z calls_captured gmean=361.25x mean=380.875x 2025-09-07T10:39:50.5687800Z unique_graphs gmean=1.00x mean=1.000x 2025-09-07T10:39:50.5691296Z graph_breaks gmean=0.00x mean=0.000x 2025-09-07T10:39:50.5695021Z unique_graph_breaks gmean=0.00x mean=0.000x 2025-09-07T10:39:50.5698356Z autograd_captures gmean=0.00x mean=0.000x 2025-09-07T10:39:50.5701621Z autograd_compiles gmean=0.00x mean=0.000x 2025-09-07T10:39:50.5705265Z cudagraph_skips gmean=0.00x mean=0.000x 2025-09-07T10:39:50.5706136Z compilation_latency mean=15.145 seconds 2025-09-07T10:39:51.6975265Z + [[ training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *cudagraphs-true* ]] 2025-09-07T10:39:51.6977790Z + python benchmarks/dynamo/timm_models.py --accuracy --no-translation-validation --inference --bfloat16 --backend inductor --device cuda 
--total-partitions 7 --partition-id 1 --output /var/lib/jenkins/workspace/test/test-reports/inductor_with_cudagraphs_timm_models_bfloat16_inference_cuda_h100_accuracy.csv 2025-09-07T10:39:52.7394247Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:39:52.7395690Z import pynvml # type: ignore[import] 2025-09-07T10:39:57.0919232Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:39:57.0921162Z import pynvml # type: ignore[import] 2025-09-07T10:40:00.1639145Z 2025-09-07T10:40:01.1339489Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:40:01.1340888Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:40:01.1416923Z cuda eval crossvit_9_240 2025-09-07T10:40:18.2249961Z pass 2025-09-07T10:40:22.1748840Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:40:22.1750116Z import pynvml # type: ignore[import] 2025-09-07T10:40:25.1463077Z 2025-09-07T10:40:26.6924274Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:40:26.6924664Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:40:26.7011103Z cuda eval cspdarknet53 2025-09-07T10:40:40.4056232Z pass 2025-09-07T10:40:44.2679484Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. 
Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:40:44.2680744Z import pynvml # type: ignore[import] 2025-09-07T10:40:47.3179980Z 2025-09-07T10:40:48.7303153Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:40:48.7303697Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:40:48.7349615Z cuda eval deit_base_distilled_patch16_224 2025-09-07T10:40:57.2217088Z pass 2025-09-07T10:41:00.8298931Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:41:00.8308882Z import pynvml # type: ignore[import] 2025-09-07T10:41:03.8654328Z 2025-09-07T10:41:05.9835474Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:41:05.9836007Z loading model: 0it [00:02, ?it/s] 2025-09-07T10:41:05.9963327Z cuda eval dla102 2025-09-07T10:41:23.0546175Z pass 2025-09-07T10:41:27.0092133Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:41:27.0094332Z import pynvml # type: ignore[import] 2025-09-07T10:41:30.0541950Z 2025-09-07T10:41:31.6195415Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:41:31.6195807Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:41:31.6262606Z cuda eval dm_nfnet_f0 2025-09-07T10:41:45.9685843Z pass 2025-09-07T10:41:49.8744449Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. 
If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:41:49.8745748Z import pynvml # type: ignore[import] 2025-09-07T10:41:52.8553494Z 2025-09-07T10:41:54.6720704Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:41:54.6721284Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:41:54.6865725Z cuda eval dpn107 2025-09-07T10:42:17.8310679Z pass 2025-09-07T10:42:21.9159893Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:42:21.9161602Z import pynvml # type: ignore[import] 2025-09-07T10:42:24.9405682Z 2025-09-07T10:42:26.2191144Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:42:26.2191659Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:42:26.2237584Z cuda eval eca_botnext26ts_256 2025-09-07T10:42:39.8064454Z pass 2025-09-07T10:42:43.5047731Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T10:42:43.5048929Z import pynvml # type: ignore[import] 2025-09-07T10:42:46.4870842Z 2025-09-07T10:42:47.9477554Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:42:47.9478114Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:42:47.9525176Z cuda eval eca_halonext26ts 2025-09-07T10:43:06.8740382Z pass 2025-09-07T10:43:09.7518327Z accuracy pass_rate=100.00% 2025-09-07T10:43:09.7522232Z calls_captured gmean=361.25x mean=380.875x 2025-09-07T10:43:09.7525617Z unique_graphs gmean=1.00x mean=1.000x 2025-09-07T10:43:09.7529330Z graph_breaks gmean=0.00x mean=0.000x 2025-09-07T10:43:09.7532566Z unique_graph_breaks gmean=0.00x mean=0.000x 2025-09-07T10:43:09.7536371Z autograd_captures gmean=0.00x mean=0.000x 2025-09-07T10:43:09.7539354Z autograd_compiles gmean=0.00x mean=0.000x 2025-09-07T10:43:09.7542608Z cudagraph_skips gmean=0.00x mean=0.000x 2025-09-07T10:43:09.7543553Z compilation_latency mean=15.125 seconds 2025-09-07T10:43:10.7682039Z + [[ training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *dynamic-true* ]] 2025-09-07T10:43:10.7684293Z + python benchmarks/dynamo/timm_models.py --accuracy --no-translation-validation --inference --bfloat16 --backend inductor --dynamic-shapes --dynamic-batch-only --device cuda --total-partitions 7 --partition-id 1 --output /var/lib/jenkins/workspace/test/test-reports/inductor_dynamic_timm_models_bfloat16_inference_cuda_h100_accuracy.csv 2025-09-07T10:43:11.7819214Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T10:43:11.7820884Z import pynvml # type: ignore[import] 2025-09-07T10:43:16.0835112Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:43:16.0836991Z import pynvml # type: ignore[import] 2025-09-07T10:43:19.0461503Z 2025-09-07T10:43:19.9148174Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:43:19.9148502Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:43:19.9227617Z cuda eval crossvit_9_240 2025-09-07T10:43:24.8568208Z pass 2025-09-07T10:43:28.2734576Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:43:28.2736419Z import pynvml # type: ignore[import] 2025-09-07T10:43:31.3031396Z 2025-09-07T10:43:32.2947771Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:43:32.2948238Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:43:32.3036430Z cuda eval cspdarknet53 2025-09-07T10:43:37.9416940Z pass 2025-09-07T10:43:41.3061997Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T10:43:41.3064356Z import pynvml # type: ignore[import] 2025-09-07T10:43:44.3307581Z 2025-09-07T10:43:45.9745107Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:43:45.9746335Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:43:45.9792106Z cuda eval deit_base_distilled_patch16_224 2025-09-07T10:43:49.8457822Z pass 2025-09-07T10:43:53.3244938Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:43:53.3246812Z import pynvml # type: ignore[import] 2025-09-07T10:43:56.3844882Z 2025-09-07T10:43:57.5870954Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:43:57.5871502Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:43:57.6000478Z cuda eval dla102 2025-09-07T10:44:04.4052961Z pass 2025-09-07T10:44:07.9150100Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:44:07.9151327Z import pynvml # type: ignore[import] 2025-09-07T10:44:10.8944023Z 2025-09-07T10:44:13.1768384Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:44:13.1768715Z loading model: 0it [00:02, ?it/s] 2025-09-07T10:44:13.1834189Z cuda eval dm_nfnet_f0 2025-09-07T10:44:18.0926098Z pass 2025-09-07T10:44:21.4799250Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T10:44:21.4801196Z import pynvml # type: ignore[import] 2025-09-07T10:44:24.4525832Z 2025-09-07T10:44:25.9728215Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:44:25.9728792Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:44:25.9873288Z cuda eval dpn107 2025-09-07T10:44:33.8424476Z pass 2025-09-07T10:44:37.6046800Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:44:37.6048582Z import pynvml # type: ignore[import] 2025-09-07T10:44:40.6206599Z 2025-09-07T10:44:41.9191080Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:44:41.9191641Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:44:41.9239280Z cuda eval eca_botnext26ts_256 2025-09-07T10:44:46.3231152Z pass 2025-09-07T10:44:49.7696228Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T10:44:49.7697385Z import pynvml # type: ignore[import] 2025-09-07T10:44:52.7551212Z 2025-09-07T10:44:54.2318021Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:44:54.2318397Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:44:54.2365608Z cuda eval eca_halonext26ts 2025-09-07T10:44:58.7830761Z pass 2025-09-07T10:45:01.3229579Z accuracy pass_rate=100.00% 2025-09-07T10:45:01.3233338Z calls_captured gmean=361.25x mean=380.875x 2025-09-07T10:45:01.3237743Z unique_graphs gmean=1.00x mean=1.000x 2025-09-07T10:45:01.3241316Z graph_breaks gmean=0.00x mean=0.000x 2025-09-07T10:45:01.3245266Z unique_graph_breaks gmean=0.00x mean=0.000x 2025-09-07T10:45:01.3248712Z autograd_captures gmean=0.00x mean=0.000x 2025-09-07T10:45:01.3252127Z autograd_compiles gmean=0.00x mean=0.000x 2025-09-07T10:45:01.3256277Z cudagraph_skips gmean=0.00x mean=0.000x 2025-09-07T10:45:01.3257172Z compilation_latency mean=4.707 seconds 2025-09-07T10:45:02.3584101Z + [[ training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *cppwrapper-true* ]] 2025-09-07T10:45:02.3585380Z + TORCHINDUCTOR_CPP_WRAPPER=1 2025-09-07T10:45:02.3586764Z + python benchmarks/dynamo/timm_models.py --accuracy --no-translation-validation --inference --bfloat16 --backend inductor --disable-cudagraphs --device cuda --total-partitions 7 --partition-id 1 --output /var/lib/jenkins/workspace/test/test-reports/inductor_cpp_wrapper_timm_models_bfloat16_inference_cuda_h100_accuracy.csv 2025-09-07T10:45:03.3920179Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T10:45:03.3922095Z import pynvml # type: ignore[import] 2025-09-07T10:45:07.6733681Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:45:07.6735448Z import pynvml # type: ignore[import] 2025-09-07T10:45:10.7488417Z 2025-09-07T10:45:12.0639052Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:45:12.0639642Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:45:12.0713060Z cuda eval crossvit_9_240 2025-09-07T10:45:39.2828193Z pass 2025-09-07T10:45:43.2452474Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:45:43.2455942Z import pynvml # type: ignore[import] 2025-09-07T10:45:46.2237906Z 2025-09-07T10:45:47.7902777Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:45:47.7903368Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:45:47.7984551Z cuda eval cspdarknet53 2025-09-07T10:46:11.0880316Z pass 2025-09-07T10:46:14.9446170Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T10:46:14.9448278Z import pynvml # type: ignore[import] 2025-09-07T10:46:17.9974103Z 2025-09-07T10:46:19.5716876Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:46:19.5717323Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:46:19.5764640Z cuda eval deit_base_distilled_patch16_224 2025-09-07T10:46:33.3251512Z pass 2025-09-07T10:46:36.9982951Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:46:36.9984614Z import pynvml # type: ignore[import] 2025-09-07T10:46:40.0077361Z 2025-09-07T10:46:41.3777702Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:46:41.3778209Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:46:41.3926316Z cuda eval dla102 2025-09-07T10:47:10.7001713Z pass 2025-09-07T10:47:14.6933950Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:47:14.6935142Z import pynvml # type: ignore[import] 2025-09-07T10:47:17.7015655Z 2025-09-07T10:47:19.4490592Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:47:19.4490944Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:47:19.4558564Z cuda eval dm_nfnet_f0 2025-09-07T10:47:42.0114594Z pass 2025-09-07T10:47:45.9267866Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T10:47:45.9269011Z import pynvml # type: ignore[import] 2025-09-07T10:47:48.9789213Z 2025-09-07T10:47:51.6551313Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:47:51.6551705Z loading model: 0it [00:02, ?it/s] 2025-09-07T10:47:51.6689317Z cuda eval dpn107 2025-09-07T10:48:29.6528629Z pass 2025-09-07T10:48:33.9273586Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:48:33.9275245Z import pynvml # type: ignore[import] 2025-09-07T10:48:36.9786814Z 2025-09-07T10:48:38.2560345Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:48:38.2560776Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:48:38.2604073Z cuda eval eca_botnext26ts_256 2025-09-07T10:48:57.5856295Z pass 2025-09-07T10:49:01.4362248Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T10:49:01.4363507Z import pynvml # type: ignore[import] 2025-09-07T10:49:04.4313658Z 2025-09-07T10:49:05.8943110Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:49:05.8944325Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:49:05.8991374Z cuda eval eca_halonext26ts 2025-09-07T10:49:31.9721083Z pass 2025-09-07T10:49:34.8925919Z accuracy pass_rate=100.00% 2025-09-07T10:49:34.8928822Z calls_captured gmean=361.25x mean=380.875x 2025-09-07T10:49:34.8932534Z unique_graphs gmean=1.00x mean=1.000x 2025-09-07T10:49:34.8936648Z graph_breaks gmean=0.00x mean=0.000x 2025-09-07T10:49:34.8939822Z unique_graph_breaks gmean=0.00x mean=0.000x 2025-09-07T10:49:34.8943128Z autograd_captures gmean=0.00x mean=0.000x 2025-09-07T10:49:34.8946956Z autograd_compiles gmean=0.00x mean=0.000x 2025-09-07T10:49:34.8950042Z cudagraph_skips gmean=0.00x mean=0.000x 2025-09-07T10:49:34.8951283Z compilation_latency mean=24.279 seconds 2025-09-07T10:49:35.9225031Z + [[ training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *freezing_cudagraphs-true* ]] 2025-09-07T10:49:35.9226378Z + [[ inference == \i\n\f\e\r\e\n\c\e ]] 2025-09-07T10:49:35.9227815Z + python benchmarks/dynamo/timm_models.py --accuracy --no-translation-validation --inference --bfloat16 --backend inductor --device cuda --total-partitions 7 --partition-id 1 --freezing --output /var/lib/jenkins/workspace/test/test-reports/inductor_with_cudagraphs_freezing_timm_models_bfloat16_inference_cuda_h100_accuracy.csv 2025-09-07T10:49:36.9833131Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T10:49:36.9835322Z import pynvml # type: ignore[import] 2025-09-07T10:49:41.2044857Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:49:41.2046141Z import pynvml # type: ignore[import] 2025-09-07T10:49:44.2430948Z 2025-09-07T10:49:45.1185216Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:49:45.1185728Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:49:45.1266072Z cuda eval crossvit_9_240 2025-09-07T10:50:04.3186698Z pass 2025-09-07T10:50:08.2278350Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:50:08.2279671Z import pynvml # type: ignore[import] 2025-09-07T10:50:11.2223288Z 2025-09-07T10:50:12.6059151Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:50:12.6059512Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:50:12.6146904Z cuda eval cspdarknet53 2025-09-07T10:50:28.4753160Z pass 2025-09-07T10:50:32.2513190Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T10:50:32.2514866Z import pynvml # type: ignore[import] 2025-09-07T10:50:35.3369156Z 2025-09-07T10:50:37.8204228Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:50:37.8204587Z loading model: 0it [00:02, ?it/s] 2025-09-07T10:50:37.8251290Z cuda eval deit_base_distilled_patch16_224 2025-09-07T10:50:47.6371920Z pass 2025-09-07T10:50:51.3185339Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:50:51.3187031Z import pynvml # type: ignore[import] 2025-09-07T10:50:54.3017701Z 2025-09-07T10:50:55.9116498Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:50:55.9116944Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:50:55.9248074Z cuda eval dla102 2025-09-07T10:51:15.9291969Z pass 2025-09-07T10:51:19.6430970Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:51:19.6432365Z import pynvml # type: ignore[import] 2025-09-07T10:51:22.6177755Z 2025-09-07T10:51:24.7299771Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:51:24.7300160Z loading model: 0it [00:02, ?it/s] 2025-09-07T10:51:24.7366891Z cuda eval dm_nfnet_f0 2025-09-07T10:51:37.2221983Z pass 2025-09-07T10:51:41.0016316Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T10:51:41.0017428Z import pynvml # type: ignore[import] 2025-09-07T10:51:44.0172322Z 2025-09-07T10:51:46.1367273Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:51:46.1367629Z loading model: 0it [00:02, ?it/s] 2025-09-07T10:51:46.1504233Z cuda eval dpn107 2025-09-07T10:52:13.7168785Z pass 2025-09-07T10:52:17.7375165Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:52:17.7377068Z import pynvml # type: ignore[import] 2025-09-07T10:52:20.7277718Z 2025-09-07T10:52:21.6648709Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:52:21.6649053Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:52:21.6694101Z cuda eval eca_botnext26ts_256 2025-09-07T10:52:37.2459061Z pass 2025-09-07T10:52:41.2958439Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T10:52:41.2960391Z import pynvml # type: ignore[import] 2025-09-07T10:52:44.2643150Z 2025-09-07T10:52:45.7399387Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:52:45.7399927Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:52:45.7445646Z cuda eval eca_halonext26ts 2025-09-07T10:53:06.9186443Z pass 2025-09-07T10:53:09.7619044Z accuracy pass_rate=100.00% 2025-09-07T10:53:09.7622809Z calls_captured gmean=361.25x mean=380.875x 2025-09-07T10:53:09.7625830Z unique_graphs gmean=1.00x mean=1.000x 2025-09-07T10:53:09.7629703Z graph_breaks gmean=0.00x mean=0.000x 2025-09-07T10:53:09.7632901Z unique_graph_breaks gmean=0.00x mean=0.000x 2025-09-07T10:53:09.7636648Z autograd_captures gmean=0.00x mean=0.000x 2025-09-07T10:53:09.7640150Z autograd_compiles gmean=0.00x mean=0.000x 2025-09-07T10:53:09.7643510Z cudagraph_skips gmean=0.00x mean=0.000x 2025-09-07T10:53:09.7644992Z compilation_latency mean=17.038 seconds 2025-09-07T10:53:10.8248694Z + [[ training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *freeze_autotune_cudagraphs-true* ]] 2025-09-07T10:53:10.8250504Z + [[ inference == \i\n\f\e\r\e\n\c\e ]] 2025-09-07T10:53:10.8250889Z + TORCHINDUCTOR_MAX_AUTOTUNE=1 2025-09-07T10:53:10.8253406Z + python benchmarks/dynamo/timm_models.py --accuracy --no-translation-validation --inference --bfloat16 --backend inductor --device cuda --total-partitions 7 --partition-id 1 --freezing --output /var/lib/jenkins/workspace/test/test-reports/inductor_with_cudagraphs_freezing_autotune_timm_models_bfloat16_inference_cuda_h100_accuracy.csv 2025-09-07T10:53:11.8679550Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. 
If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:53:11.8681124Z import pynvml # type: ignore[import] 2025-09-07T10:53:16.1226389Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:53:16.1228193Z import pynvml # type: ignore[import] 2025-09-07T10:53:19.1030036Z 2025-09-07T10:53:20.1268035Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:53:20.1268519Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:53:20.1348988Z cuda eval crossvit_9_240 2025-09-07T10:53:42.1600214Z Autotune Choices Stats: 2025-09-07T10:53:42.1601324Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_253", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.008320000022649765, "best_triton_pos": 0} 2025-09-07T10:53:42.1681936Z AUTOTUNE mm(3208x128, 128x384) 2025-09-07T10:53:42.1682278Z strides: [128, 1], [1, 128] 2025-09-07T10:53:42.1682544Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:53:42.1684111Z triton_mm_253 0.0083 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:42.1685108Z triton_mm_251 0.0084 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:42.1686040Z triton_mm_256 0.0084 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 
2025-09-07T10:53:42.1686939Z triton_mm_252 0.0085 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:53:42.1687830Z triton_mm_260 0.0085 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:53:42.1688732Z triton_mm_255 0.0086 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:42.1689627Z triton_mm_259 0.0086 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:42.1690535Z triton_mm_254 0.0088 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:53:42.1691429Z triton_mm_258 0.0088 ms 94.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:42.1691996Z mm 0.0089 ms 93.5% 2025-09-07T10:53:42.1692409Z SingleProcess AUTOTUNE benchmarking takes 0.2459 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T10:53:42.7775142Z Autotune Choices Stats: 2025-09-07T10:53:42.7776608Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_97", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.009279999881982803, "best_triton_pos": 0} 2025-09-07T10:53:42.7856418Z AUTOTUNE mm(1576x256, 256x768) 2025-09-07T10:53:42.7856717Z strides: [256, 1], [1, 256] 2025-09-07T10:53:42.7856993Z dtypes: torch.bfloat16, 
torch.bfloat16 2025-09-07T10:53:42.7857711Z triton_mm_97 0.0093 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:53:42.7858760Z triton_mm_101 0.0093 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:42.7859828Z triton_mm_100 0.0095 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:53:42.7860872Z triton_mm_107 0.0095 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:42.7861916Z triton_mm_108 0.0095 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:53:42.7862565Z mm 0.0095 ms 97.3% 2025-09-07T10:53:42.7863168Z triton_mm_104 0.0097 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:53:42.7864652Z triton_mm_103 0.0099 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:42.7865959Z triton_mm_99 0.0100 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:42.7867013Z triton_mm_106 0.0102 ms 90.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:42.7867930Z SingleProcess AUTOTUNE 
benchmarking takes 0.2376 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T10:53:43.5047628Z Autotune Choices Stats: 2025-09-07T10:53:43.5048636Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_339", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.006943999789655209, "best_triton_pos": 0} 2025-09-07T10:53:43.5125685Z AUTOTUNE mm(8x256, 256x256) 2025-09-07T10:53:43.5125983Z strides: [50432, 1], [1, 256] 2025-09-07T10:53:43.5126264Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:53:43.5126877Z triton_mm_339 0.0069 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T10:53:43.5127811Z triton_mm_343 0.0069 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:53:43.5128718Z triton_mm_338 0.0071 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T10:53:43.5129615Z triton_mm_342 0.0071 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:43.5130509Z triton_mm_337 0.0072 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:53:43.5131755Z mm 0.0073 ms 95.2% 2025-09-07T10:53:43.5132286Z triton_mm_351 0.0074 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:53:43.5133365Z triton_mm_346 
0.0074 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:43.5134732Z triton_mm_336 0.0074 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T10:53:43.5135586Z triton_mm_349 0.0075 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:53:43.5136338Z SingleProcess AUTOTUNE benchmarking takes 0.2150 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T10:53:44.0135652Z Autotune Choices Stats: 2025-09-07T10:53:44.0136716Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_457", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.0063680000603199005, "best_triton_pos": 0} 2025-09-07T10:53:44.0215950Z AUTOTUNE mm(8x128, 128x128) 2025-09-07T10:53:44.0216222Z strides: [51328, 1], [1, 128] 2025-09-07T10:53:44.0216463Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:53:44.0217049Z triton_mm_457 0.0064 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T10:53:44.0218432Z triton_mm_465 0.0064 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:44.0219317Z triton_mm_461 0.0066 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:44.0220159Z triton_mm_456 0.0066 ms 96.6% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:53:44.0220991Z triton_mm_468 0.0067 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:53:44.0221837Z triton_mm_462 0.0068 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:53:44.0222683Z triton_mm_463 0.0068 ms 93.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:44.0223541Z triton_mm_467 0.0068 ms 93.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:44.0224612Z triton_mm_470 0.0069 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:53:44.0225464Z triton_mm_458 0.0069 ms 91.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T10:53:44.0226209Z SingleProcess AUTOTUNE benchmarking takes 0.2138 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T10:53:44.8305848Z Autotune Choices Stats: 2025-09-07T10:53:44.8307069Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_4", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=12, KERNEL_W=12, PADDING_H=0, PADDING_W=0, STRIDE_H=12, STRIDE_W=12, UNROLL=False, num_stages=2, num_warps=4", "best_time": 0.0451200008392334, "best_triton_pos": 0} 2025-09-07T10:53:44.8388521Z AUTOTUNE 
convolution(8x3x240x240, 128x3x12x12) 2025-09-07T10:53:44.8389139Z strides: [172800, 57600, 240, 1], [432, 144, 12, 1] 2025-09-07T10:53:44.8389476Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:53:44.8390295Z triton_convolution2d_4 0.0451 ms 100.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=12, KERNEL_W=12, PADDING_H=0, PADDING_W=0, STRIDE_H=12, STRIDE_W=12, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:53:44.8391077Z convolution 0.0627 ms 72.0% 2025-09-07T10:53:44.8391842Z triton_convolution2d_0 0.0642 ms 70.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=12, KERNEL_W=12, PADDING_H=0, PADDING_W=0, STRIDE_H=12, STRIDE_W=12, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:53:44.8393093Z triton_convolution2d_6 0.0645 ms 70.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=12, KERNEL_W=12, PADDING_H=0, PADDING_W=0, STRIDE_H=12, STRIDE_W=12, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:53:44.8394733Z triton_convolution2d_1 0.0682 ms 66.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=12, KERNEL_W=12, PADDING_H=0, PADDING_W=0, STRIDE_H=12, STRIDE_W=12, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:53:44.8395781Z triton_convolution2d_3 0.0724 ms 62.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=12, KERNEL_W=12, PADDING_H=0, PADDING_W=0, STRIDE_H=12, STRIDE_W=12, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:53:44.8397080Z triton_convolution2d_5 0.0800 ms 56.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=12, KERNEL_W=12, PADDING_H=0, PADDING_W=0, STRIDE_H=12, STRIDE_W=12, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:53:44.8398102Z triton_convolution2d_2 0.1947 ms 23.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=12, KERNEL_W=12, PADDING_H=0, PADDING_W=0, STRIDE_H=12, STRIDE_W=12, UNROLL=False, num_stages=1, 
num_warps=8 2025-09-07T10:53:44.8398907Z SingleProcess AUTOTUNE benchmarking takes 0.1450 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T10:53:45.0970954Z Autotune Choices Stats: 2025-09-07T10:53:45.0971863Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_15", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.008576000109314919, "best_triton_pos": 0} 2025-09-07T10:53:45.1050300Z AUTOTUNE addmm(3208x384, 3208x128, 128x384) 2025-09-07T10:53:45.1050613Z strides: [0, 1], [128, 1], [1, 128] 2025-09-07T10:53:45.1050926Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:53:45.1051566Z triton_mm_15 0.0086 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:53:45.1052462Z triton_mm_19 0.0090 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:53:45.1053342Z triton_mm_14 0.0090 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:53:45.1054514Z triton_mm_17 0.0091 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:53:45.1055419Z triton_mm_16 0.0091 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:45.1056465Z triton_mm_18 0.0091 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, 
num_warps=4 2025-09-07T10:53:45.1057314Z triton_mm_21 0.0092 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:53:45.1058232Z triton_mm_20 0.0092 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:45.1059067Z triton_mm_25 0.0093 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:53:45.1059895Z triton_mm_24 0.0094 ms 91.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:45.1060643Z SingleProcess AUTOTUNE benchmarking takes 0.2658 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T10:53:45.3342770Z Autotune Choices Stats: 2025-09-07T10:53:45.3344053Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_33", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.007424000184983015, "best_triton_pos": 0} 2025-09-07T10:53:45.3421838Z AUTOTUNE mm(3208x128, 128x128) 2025-09-07T10:53:45.3422110Z strides: [128, 1], [1, 128] 2025-09-07T10:53:45.3422392Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:53:45.3423043Z triton_mm_33 0.0074 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:53:45.3424493Z triton_mm_28 0.0079 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:53:45.3425394Z 
triton_mm_38 0.0080 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:53:45.3426233Z triton_mm_37 0.0080 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:45.3427049Z triton_mm_27 0.0081 ms 91.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:53:45.3427864Z triton_mm_40 0.0081 ms 91.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:53:45.3428389Z mm 0.0082 ms 91.0% 2025-09-07T10:53:45.3428869Z triton_mm_29 0.0082 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:53:45.3429698Z triton_mm_36 0.0083 ms 89.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:53:45.3430522Z triton_mm_34 0.0084 ms 88.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:53:45.3431236Z SingleProcess AUTOTUNE benchmarking takes 0.2365 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T10:53:45.5410913Z Autotune Choices Stats: 2025-09-07T10:53:45.5412191Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.06739199906587601, "best_triton_pos": 1, "best_triton_time": 0.08297599852085114, "best_triton_kernel": "triton_convolution2d_49", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, 
GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T10:53:45.5489496Z AUTOTUNE convolution(8x3x224x224, 256x3x16x16) 2025-09-07T10:53:45.5489809Z strides: [150528, 50176, 224, 1], [768, 256, 16, 1] 2025-09-07T10:53:45.5490089Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:53:45.5490345Z convolution 0.0674 ms 100.0% 2025-09-07T10:53:45.5491043Z triton_convolution2d_49 0.0830 ms 81.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:53:45.5492200Z triton_convolution2d_51 0.1354 ms 49.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:53:45.5493347Z triton_convolution2d_48 0.1412 ms 47.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:53:45.5494628Z triton_convolution2d_46 0.1536 ms 43.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:53:45.5495759Z triton_convolution2d_50 0.2057 ms 32.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:53:45.5497116Z triton_convolution2d_45 0.2288 ms 29.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:53:45.5498305Z triton_convolution2d_47 0.4252 ms 15.9% 
ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T10:53:45.5499211Z SingleProcess AUTOTUNE benchmarking takes 0.2063 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T10:53:45.8085293Z Autotune Choices Stats: 2025-09-07T10:53:45.8086187Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_59", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.009631999768316746, "best_triton_pos": 0} 2025-09-07T10:53:45.8164809Z AUTOTUNE addmm(1576x768, 1576x256, 256x768) 2025-09-07T10:53:45.8165110Z strides: [0, 1], [256, 1], [1, 256] 2025-09-07T10:53:45.8165423Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:53:45.8166065Z triton_mm_59 0.0096 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:53:45.8166962Z triton_mm_63 0.0097 ms 99.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:45.8167536Z bias_addmm 0.0097 ms 99.3% 2025-09-07T10:53:45.8168067Z triton_mm_62 0.0101 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:53:45.8168946Z triton_mm_66 0.0104 ms 92.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:53:45.8169831Z triton_mm_61 0.0105 ms 91.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=3, num_warps=4 2025-09-07T10:53:45.8170861Z triton_mm_65 0.0106 ms 91.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:45.8171837Z triton_mm_69 0.0106 ms 90.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:45.8172729Z triton_mm_70 0.0108 ms 89.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:53:45.8173619Z triton_mm_68 0.0111 ms 87.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:45.8174660Z SingleProcess AUTOTUNE benchmarking takes 0.2664 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T10:53:46.0472197Z Autotune Choices Stats: 2025-09-07T10:53:46.0473005Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_78", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.0080960001796484, "best_triton_pos": 0} 2025-09-07T10:53:46.0555666Z AUTOTUNE mm(1576x256, 256x256) 2025-09-07T10:53:46.0555932Z strides: [256, 1], [1, 256] 2025-09-07T10:53:46.0556215Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:53:46.0556880Z triton_mm_78 0.0081 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:53:46.0558277Z triton_mm_79 0.0082 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 
2025-09-07T10:53:46.0559267Z triton_mm_82 0.0084 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:46.0560222Z triton_mm_83 0.0086 ms 94.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:53:46.0561180Z triton_mm_74 0.0088 ms 92.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:53:46.0562128Z triton_mm_85 0.0089 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:53:46.0563078Z triton_mm_73 0.0089 ms 90.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:53:46.0564252Z triton_mm_81 0.0090 ms 90.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:53:46.0564954Z mm 0.0090 ms 89.7% 2025-09-07T10:53:46.0565485Z triton_mm_75 0.0091 ms 89.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:53:46.0566252Z SingleProcess AUTOTUNE benchmarking takes 0.2385 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T10:53:46.2834735Z Autotune Choices Stats: 2025-09-07T10:53:46.2835731Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_117", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 
0.009568000212311745, "best_triton_pos": 0} 2025-09-07T10:53:46.2916078Z AUTOTUNE mm(1576x768, 768x256) 2025-09-07T10:53:46.2916310Z strides: [768, 1], [1, 768] 2025-09-07T10:53:46.2916538Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:53:46.2917114Z triton_mm_117 0.0096 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:53:46.2917777Z mm 0.0100 ms 95.5% 2025-09-07T10:53:46.2918239Z triton_mm_121 0.0104 ms 91.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:53:46.2919009Z triton_mm_116 0.0116 ms 82.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:53:46.2919782Z triton_mm_113 0.0118 ms 80.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:53:46.2920550Z triton_mm_120 0.0119 ms 80.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:46.2921314Z triton_mm_127 0.0121 ms 78.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:53:46.2922081Z triton_mm_126 0.0128 ms 74.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:46.2922849Z triton_mm_110 0.0128 ms 74.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:53:46.2924031Z triton_mm_123 
0.0129 ms 74.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:53:46.2924791Z SingleProcess AUTOTUNE benchmarking takes 0.2356 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T10:53:46.9554356Z Autotune Choices Stats: 2025-09-07T10:53:46.9555503Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_269", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.008576000109314919, "best_triton_pos": 0} 2025-09-07T10:53:46.9641566Z AUTOTUNE mm(3208x384, 384x128) 2025-09-07T10:53:46.9641848Z strides: [384, 1], [1, 384] 2025-09-07T10:53:46.9642134Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:53:46.9642838Z triton_mm_269 0.0086 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:53:46.9644082Z triton_mm_268 0.0092 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:53:46.9644764Z mm 0.0093 ms 92.1% 2025-09-07T10:53:46.9645424Z triton_mm_273 0.0094 ms 90.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:53:46.9646335Z triton_mm_272 0.0096 ms 89.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:46.9647218Z triton_mm_264 0.0098 ms 87.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:53:46.9648110Z 
triton_mm_271 0.0100 ms 86.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:53:46.9649305Z triton_mm_262 0.0101 ms 84.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:53:46.9650321Z triton_mm_265 0.0103 ms 83.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:53:46.9651206Z triton_mm_275 0.0103 ms 83.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:53:46.9651988Z SingleProcess AUTOTUNE benchmarking takes 0.6595 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T10:53:47.2137385Z Autotune Choices Stats: 2025-09-07T10:53:47.2138402Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_321", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.006399999838322401, "best_triton_pos": 0} 2025-09-07T10:53:47.2230464Z AUTOTUNE addmm(8x256, 8x128, 128x256) 2025-09-07T10:53:47.2239119Z strides: [0, 1], [128, 1], [1, 128] 2025-09-07T10:53:47.2239480Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:53:47.2240101Z triton_mm_321 0.0064 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T10:53:47.2240935Z triton_mm_325 0.0067 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:47.2242046Z 
triton_mm_320 0.0068 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:53:47.2242887Z triton_mm_332 0.0070 ms 90.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:53:47.2243904Z triton_mm_329 0.0071 ms 89.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:47.2244754Z triton_mm_326 0.0072 ms 88.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:53:47.2245590Z triton_mm_334 0.0073 ms 88.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:53:47.2246411Z triton_mm_328 0.0073 ms 87.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:53:47.2247221Z triton_mm_331 0.0073 ms 87.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:47.2247987Z triton_mm_327 0.0075 ms 85.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:47.2248675Z SingleProcess AUTOTUNE benchmarking takes 0.2554 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T10:53:47.4419877Z Autotune Choices Stats: 2025-09-07T10:53:47.4421222Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_bmm_385", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, 
BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4", "best_time": 0.007872000336647034, "best_triton_pos": 0} 2025-09-07T10:53:47.4557768Z AUTOTUNE bmm(32x1x64, 32x64x197) 2025-09-07T10:53:47.4558130Z strides: [64, 0, 1], [12608, 197, 1] 2025-09-07T10:53:47.4558509Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:53:47.4559394Z triton_bmm_385 0.0079 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:53:47.4560958Z triton_bmm_373 0.0079 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:53:47.4562296Z triton_bmm_374 0.0080 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T10:53:47.4563639Z triton_bmm_384 0.0081 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:47.4565434Z triton_bmm_381 0.0082 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:53:47.4566786Z triton_bmm_377 0.0083 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:53:47.4568125Z triton_bmm_378 0.0085 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:47.4569437Z triton_bmm_379 0.0085 ms 92.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:53:47.4570979Z triton_bmm_382 0.0085 ms 92.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:47.4572335Z triton_bmm_371 0.0086 ms 91.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T10:53:47.4573495Z SingleProcess AUTOTUNE benchmarking takes 0.2294 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T10:53:47.6369033Z Autotune Choices Stats: 2025-09-07T10:53:47.6370008Z {"num_choices": 14, "num_triton_choices": 13, "best_kernel": "triton_bmm_415", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007648000027984381, "best_triton_pos": 0} 2025-09-07T10:53:47.6455959Z AUTOTUNE bmm(32x1x197, 32x197x64) 2025-09-07T10:53:47.6456302Z strides: [197, 0, 1], [12608, 64, 1] 2025-09-07T10:53:47.6456602Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:53:47.6457326Z triton_bmm_415 0.0076 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:53:47.6458383Z triton_bmm_411 0.0077 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T10:53:47.6459389Z triton_bmm_418 0.0077 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:53:47.6460360Z triton_bmm_408 0.0080 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T10:53:47.6461334Z triton_bmm_419 0.0082 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:53:47.6462528Z triton_bmm_414 0.0086 ms 88.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:47.6463493Z triton_bmm_409 0.0093 ms 82.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:53:47.6464971Z triton_bmm_410 0.0095 ms 80.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T10:53:47.6465926Z triton_bmm_417 0.0097 ms 78.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:53:47.6466766Z triton_bmm_416 0.0100 ms 76.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:47.6467494Z SingleProcess AUTOTUNE benchmarking takes 0.1848 seconds and 0.0002 seconds precompiling for 14 choices 2025-09-07T10:53:47.8526421Z Autotune Choices Stats: 2025-09-07T10:53:47.8527387Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_424", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.00684799998998642, "best_triton_pos": 0} 2025-09-07T10:53:47.8613452Z AUTOTUNE mm(8x256, 256x256) 2025-09-07T10:53:47.8613953Z strides: [256, 1], [1, 256] 2025-09-07T10:53:47.8614223Z dtypes: torch.bfloat16, 
torch.bfloat16 2025-09-07T10:53:47.8614890Z triton_mm_424 0.0068 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T10:53:47.8616423Z triton_mm_423 0.0069 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T10:53:47.8617076Z mm 0.0070 ms 98.2% 2025-09-07T10:53:47.8617669Z triton_mm_428 0.0070 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:53:47.8618630Z triton_mm_427 0.0071 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:47.8619590Z triton_mm_422 0.0072 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:53:47.8620533Z triton_mm_431 0.0074 ms 92.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:47.8621491Z triton_mm_436 0.0074 ms 92.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:53:47.8622467Z triton_mm_421 0.0074 ms 92.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T10:53:47.8623424Z triton_mm_432 0.0075 ms 91.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:53:47.8624457Z SingleProcess AUTOTUNE 
benchmarking takes 0.2153 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T10:53:48.0988488Z Autotune Choices Stats: 2025-09-07T10:53:48.0989485Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_441", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.0071680000983178616, "best_triton_pos": 0} 2025-09-07T10:53:48.1078895Z AUTOTUNE addmm(8x128, 8x256, 256x128) 2025-09-07T10:53:48.1079196Z strides: [0, 1], [256, 1], [1, 256] 2025-09-07T10:53:48.1079554Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:53:48.1080442Z triton_mm_441 0.0072 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T10:53:48.1081434Z triton_mm_440 0.0072 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T10:53:48.1082396Z triton_mm_439 0.0073 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:53:48.1083390Z triton_mm_438 0.0076 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T10:53:48.1084768Z triton_mm_445 0.0076 ms 94.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:53:48.1085848Z triton_mm_449 0.0076 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:53:48.1086749Z triton_mm_444 0.0077 ms 
92.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:48.1087636Z triton_mm_451 0.0078 ms 92.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:53:48.1088696Z triton_mm_453 0.0078 ms 92.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:53:48.1089608Z triton_mm_448 0.0079 ms 90.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:48.1090391Z SingleProcess AUTOTUNE benchmarking takes 0.2458 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T10:53:48.2808782Z Autotune Choices Stats: 2025-09-07T10:53:48.2809683Z {"num_choices": 14, "num_triton_choices": 13, "best_kernel": "triton_bmm_490", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2", "best_time": 0.007296000141650438, "best_triton_pos": 0} 2025-09-07T10:53:48.2894920Z AUTOTUNE bmm(32x1x32, 32x32x401) 2025-09-07T10:53:48.2895221Z strides: [32, 0, 1], [12864, 401, 1] 2025-09-07T10:53:48.2895562Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:53:48.2896345Z triton_bmm_490 0.0073 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T10:53:48.2897382Z triton_bmm_495 0.0075 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:53:48.2898347Z triton_bmm_492 0.0076 ms 95.4% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:53:48.2899302Z triton_bmm_496 0.0076 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:48.2900258Z triton_bmm_500 0.0077 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:53:48.2901532Z triton_bmm_497 0.0080 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:48.2902606Z triton_bmm_494 0.0080 ms 90.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:53:48.2903573Z triton_bmm_498 0.0081 ms 90.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:53:48.2904889Z triton_bmm_499 0.0081 ms 89.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:53:48.2905885Z triton_bmm_502 0.0082 ms 88.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:53:48.2906579Z SingleProcess AUTOTUNE benchmarking takes 0.1759 seconds and 0.0002 seconds precompiling for 14 choices 2025-09-07T10:53:48.4328633Z Autotune Choices Stats: 2025-09-07T10:53:48.4329766Z {"num_choices": 12, "num_triton_choices": 11, "best_kernel": "bmm", "best_time": 0.008832000195980072, "best_triton_pos": 1, "best_triton_time": 0.009184000082314014, 
"best_triton_kernel": "triton_bmm_525", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2"} 2025-09-07T10:53:48.4414926Z AUTOTUNE bmm(32x1x401, 32x401x32) 2025-09-07T10:53:48.4415605Z strides: [401, 12864, 1], [12864, 32, 1] 2025-09-07T10:53:48.4415983Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:53:48.4416320Z bmm 0.0088 ms 100.0% 2025-09-07T10:53:48.4417369Z triton_bmm_525 0.0092 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T10:53:48.4418476Z triton_bmm_531 0.0092 ms 95.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=2 2025-09-07T10:53:48.4419487Z triton_bmm_523 0.0101 ms 87.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T10:53:48.4420486Z triton_bmm_532 0.0106 ms 83.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T10:53:48.4421471Z triton_bmm_528 0.0110 ms 80.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=2 2025-09-07T10:53:48.4422449Z triton_bmm_524 0.0133 ms 66.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T10:53:48.4423418Z triton_bmm_529 0.0133 ms 66.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=2 2025-09-07T10:53:48.4424684Z triton_bmm_527 0.0135 ms 
65.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T10:53:48.4425697Z triton_bmm_530 0.0135 ms 65.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=2 2025-09-07T10:53:48.4426430Z SingleProcess AUTOTUNE benchmarking takes 0.1488 seconds and 0.0002 seconds precompiling for 12 choices 2025-09-07T10:53:48.6490963Z Autotune Choices Stats: 2025-09-07T10:53:48.6491917Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_536", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.006271999794989824, "best_triton_pos": 0} 2025-09-07T10:53:48.6580149Z AUTOTUNE mm(8x128, 128x128) 2025-09-07T10:53:48.6580424Z strides: [128, 1], [1, 128] 2025-09-07T10:53:48.6580698Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:53:48.6581395Z triton_mm_536 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T10:53:48.6582493Z triton_mm_540 0.0063 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:48.6583543Z triton_mm_535 0.0063 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:53:48.6584772Z triton_mm_544 0.0067 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:48.6585774Z triton_mm_546 0.0068 ms 92.5% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:48.6586599Z triton_mm_547 0.0068 ms 92.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:53:48.6587446Z triton_mm_541 0.0068 ms 92.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:53:48.6588481Z triton_mm_542 0.0069 ms 90.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:48.6589326Z triton_mm_549 0.0069 ms 90.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:53:48.6590161Z triton_mm_543 0.0070 ms 89.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:53:48.6590895Z SingleProcess AUTOTUNE benchmarking takes 0.2152 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T10:53:49.0309570Z Autotune Choices Stats: 2025-09-07T10:53:49.0310563Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_1727", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.00687999976798892, "best_triton_pos": 0} 2025-09-07T10:53:49.0402338Z AUTOTUNE addmm(8x1000, 8x128, 128x1000) 2025-09-07T10:53:49.0402638Z strides: [0, 1], [128, 1], [1, 128] 2025-09-07T10:53:49.0402969Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:53:49.0404176Z triton_mm_1727 0.0069 ms 100.0% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T10:53:49.0405179Z triton_mm_1726 0.0069 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:53:49.0406216Z triton_mm_1731 0.0071 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:49.0407278Z triton_mm_1735 0.0073 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:49.0408161Z triton_mm_1738 0.0073 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:53:49.0409168Z triton_mm_1732 0.0073 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:53:49.0410075Z triton_mm_1737 0.0073 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:49.0410992Z triton_mm_1740 0.0073 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:53:49.0411898Z triton_mm_1734 0.0075 ms 92.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:53:49.0412795Z triton_mm_1733 0.0078 ms 88.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, 
num_warps=4 2025-09-07T10:53:49.0413593Z SingleProcess AUTOTUNE benchmarking takes 0.2440 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T10:53:49.2759594Z Autotune Choices Stats: 2025-09-07T10:53:49.2760615Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_1745", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.007360000163316727, "best_triton_pos": 0} 2025-09-07T10:53:49.2848420Z AUTOTUNE addmm(8x1000, 8x256, 256x1000) 2025-09-07T10:53:49.2848903Z strides: [0, 1], [256, 1], [1, 256] 2025-09-07T10:53:49.2849231Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:53:49.2849875Z triton_mm_1745 0.0074 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T10:53:49.2850793Z triton_mm_1749 0.0074 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:53:49.2851685Z triton_mm_1744 0.0078 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T10:53:49.2852582Z triton_mm_1753 0.0078 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:53:49.2853476Z triton_mm_1748 0.0079 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:49.2854716Z triton_mm_1743 0.0080 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=5, num_warps=4 2025-09-07T10:53:49.2855640Z triton_mm_1757 0.0080 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:53:49.2856531Z triton_mm_1755 0.0080 ms 92.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:53:49.2857393Z triton_mm_1752 0.0081 ms 90.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:53:49.2858339Z triton_mm_1742 0.0083 ms 89.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T10:53:49.2859078Z SingleProcess AUTOTUNE benchmarking takes 0.2441 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T10:53:54.4504396Z pass 2025-09-07T10:53:58.7460851Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T10:53:58.7462867Z import pynvml # type: ignore[import] 2025-09-07T10:54:01.7451406Z 2025-09-07T10:54:03.3759640Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:54:03.3760006Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:54:03.3852156Z cuda eval cspdarknet53 2025-09-07T10:54:23.3028885Z Autotune Choices Stats: 2025-09-07T10:54:23.3029988Z {"num_choices": 20, "num_triton_choices": 18, "best_kernel": "triton_mm_85", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.024512000381946564, "best_triton_pos": 0} 2025-09-07T10:54:23.3130692Z AUTOTUNE addmm(131072x64, 131072x128, 128x64) 2025-09-07T10:54:23.3130993Z strides: [0, 1], [128, 1], [1, 128] 2025-09-07T10:54:23.3131306Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:54:23.3132013Z triton_mm_85 0.0245 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:23.3132981Z triton_mm_90 0.0249 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:23.3134437Z bias_addmm 0.0252 ms 97.5% 2025-09-07T10:54:23.3135069Z triton_mm_81 0.0258 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:54:23.3136031Z triton_mm_87 0.0279 ms 87.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:23.3136996Z triton_mm_88 0.0286 ms 85.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 
2025-09-07T10:54:23.3137994Z triton_mm_89 0.0293 ms 83.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T10:54:23.3138947Z triton_mm_80 0.0296 ms 82.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:54:23.3139844Z triton_mm_83 0.0297 ms 82.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:23.3140744Z triton_mm_86 0.0300 ms 81.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:54:23.3141533Z SingleProcess AUTOTUNE benchmarking takes 0.2761 seconds and 0.0004 seconds precompiling for 20 choices 2025-09-07T10:54:23.8496392Z Autotune Choices Stats: 2025-09-07T10:54:23.8497425Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_47", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.016095999628305435, "best_triton_pos": 0} 2025-09-07T10:54:23.8588108Z AUTOTUNE addmm(131072x32, 131072x64, 64x32) 2025-09-07T10:54:23.8588465Z strides: [0, 1], [64, 1], [1, 64] 2025-09-07T10:54:23.8588799Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:54:23.8589521Z triton_mm_47 0.0161 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:23.8590397Z bias_addmm 0.0162 ms 99.4% 2025-09-07T10:54:23.8591014Z triton_mm_39 0.0165 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:54:23.8591974Z triton_mm_43 0.0166 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:54:23.8592926Z triton_mm_42 0.0167 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:23.8594300Z triton_mm_36 0.0170 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:54:23.8595283Z triton_mm_48 0.0171 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:54:23.8596255Z triton_mm_45 0.0172 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:54:23.8597290Z triton_mm_44 0.0177 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:23.8598452Z triton_mm_41 0.0178 ms 90.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:54:23.8599199Z SingleProcess AUTOTUNE benchmarking takes 0.2511 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T10:54:24.3830762Z Autotune Choices Stats: 2025-09-07T10:54:24.3831876Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_203", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 
0.013120000250637531, "best_triton_pos": 0} 2025-09-07T10:54:24.3915590Z AUTOTUNE addmm(32768x128, 32768x128, 128x128) 2025-09-07T10:54:24.3915922Z strides: [0, 1], [128, 1], [1, 128] 2025-09-07T10:54:24.3916259Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:54:24.3917123Z triton_mm_203 0.0131 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:24.3918180Z triton_mm_195 0.0134 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:24.3919284Z triton_mm_199 0.0136 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:24.3920335Z triton_mm_202 0.0136 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:24.3921339Z triton_mm_192 0.0137 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:54:24.3922327Z triton_mm_196 0.0138 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:54:24.3923974Z triton_mm_200 0.0140 ms 93.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:54:24.3924968Z triton_mm_201 0.0141 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T10:54:24.3926119Z triton_mm_197 0.0142 ms 
92.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:24.3927103Z triton_mm_193 0.0156 ms 84.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:54:24.3927963Z SingleProcess AUTOTUNE benchmarking takes 0.2744 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T10:54:24.7510012Z Autotune Choices Stats: 2025-09-07T10:54:24.7511115Z {"num_choices": 20, "num_triton_choices": 18, "best_kernel": "triton_mm_135", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8", "best_time": 0.009472000412642956, "best_triton_pos": 0} 2025-09-07T10:54:24.7594906Z AUTOTUNE addmm(32768x64, 32768x64, 64x64) 2025-09-07T10:54:24.7595221Z strides: [0, 1], [64, 1], [1, 64] 2025-09-07T10:54:24.7595549Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:54:24.7596312Z triton_mm_135 0.0095 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:54:24.7597947Z triton_mm_134 0.0097 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:24.7598881Z triton_mm_129 0.0098 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:24.7599677Z triton_mm_130 0.0098 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:54:24.7600467Z triton_mm_126 0.0099 ms 96.1% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:54:24.7601258Z triton_mm_127 0.0099 ms 95.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:24.7602053Z triton_mm_128 0.0099 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:54:24.7602850Z triton_mm_132 0.0099 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:54:24.7603652Z triton_mm_124 0.0100 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:54:24.7604778Z triton_mm_125 0.0100 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:54:24.7605495Z SingleProcess AUTOTUNE benchmarking takes 0.2585 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T10:54:25.3044080Z Autotune Choices Stats: 2025-09-07T10:54:25.3045488Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.011071999557316303, "best_triton_pos": 1, "best_triton_time": 0.011455999687314034, "best_triton_kernel": "triton_mm_468", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8"} 2025-09-07T10:54:25.3131254Z AUTOTUNE addmm(8192x256, 8192x256, 256x256) 2025-09-07T10:54:25.3131750Z strides: [0, 1], [256, 1], [1, 256] 2025-09-07T10:54:25.3132061Z dtypes: torch.bfloat16, 
torch.bfloat16, torch.bfloat16 2025-09-07T10:54:25.3132381Z bias_addmm 0.0111 ms 100.0% 2025-09-07T10:54:25.3132980Z triton_mm_468 0.0115 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:54:25.3134116Z triton_mm_465 0.0115 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:54:25.3135076Z triton_mm_469 0.0116 ms 95.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:25.3136043Z triton_mm_472 0.0116 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:54:25.3137014Z triton_mm_475 0.0116 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:25.3137990Z triton_mm_476 0.0116 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:54:25.3138973Z triton_mm_471 0.0119 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:25.3140109Z triton_mm_467 0.0119 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:25.3141012Z triton_mm_474 0.0124 ms 89.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:25.3141802Z 
SingleProcess AUTOTUNE benchmarking takes 0.2769 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T10:54:25.6832235Z Autotune Choices Stats: 2025-09-07T10:54:25.6833351Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_239", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.009151999838650227, "best_triton_pos": 0} 2025-09-07T10:54:25.6919709Z AUTOTUNE addmm(8192x128, 8192x128, 128x128) 2025-09-07T10:54:25.6920075Z strides: [0, 1], [128, 1], [1, 128] 2025-09-07T10:54:25.6920409Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:54:25.6921164Z triton_mm_239 0.0092 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:54:25.6922160Z triton_mm_244 0.0094 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:25.6923135Z triton_mm_243 0.0094 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:54:25.6924289Z triton_mm_241 0.0095 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:54:25.6925257Z triton_mm_245 0.0095 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:54:25.6926576Z triton_mm_238 0.0096 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:54:25.6927661Z 
triton_mm_242 0.0096 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:25.6928808Z triton_mm_235 0.0096 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:54:25.6929775Z triton_mm_240 0.0098 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:25.6930742Z triton_mm_232 0.0101 ms 90.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:54:25.6931588Z SingleProcess AUTOTUNE benchmarking takes 0.2687 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T10:54:26.3412295Z Autotune Choices Stats: 2025-09-07T10:54:26.3413284Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_742", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4", "best_time": 0.01027199998497963, "best_triton_pos": 0} 2025-09-07T10:54:26.3505454Z AUTOTUNE addmm(2048x512, 2048x512, 512x512) 2025-09-07T10:54:26.3505782Z strides: [0, 1], [512, 1], [1, 512] 2025-09-07T10:54:26.3506115Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:54:26.3507372Z triton_mm_742 0.0103 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:54:26.3508075Z bias_addmm 0.0105 ms 97.6% 2025-09-07T10:54:26.3508758Z triton_mm_741 0.0110 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:26.3509654Z triton_mm_737 0.0112 ms 91.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:54:26.3510505Z triton_mm_740 0.0121 ms 84.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:54:26.3511346Z triton_mm_738 0.0122 ms 84.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:54:26.3512209Z triton_mm_744 0.0123 ms 83.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:54:26.3513071Z triton_mm_748 0.0123 ms 83.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:54:26.3514140Z triton_mm_747 0.0124 ms 82.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:26.3515008Z triton_mm_739 0.0133 ms 77.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:26.3515754Z SingleProcess AUTOTUNE benchmarking takes 0.2833 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T10:54:26.7335351Z Autotune Choices Stats: 2025-09-07T10:54:26.7337430Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_511", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 
0.008767999708652496, "best_triton_pos": 0} 2025-09-07T10:54:26.7428959Z AUTOTUNE addmm(2048x256, 2048x256, 256x256) 2025-09-07T10:54:26.7429470Z strides: [0, 1], [256, 1], [1, 256] 2025-09-07T10:54:26.7429962Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:54:26.7431013Z triton_mm_511 0.0088 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:54:26.7432466Z triton_mm_510 0.0091 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:54:26.7434238Z triton_mm_514 0.0093 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:26.7435716Z triton_mm_504 0.0093 ms 93.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:54:26.7437274Z triton_mm_505 0.0094 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:54:26.7438753Z triton_mm_515 0.0094 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:54:26.7439687Z bias_addmm 0.0095 ms 91.9% 2025-09-07T10:54:26.7441042Z triton_mm_506 0.0097 ms 90.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:54:26.7442589Z triton_mm_507 0.0098 ms 89.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 
2025-09-07T10:54:26.7444329Z triton_mm_513 0.0098 ms 89.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:54:26.7445627Z SingleProcess AUTOTUNE benchmarking takes 0.2782 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T10:54:27.5958848Z Autotune Choices Stats: 2025-09-07T10:54:27.5960313Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_779", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.008031999692320824, "best_triton_pos": 0} 2025-09-07T10:54:27.6061014Z AUTOTUNE addmm(512x512, 512x512, 512x512) 2025-09-07T10:54:27.6061350Z strides: [0, 1], [512, 1], [1, 512] 2025-09-07T10:54:27.6061700Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:54:27.6062292Z triton_mm_779 0.0080 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:54:27.6063092Z triton_mm_783 0.0084 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:54:27.6064425Z triton_mm_778 0.0094 ms 85.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:54:27.6065198Z triton_mm_782 0.0096 ms 83.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:54:27.6066525Z triton_mm_776 0.0098 ms 81.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 
2025-09-07T10:54:27.6067954Z triton_mm_785 0.0112 ms 71.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:54:27.6069591Z triton_mm_792 0.0113 ms 71.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:27.6070591Z triton_mm_784 0.0122 ms 66.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:27.6071097Z addmm 0.0129 ms 62.3% 2025-09-07T10:54:27.6071579Z triton_mm_781 0.0158 ms 50.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:54:27.6072254Z SingleProcess AUTOTUNE benchmarking takes 0.3973 seconds and 0.0004 seconds precompiling for 21 choices 2025-09-07T10:54:28.6232087Z Autotune Choices Stats: 2025-09-07T10:54:28.6233319Z {"num_choices": 7, "num_triton_choices": 6, "best_kernel": "triton_convolution2d_2", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8", "best_time": 0.09017600119113922, "best_triton_pos": 0} 2025-09-07T10:54:28.6330452Z AUTOTUNE convolution(8x3x256x256, 32x3x3x3) 2025-09-07T10:54:28.6330792Z strides: [196608, 1, 768, 3], [27, 1, 9, 3] 2025-09-07T10:54:28.6331102Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:54:28.6332360Z triton_convolution2d_2 0.0902 ms 100.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T10:54:28.6333624Z triton_convolution2d_4 0.0904 ms 99.7% ALLOW_TF32=False, 
BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:54:28.6335085Z triton_convolution2d_3 0.0984 ms 91.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:54:28.6336302Z triton_convolution2d_0 0.0986 ms 91.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:54:28.6337042Z convolution 0.1041 ms 86.7% 2025-09-07T10:54:28.6337771Z triton_convolution2d_1 0.1201 ms 75.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:54:28.6338984Z triton_convolution2d_5 0.1363 ms 66.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:54:28.6339947Z SingleProcess AUTOTUNE benchmarking takes 0.1893 seconds and 0.0002 seconds precompiling for 7 choices 2025-09-07T10:54:28.7522995Z Autotune Choices Stats: 2025-09-07T10:54:28.7524536Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.03324799984693527, "best_triton_pos": 1, "best_triton_time": 0.03548799827694893, "best_triton_kernel": "triton_convolution2d_9", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8"} 2025-09-07T10:54:28.7616050Z AUTOTUNE convolution(8x32x256x256, 64x32x3x3) 2025-09-07T10:54:28.7616379Z strides: [2097152, 1, 8192, 32], [288, 1, 96, 32] 
2025-09-07T10:54:28.7616701Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:54:28.7616979Z convolution 0.0332 ms 100.0% 2025-09-07T10:54:28.7617713Z triton_convolution2d_9 0.0355 ms 93.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:54:28.7619142Z triton_convolution2d_12 0.0365 ms 91.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:54:28.7620364Z triton_convolution2d_10 0.0377 ms 88.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:54:28.7621498Z triton_convolution2d_11 0.0430 ms 77.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:54:28.7622625Z triton_convolution2d_7 0.0474 ms 70.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:54:28.7623907Z triton_convolution2d_6 0.0634 ms 52.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:54:28.7625039Z triton_convolution2d_8 0.1230 ms 27.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T10:54:28.7626164Z SingleProcess AUTOTUNE benchmarking takes 0.1280 seconds and 0.0002 seconds precompiling for 8 choices 
2025-09-07T10:54:29.0306961Z Autotune Choices Stats: 2025-09-07T10:54:29.0307960Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_25", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4", "best_time": 0.024671999737620354, "best_triton_pos": 0} 2025-09-07T10:54:29.0396696Z AUTOTUNE addmm(131072x128, 131072x64, 64x128) 2025-09-07T10:54:29.0397125Z strides: [0, 1], [64, 1], [1, 64] 2025-09-07T10:54:29.0397450Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:54:29.0398178Z triton_mm_25 0.0247 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:54:29.0399186Z triton_mm_24 0.0248 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:29.0400164Z triton_mm_21 0.0263 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:54:29.0400947Z triton_mm_22 0.0264 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:29.0401721Z triton_mm_26 0.0269 ms 91.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:29.0402505Z triton_mm_31 0.0275 ms 89.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:54:29.0403301Z triton_mm_23 0.0278 ms 88.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:54:29.0404494Z triton_mm_20 0.0283 ms 87.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:54:29.0405421Z triton_mm_27 0.0283 ms 87.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:54:29.0406202Z triton_mm_30 0.0284 ms 86.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:29.0406911Z SingleProcess AUTOTUNE benchmarking takes 0.2770 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T10:54:29.1496843Z Autotune Choices Stats: 2025-09-07T10:54:29.1498130Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.02800000086426735, "best_triton_pos": 1, "best_triton_time": 0.028736000880599022, "best_triton_kernel": "triton_convolution2d_55", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8"} 2025-09-07T10:54:29.1588379Z AUTOTUNE convolution(8x32x128x128, 64x32x3x3) 2025-09-07T10:54:29.1588744Z strides: [524288, 1, 4096, 32], [288, 1, 96, 32] 2025-09-07T10:54:29.1589063Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:54:29.1589349Z convolution 0.0280 ms 100.0% 2025-09-07T10:54:29.1590110Z triton_convolution2d_55 0.0287 ms 97.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:54:29.1591711Z triton_convolution2d_52 0.0288 ms 97.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, 
GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:54:29.1592988Z triton_convolution2d_53 0.0324 ms 86.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:54:29.1594585Z triton_convolution2d_54 0.0358 ms 78.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:54:29.1595803Z triton_convolution2d_50 0.0403 ms 69.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:54:29.1597125Z triton_convolution2d_49 0.0610 ms 45.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:54:29.1598365Z triton_convolution2d_51 0.0821 ms 34.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T10:54:29.1599337Z SingleProcess AUTOTUNE benchmarking takes 0.1186 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T10:54:29.4061095Z Autotune Choices Stats: 2025-09-07T10:54:29.4061990Z {"num_choices": 20, "num_triton_choices": 18, "best_kernel": "triton_mm_72", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.017823999747633934, "best_triton_pos": 0} 2025-09-07T10:54:29.4153431Z AUTOTUNE addmm(131072x64, 131072x64, 64x64) 2025-09-07T10:54:29.4154045Z strides: [0, 
1], [64, 1], [1, 64] 2025-09-07T10:54:29.4154838Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:54:29.4155575Z triton_mm_72 0.0178 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:29.4156759Z triton_mm_73 0.0180 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:54:29.4157525Z bias_addmm 0.0184 ms 96.7% 2025-09-07T10:54:29.4158159Z triton_mm_64 0.0194 ms 91.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:54:29.4159119Z triton_mm_68 0.0194 ms 91.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:54:29.4160097Z triton_mm_67 0.0196 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:29.4160875Z triton_mm_63 0.0197 ms 90.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:54:29.4161666Z triton_mm_69 0.0204 ms 87.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:29.4162457Z triton_mm_70 0.0205 ms 87.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:54:29.4163247Z triton_mm_71 0.0206 ms 86.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T10:54:29.4164345Z SingleProcess AUTOTUNE benchmarking takes 0.2558 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T10:54:29.5251991Z Autotune Choices Stats: 2025-09-07T10:54:29.5253277Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.024800000712275505, "best_triton_pos": 1, "best_triton_time": 0.026496000587940216, "best_triton_kernel": "triton_convolution2d_95", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8"} 2025-09-07T10:54:29.5343269Z AUTOTUNE convolution(8x64x128x128, 128x64x3x3) 2025-09-07T10:54:29.5343952Z strides: [1048576, 1, 8192, 64], [576, 1, 192, 64] 2025-09-07T10:54:29.5344295Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:54:29.5344581Z convolution 0.0248 ms 100.0% 2025-09-07T10:54:29.5345385Z triton_convolution2d_95 0.0265 ms 93.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:54:29.5346653Z triton_convolution2d_97 0.0265 ms 93.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:54:29.5347888Z triton_convolution2d_98 0.0298 ms 83.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:54:29.5349111Z triton_convolution2d_96 0.0358 ms 69.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:54:29.5350348Z 
triton_convolution2d_93 0.0381 ms 65.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:54:29.5351668Z triton_convolution2d_92 0.0413 ms 60.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:54:29.5352832Z triton_convolution2d_94 0.1061 ms 23.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T10:54:29.5353661Z SingleProcess AUTOTUNE benchmarking takes 0.1184 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T10:54:29.6394414Z Autotune Choices Stats: 2025-09-07T10:54:29.6395541Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_140", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4", "best_time": 0.01894400082528591, "best_triton_pos": 0} 2025-09-07T10:54:29.6483048Z AUTOTUNE convolution(8x64x64x64, 64x64x3x3) 2025-09-07T10:54:29.6483380Z strides: [262144, 1, 4096, 64], [576, 1, 192, 64] 2025-09-07T10:54:29.6483885Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:54:29.6484674Z triton_convolution2d_140 0.0189 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:54:29.6485439Z convolution 0.0192 ms 98.7% 2025-09-07T10:54:29.6486171Z triton_convolution2d_141 0.0193 ms 98.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, 
STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:54:29.6487888Z triton_convolution2d_139 0.0199 ms 95.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:54:29.6489116Z triton_convolution2d_142 0.0251 ms 75.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:54:29.6490454Z triton_convolution2d_137 0.0327 ms 57.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:54:29.6491576Z triton_convolution2d_136 0.0328 ms 57.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:54:29.6492702Z triton_convolution2d_138 0.0531 ms 35.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T10:54:29.6493600Z SingleProcess AUTOTUNE benchmarking takes 0.1104 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T10:54:29.7710150Z Autotune Choices Stats: 2025-09-07T10:54:29.7711472Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.01913600042462349, "best_triton_pos": 1, "best_triton_time": 0.03372799977660179, "best_triton_kernel": "triton_convolution2d_209", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T10:54:29.7799452Z AUTOTUNE 
convolution(8x128x64x64, 256x128x3x3) 2025-09-07T10:54:29.7799781Z strides: [524288, 1, 8192, 128], [1152, 1, 384, 128] 2025-09-07T10:54:29.7800094Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:54:29.7800552Z convolution 0.0191 ms 100.0% 2025-09-07T10:54:29.7801301Z triton_convolution2d_209 0.0337 ms 56.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:54:29.7802628Z triton_convolution2d_208 0.0412 ms 46.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:54:29.7804225Z triton_convolution2d_211 0.0433 ms 44.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:54:29.7805443Z triton_convolution2d_210 0.0463 ms 41.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:54:29.7806649Z triton_convolution2d_206 0.0567 ms 33.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:54:29.7807873Z triton_convolution2d_205 0.0599 ms 32.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:54:29.7809077Z triton_convolution2d_207 0.1035 ms 18.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 
2025-09-07T10:54:29.7810160Z SingleProcess AUTOTUNE benchmarking takes 0.1257 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T10:54:29.8952547Z Autotune Choices Stats: 2025-09-07T10:54:29.8954371Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.017184000462293625, "best_triton_pos": 1, "best_triton_time": 0.03062400035560131, "best_triton_kernel": "triton_convolution2d_254", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T10:54:29.9044657Z AUTOTUNE convolution(8x128x32x32, 128x128x3x3) 2025-09-07T10:54:29.9045001Z strides: [131072, 1, 4096, 128], [1152, 1, 384, 128] 2025-09-07T10:54:29.9045321Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:54:29.9045600Z convolution 0.0172 ms 100.0% 2025-09-07T10:54:29.9046356Z triton_convolution2d_254 0.0306 ms 56.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:54:29.9047606Z triton_convolution2d_255 0.0351 ms 49.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:54:29.9048855Z triton_convolution2d_256 0.0366 ms 46.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:54:29.9050212Z triton_convolution2d_253 0.0382 ms 45.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:54:29.9051356Z triton_convolution2d_250 0.0481 ms 35.8% 
ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:54:29.9052502Z triton_convolution2d_251 0.0537 ms 32.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:54:29.9053967Z triton_convolution2d_252 0.0994 ms 17.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T10:54:29.9054988Z SingleProcess AUTOTUNE benchmarking takes 0.1208 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T10:54:30.0770368Z Autotune Choices Stats: 2025-09-07T10:54:30.0771626Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.02380800060927868, "best_triton_pos": 1, "best_triton_time": 0.05552000179886818, "best_triton_kernel": "triton_convolution2d_481", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T10:54:30.0861621Z AUTOTUNE convolution(8x256x32x32, 512x256x3x3) 2025-09-07T10:54:30.0861975Z strides: [262144, 1, 8192, 256], [2304, 1, 768, 256] 2025-09-07T10:54:30.0862289Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:54:30.0862579Z convolution 0.0238 ms 100.0% 2025-09-07T10:54:30.0863346Z triton_convolution2d_481 0.0555 ms 42.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:54:30.0864881Z triton_convolution2d_480 0.0699 ms 34.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, 
KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:54:30.0866347Z triton_convolution2d_482 0.0746 ms 31.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:54:30.0867633Z triton_convolution2d_483 0.0746 ms 31.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:54:30.0868861Z triton_convolution2d_477 0.1005 ms 23.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:54:30.0870143Z triton_convolution2d_478 0.1045 ms 22.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:54:30.0871230Z triton_convolution2d_479 0.2026 ms 11.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T10:54:30.0872079Z SingleProcess AUTOTUNE benchmarking takes 0.1578 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T10:54:30.2345536Z Autotune Choices Stats: 2025-09-07T10:54:30.2347048Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.017696000635623932, "best_triton_pos": 1, "best_triton_time": 0.05363199859857559, "best_triton_kernel": "triton_convolution2d_526", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, 
num_warps=4"} 2025-09-07T10:54:30.2436492Z AUTOTUNE convolution(8x256x16x16, 256x256x3x3) 2025-09-07T10:54:30.2436862Z strides: [65536, 1, 4096, 256], [2304, 1, 768, 256] 2025-09-07T10:54:30.2437292Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:54:30.2437594Z convolution 0.0177 ms 100.0% 2025-09-07T10:54:30.2438360Z triton_convolution2d_526 0.0536 ms 33.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:54:30.2439859Z triton_convolution2d_528 0.0670 ms 26.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:54:30.2441057Z triton_convolution2d_525 0.0678 ms 26.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:54:30.2442064Z triton_convolution2d_527 0.0748 ms 23.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:54:30.2443074Z triton_convolution2d_523 0.0992 ms 17.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:54:30.2444355Z triton_convolution2d_522 0.1011 ms 17.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:54:30.2445368Z triton_convolution2d_524 0.1955 ms 9.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, 
UNROLL=False, num_stages=1, num_warps=8 2025-09-07T10:54:30.2446174Z SingleProcess AUTOTUNE benchmarking takes 0.1539 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T10:54:30.4679911Z Autotune Choices Stats: 2025-09-07T10:54:30.4681531Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.02534399926662445, "best_triton_pos": 1, "best_triton_time": 0.110944002866745, "best_triton_kernel": "triton_convolution2d_753", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T10:54:30.4771917Z AUTOTUNE convolution(8x512x16x16, 1024x512x3x3) 2025-09-07T10:54:30.4772250Z strides: [131072, 1, 8192, 512], [4608, 1, 1536, 512] 2025-09-07T10:54:30.4772538Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:54:30.4772796Z convolution 0.0253 ms 100.0% 2025-09-07T10:54:30.4773484Z triton_convolution2d_753 0.1109 ms 22.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:54:30.4774955Z triton_convolution2d_752 0.1376 ms 18.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:54:30.4776123Z triton_convolution2d_755 0.1421 ms 17.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:54:30.4777260Z triton_convolution2d_754 0.1434 ms 17.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:54:30.4778393Z 
triton_convolution2d_750 0.2001 ms 12.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:54:30.4779521Z triton_convolution2d_749 0.2019 ms 12.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:54:30.4780881Z triton_convolution2d_751 0.2847 ms 8.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T10:54:30.4781745Z SingleProcess AUTOTUNE benchmarking takes 0.2095 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T10:54:30.7379421Z Autotune Choices Stats: 2025-09-07T10:54:30.7380455Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_764", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.010495999827980995, "best_triton_pos": 0} 2025-09-07T10:54:30.7471996Z AUTOTUNE addmm(512x1024, 512x1024, 1024x1024) 2025-09-07T10:54:30.7472323Z strides: [0, 1], [1024, 1], [1, 1024] 2025-09-07T10:54:30.7472650Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:54:30.7473353Z triton_mm_764 0.0105 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:54:30.7474376Z bias_addmm 0.0108 ms 97.0% 2025-09-07T10:54:30.7475013Z triton_mm_768 0.0117 ms 89.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:54:30.7476007Z triton_mm_760 0.0130 ms 
80.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:54:30.7477049Z triton_mm_763 0.0132 ms 79.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:54:30.7478401Z triton_mm_774 0.0140 ms 75.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:54:30.7479391Z triton_mm_767 0.0141 ms 74.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:30.7480009Z addmm 0.0143 ms 73.5% 2025-09-07T10:54:30.7480579Z triton_mm_759 0.0147 ms 71.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:54:30.7481367Z triton_mm_773 0.0153 ms 68.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:30.7482061Z SingleProcess AUTOTUNE benchmarking takes 0.2692 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T10:54:30.9481898Z Autotune Choices Stats: 2025-09-07T10:54:30.9483272Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.017503999173641205, "best_triton_pos": 1, "best_triton_time": 0.10831999778747559, "best_triton_kernel": "triton_convolution2d_798", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T10:54:30.9573332Z AUTOTUNE convolution(8x512x8x8, 512x512x3x3) 2025-09-07T10:54:30.9573636Z 
strides: [32768, 1, 4096, 512], [4608, 1, 1536, 512] 2025-09-07T10:54:30.9574082Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:54:30.9574344Z convolution 0.0175 ms 100.0% 2025-09-07T10:54:30.9575026Z triton_convolution2d_798 0.1083 ms 16.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:54:30.9576178Z triton_convolution2d_797 0.1330 ms 13.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:54:30.9577483Z triton_convolution2d_800 0.1347 ms 13.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:54:30.9578708Z triton_convolution2d_799 0.1419 ms 12.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:54:30.9579848Z triton_convolution2d_794 0.2068 ms 8.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:54:30.9581000Z triton_convolution2d_795 0.2069 ms 8.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:54:30.9582088Z triton_convolution2d_796 0.2386 ms 7.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=512, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T10:54:30.9582926Z SingleProcess AUTOTUNE benchmarking takes 0.2097 
seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T10:54:31.2110363Z Autotune Choices Stats: 2025-09-07T10:54:31.2111337Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_921", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.008960000239312649, "best_triton_pos": 0} 2025-09-07T10:54:31.2207923Z AUTOTUNE addmm(8x1000, 8x1024, 1024x1000) 2025-09-07T10:54:31.2208512Z strides: [0, 1], [1024, 1], [1, 1024] 2025-09-07T10:54:31.2208891Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:54:31.2209609Z triton_mm_921 0.0090 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T10:54:31.2210718Z triton_mm_925 0.0095 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:54:31.2211295Z bias_addmm 0.0104 ms 85.9% 2025-09-07T10:54:31.2211841Z triton_mm_929 0.0107 ms 83.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:54:31.2212729Z triton_mm_933 0.0113 ms 79.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:54:31.2213618Z triton_mm_920 0.0120 ms 74.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T10:54:31.2214765Z triton_mm_919 0.0125 ms 71.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 
2025-09-07T10:54:31.2215645Z triton_mm_924 0.0126 ms 70.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:31.2216524Z triton_mm_918 0.0132 ms 68.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T10:54:31.2217087Z addmm 0.0132 ms 67.8% 2025-09-07T10:54:31.2217498Z SingleProcess AUTOTUNE benchmarking takes 0.2480 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T10:54:35.0527993Z pass 2025-09-07T10:54:39.2424954Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:54:39.2426835Z import pynvml # type: ignore[import] 2025-09-07T10:54:42.2857865Z 2025-09-07T10:54:44.1587014Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:54:44.1587382Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:54:44.1635226Z cuda eval deit_base_distilled_patch16_224 2025-09-07T10:54:55.6147637Z Autotune Choices Stats: 2025-09-07T10:54:55.6149543Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.018432000651955605, "best_triton_pos": 1, "best_triton_time": 0.02160000056028366, "best_triton_kernel": "triton_mm_61", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T10:54:55.6244460Z AUTOTUNE mm(1584x768, 768x3072) 2025-09-07T10:54:55.6244846Z strides: [768, 1], [1, 768] 2025-09-07T10:54:55.6245264Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:54:55.6245719Z mm 0.0184 ms 100.0% 2025-09-07T10:54:55.6246642Z triton_mm_61 0.0216 ms 
85.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:55.6248149Z triton_mm_62 0.0222 ms 82.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:55.6249626Z triton_mm_56 0.0232 ms 79.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:55.6250958Z triton_mm_63 0.0248 ms 74.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:54:55.6251863Z triton_mm_55 0.0272 ms 67.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:54:55.6252744Z triton_mm_54 0.0282 ms 65.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:55.6253634Z triton_mm_57 0.0284 ms 65.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:54:55.6254693Z triton_mm_52 0.0299 ms 61.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:54:55.6255580Z triton_mm_58 0.0300 ms 61.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:55.6256351Z SingleProcess AUTOTUNE benchmarking takes 0.2758 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T10:54:56.3782824Z Autotune 
Choices Stats: 2025-09-07T10:54:56.3784072Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_940", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.0080960001796484, "best_triton_pos": 0} 2025-09-07T10:54:56.3878861Z AUTOTUNE mm(8x768, 768x1000) 2025-09-07T10:54:56.3879138Z strides: [152064, 1], [1, 768] 2025-09-07T10:54:56.3879436Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:54:56.3880125Z triton_mm_940 0.0081 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T10:54:56.3881687Z triton_mm_944 0.0087 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:54:56.3882306Z mm 0.0089 ms 91.3% 2025-09-07T10:54:56.3883064Z triton_mm_948 0.0096 ms 84.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:54:56.3884252Z triton_mm_952 0.0100 ms 80.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:54:56.3885219Z triton_mm_939 0.0104 ms 77.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T10:54:56.3886177Z triton_mm_938 0.0105 ms 77.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:54:56.3887146Z triton_mm_943 0.0110 ms 73.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:56.3888137Z triton_mm_937 0.0111 ms 73.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T10:54:56.3889229Z triton_mm_947 0.0117 ms 69.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:56.3890076Z SingleProcess AUTOTUNE benchmarking takes 0.2142 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T10:54:59.5647860Z Autotune Choices Stats: 2025-09-07T10:54:59.5650938Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.12956799566745758, "best_triton_pos": 1, "best_triton_time": 0.13206399977207184, "best_triton_kernel": "triton_convolution2d_6", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=8"} 2025-09-07T10:54:59.5744807Z AUTOTUNE convolution(8x3x224x224, 768x3x16x16) 2025-09-07T10:54:59.5745154Z strides: [150528, 50176, 224, 1], [768, 256, 16, 1] 2025-09-07T10:54:59.5745444Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:54:59.5745702Z convolution 0.1296 ms 100.0% 2025-09-07T10:54:59.5746401Z triton_convolution2d_6 0.1321 ms 98.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:54:59.5747565Z triton_convolution2d_1 0.1479 ms 87.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:54:59.5748744Z triton_convolution2d_3 0.1481 ms 87.5% ALLOW_TF32=False, BLOCK_K=16, 
BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:54:59.5749890Z triton_convolution2d_4 0.1755 ms 73.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:54:59.5751032Z triton_convolution2d_5 0.1951 ms 66.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:54:59.5752117Z triton_convolution2d_0 0.2208 ms 58.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:54:59.5753353Z triton_convolution2d_2 0.4085 ms 31.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=16, KERNEL_W=16, PADDING_H=0, PADDING_W=0, STRIDE_H=16, STRIDE_W=16, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T10:54:59.5754685Z SingleProcess AUTOTUNE benchmarking takes 0.2253 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T10:54:59.8470764Z Autotune Choices Stats: 2025-09-07T10:54:59.8472698Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.0161920003592968, "best_triton_pos": 1, "best_triton_time": 0.017503999173641205, "best_triton_kernel": "triton_mm_24", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T10:54:59.8563388Z AUTOTUNE addmm(1584x2304, 1584x768, 768x2304) 2025-09-07T10:54:59.8563876Z strides: [0, 1], [768, 1], [1, 768] 2025-09-07T10:54:59.8564214Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 
2025-09-07T10:54:59.8564549Z bias_addmm 0.0162 ms 100.0% 2025-09-07T10:54:59.8565168Z triton_mm_24 0.0175 ms 92.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:59.8566170Z triton_mm_23 0.0202 ms 80.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:59.8567147Z triton_mm_25 0.0208 ms 77.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:54:59.8568400Z triton_mm_18 0.0211 ms 76.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:59.8569409Z triton_mm_16 0.0226 ms 71.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:59.8570066Z addmm 0.0234 ms 69.1% 2025-09-07T10:54:59.8570703Z triton_mm_14 0.0241 ms 67.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:54:59.8571587Z triton_mm_17 0.0246 ms 65.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:54:59.8572475Z triton_mm_20 0.0247 ms 65.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:54:59.8573278Z SingleProcess AUTOTUNE benchmarking takes 0.2808 seconds and 0.0003 seconds precompiling for 21 choices 2025-09-07T10:55:00.0813983Z Autotune Choices Stats: 2025-09-07T10:55:00.0815198Z 
{"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.01056000031530857, "best_triton_pos": 1, "best_triton_time": 0.011872000060975552, "best_triton_kernel": "triton_mm_44", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T10:55:00.0906846Z AUTOTUNE mm(1584x768, 768x768) 2025-09-07T10:55:00.0907291Z strides: [768, 1], [1, 768] 2025-09-07T10:55:00.0907706Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:55:00.0908147Z mm 0.0106 ms 100.0% 2025-09-07T10:55:00.0909105Z triton_mm_44 0.0119 ms 88.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:55:00.0911057Z triton_mm_33 0.0128 ms 82.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:55:00.0911916Z triton_mm_37 0.0128 ms 82.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:00.0912824Z triton_mm_43 0.0131 ms 80.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:00.0913646Z triton_mm_36 0.0139 ms 76.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:55:00.0914632Z triton_mm_40 0.0139 ms 75.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:55:00.0915467Z triton_mm_38 0.0143 ms 73.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, 
BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:55:00.0916290Z triton_mm_34 0.0157 ms 67.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:55:00.0917191Z triton_mm_39 0.0161 ms 65.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:00.0917921Z SingleProcess AUTOTUNE benchmarking takes 0.2334 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T10:55:00.3683933Z Autotune Choices Stats: 2025-09-07T10:55:00.3685211Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.01836800016462803, "best_triton_pos": 1, "best_triton_time": 0.02393599972128868, "best_triton_kernel": "triton_mm_82", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T10:55:00.3781230Z AUTOTUNE mm(1584x3072, 3072x768) 2025-09-07T10:55:00.3781680Z strides: [3072, 1], [1, 3072] 2025-09-07T10:55:00.3782125Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:55:00.3782564Z mm 0.0184 ms 100.0% 2025-09-07T10:55:00.3783505Z triton_mm_82 0.0239 ms 76.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:55:00.3785350Z triton_mm_75 0.0298 ms 61.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:00.3786901Z triton_mm_71 0.0305 ms 60.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:55:00.3788509Z 
triton_mm_81 0.0316 ms 58.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:00.3790052Z triton_mm_72 0.0328 ms 56.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:55:00.3791378Z triton_mm_76 0.0332 ms 55.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:55:00.3792154Z triton_mm_74 0.0355 ms 51.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:55:00.3792927Z triton_mm_78 0.0357 ms 51.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:55:00.3793971Z triton_mm_68 0.0437 ms 42.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:55:00.3794738Z SingleProcess AUTOTUNE benchmarking takes 0.2864 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T10:55:00.6848541Z Autotune Choices Stats: 2025-09-07T10:55:00.6849429Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_923", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.00825599953532219, "best_triton_pos": 0} 2025-09-07T10:55:00.6947945Z AUTOTUNE mm(8x768, 768x1000) 2025-09-07T10:55:00.6948350Z strides: [152064, 1], [1, 768] 2025-09-07T10:55:00.6948783Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:55:00.6949905Z triton_mm_923 0.0083 ms 100.0% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T10:55:00.6951103Z mm 0.0086 ms 96.3% 2025-09-07T10:55:00.6951838Z triton_mm_927 0.0086 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:55:00.6952807Z triton_mm_931 0.0098 ms 84.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:55:00.6954256Z triton_mm_935 0.0099 ms 83.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:55:00.6955616Z triton_mm_922 0.0102 ms 81.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T10:55:00.6956631Z triton_mm_921 0.0105 ms 78.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:55:00.6964700Z triton_mm_926 0.0107 ms 77.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:00.6965539Z triton_mm_920 0.0111 ms 74.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T10:55:00.6966326Z triton_mm_933 0.0116 ms 71.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:55:00.6967013Z SingleProcess AUTOTUNE benchmarking takes 0.2119 seconds and 0.0002 seconds precompiling for 18 choices 
2025-09-07T10:55:02.4226839Z pass 2025-09-07T10:55:06.2636358Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:55:06.2638665Z import pynvml # type: ignore[import] 2025-09-07T10:55:09.2125548Z 2025-09-07T10:55:11.0713213Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:55:11.0713570Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:55:11.0833648Z cuda eval dla102 2025-09-07T10:55:38.9053103Z Autotune Choices Stats: 2025-09-07T10:55:38.9054301Z {"num_choices": 18, "num_triton_choices": 16, "best_kernel": "triton_mm_27", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4", "best_time": 0.012703999876976013, "best_triton_pos": 0} 2025-09-07T10:55:38.9161988Z AUTOTUNE addmm(100352x64, 100352x32, 32x64) 2025-09-07T10:55:38.9162306Z strides: [0, 1], [32, 1], [1, 32] 2025-09-07T10:55:38.9162587Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:55:38.9163184Z triton_mm_27 0.0127 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:55:38.9164864Z triton_mm_25 0.0127 ms 99.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:38.9165842Z triton_mm_24 0.0129 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:55:38.9166845Z triton_mm_22 0.0129 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:55:38.9167814Z triton_mm_28 0.0130 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:38.9168769Z triton_mm_29 0.0133 ms 95.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:55:38.9169723Z triton_mm_30 0.0134 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T10:55:38.9170666Z triton_mm_31 0.0135 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:55:38.9171918Z triton_mm_23 0.0137 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:55:38.9172871Z triton_mm_26 0.0138 ms 92.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:55:38.9173840Z SingleProcess AUTOTUNE benchmarking takes 0.2522 seconds and 0.0003 seconds precompiling for 18 choices 2025-09-07T10:55:39.4479571Z Autotune Choices Stats: 2025-09-07T10:55:39.4480636Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_66", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.00863999966531992, "best_triton_pos": 0} 2025-09-07T10:55:39.4576508Z AUTOTUNE addmm(25088x128, 25088x32, 32x128) 2025-09-07T10:55:39.4576807Z strides: [0, 1], [32, 1], [1, 32] 2025-09-07T10:55:39.4577163Z dtypes: 
torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:55:39.4577881Z triton_mm_66 0.0086 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:55:39.4578851Z triton_mm_64 0.0088 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:55:39.4579819Z triton_mm_71 0.0089 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:55:39.4580780Z triton_mm_68 0.0089 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:55:39.4581754Z triton_mm_67 0.0090 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:39.4583007Z triton_mm_70 0.0090 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:39.4584374Z triton_mm_69 0.0091 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:55:39.4585337Z triton_mm_72 0.0093 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T10:55:39.4586166Z triton_mm_73 0.0094 ms 91.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:39.4587001Z triton_mm_74 0.0095 ms 91.2% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:55:39.4587731Z SingleProcess AUTOTUNE benchmarking takes 0.2493 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T10:55:39.9841504Z Autotune Choices Stats: 2025-09-07T10:55:39.9842392Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_130", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.014271999709308147, "best_triton_pos": 0} 2025-09-07T10:55:39.9937060Z AUTOTUNE addmm(25088x128, 25088x256, 256x128) 2025-09-07T10:55:39.9937450Z strides: [0, 1], [256, 1], [1, 256] 2025-09-07T10:55:39.9937805Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:55:39.9938539Z triton_mm_130 0.0143 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:39.9940049Z triton_mm_136 0.0152 ms 94.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:39.9941051Z triton_mm_129 0.0162 ms 88.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:55:39.9942042Z triton_mm_132 0.0162 ms 88.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:39.9943016Z triton_mm_133 0.0162 ms 88.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:55:39.9944233Z triton_mm_135 0.0163 ms 
87.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:39.9945161Z triton_mm_128 0.0163 ms 87.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:39.9945688Z bias_addmm 0.0165 ms 86.3% 2025-09-07T10:55:39.9946205Z triton_mm_125 0.0168 ms 85.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:55:39.9947029Z triton_mm_126 0.0174 ms 81.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:55:39.9947756Z SingleProcess AUTOTUNE benchmarking takes 0.2722 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T10:55:40.5341289Z Autotune Choices Stats: 2025-09-07T10:55:40.5342291Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_149", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.011231999844312668, "best_triton_pos": 0} 2025-09-07T10:55:40.5436665Z AUTOTUNE addmm(25088x128, 25088x128, 128x128) 2025-09-07T10:55:40.5437481Z strides: [0, 1], [128, 1], [1, 128] 2025-09-07T10:55:40.5437822Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:55:40.5438546Z triton_mm_149 0.0112 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:40.5439533Z triton_mm_147 0.0119 ms 94.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, 
num_warps=4 2025-09-07T10:55:40.5440525Z triton_mm_151 0.0120 ms 93.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:40.5441511Z triton_mm_144 0.0121 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:55:40.5442474Z triton_mm_150 0.0123 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:55:40.5443443Z triton_mm_155 0.0123 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:40.5444756Z triton_mm_148 0.0123 ms 91.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:55:40.5445964Z triton_mm_152 0.0124 ms 90.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:55:40.5446950Z triton_mm_154 0.0124 ms 90.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:40.5447572Z bias_addmm 0.0132 ms 85.0% 2025-09-07T10:55:40.5448044Z SingleProcess AUTOTUNE benchmarking takes 0.2716 seconds and 0.0003 seconds precompiling for 21 choices 2025-09-07T10:55:41.0697372Z Autotune Choices Stats: 2025-09-07T10:55:41.0698423Z {"num_choices": 20, "num_triton_choices": 18, "best_kernel": "triton_mm_86", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", 
"best_time": 0.010304000228643417, "best_triton_pos": 0} 2025-09-07T10:55:41.0794528Z AUTOTUNE addmm(25088x64, 25088x128, 128x64) 2025-09-07T10:55:41.0794840Z strides: [0, 1], [128, 1], [1, 128] 2025-09-07T10:55:41.0795220Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:55:41.0795938Z triton_mm_86 0.0103 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:41.0797053Z triton_mm_82 0.0104 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:55:41.0798050Z triton_mm_83 0.0106 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:55:41.0799051Z triton_mm_84 0.0108 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:41.0800043Z triton_mm_87 0.0108 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:55:41.0801476Z triton_mm_85 0.0109 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:55:41.0802561Z triton_mm_89 0.0110 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:55:41.0803538Z triton_mm_91 0.0110 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:41.0805681Z triton_mm_79 0.0113 ms 91.5% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:55:41.0806652Z triton_mm_81 0.0113 ms 91.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:55:41.0807496Z SingleProcess AUTOTUNE benchmarking takes 0.2657 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T10:55:41.6053115Z Autotune Choices Stats: 2025-09-07T10:55:41.6054284Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_190", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.009184000082314014, "best_triton_pos": 0} 2025-09-07T10:55:41.6150621Z AUTOTUNE addmm(6272x256, 6272x128, 128x256) 2025-09-07T10:55:41.6150912Z strides: [0, 1], [128, 1], [1, 128] 2025-09-07T10:55:41.6151252Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:55:41.6152321Z triton_mm_190 0.0092 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:55:41.6153342Z triton_mm_195 0.0092 ms 99.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:55:41.6154582Z triton_mm_191 0.0092 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:55:41.6155564Z triton_mm_192 0.0093 ms 98.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:41.6156408Z triton_mm_194 0.0093 ms 98.3% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:41.6157034Z bias_addmm 0.0095 ms 97.0% 2025-09-07T10:55:41.6157574Z triton_mm_197 0.0095 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:55:41.6158446Z triton_mm_193 0.0095 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:55:41.6159293Z triton_mm_196 0.0097 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:41.6160143Z triton_mm_200 0.0097 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:41.6160883Z SingleProcess AUTOTUNE benchmarking takes 0.2768 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T10:55:42.1444210Z Autotune Choices Stats: 2025-09-07T10:55:42.1445914Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.01228800043463707, "best_triton_pos": 1, "best_triton_time": 0.012415999546647072, "best_triton_kernel": "triton_mm_258", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T10:55:42.1543335Z AUTOTUNE addmm(6272x256, 6272x512, 512x256) 2025-09-07T10:55:42.1543654Z strides: [0, 1], [512, 1], [1, 512] 2025-09-07T10:55:42.1544176Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:55:42.1544517Z bias_addmm 0.0123 ms 100.0% 2025-09-07T10:55:42.1545156Z triton_mm_258 0.0124 ms 99.0% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:42.1546172Z triton_mm_254 0.0126 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:55:42.1547178Z triton_mm_265 0.0131 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:55:42.1548173Z triton_mm_264 0.0135 ms 90.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:42.1549157Z triton_mm_257 0.0137 ms 89.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:55:42.1550127Z triton_mm_261 0.0141 ms 87.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:55:42.1551376Z triton_mm_256 0.0146 ms 84.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:42.1552405Z triton_mm_259 0.0148 ms 82.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:55:42.1553388Z triton_mm_260 0.0152 ms 80.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:42.1554424Z SingleProcess AUTOTUNE benchmarking takes 0.2713 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T10:55:42.8099872Z Autotune 
Choices Stats: 2025-09-07T10:55:42.8101223Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.013567999936640263, "best_triton_pos": 1, "best_triton_time": 0.01484800036996603, "best_triton_kernel": "triton_mm_363", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8"} 2025-09-07T10:55:42.8199349Z AUTOTUNE addmm(6272x256, 6272x768, 768x256) 2025-09-07T10:55:42.8199713Z strides: [0, 1], [768, 1], [1, 768] 2025-09-07T10:55:42.8200044Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:55:42.8200404Z bias_addmm 0.0136 ms 100.0% 2025-09-07T10:55:42.8201044Z triton_mm_363 0.0148 ms 91.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:55:42.8202054Z triton_mm_374 0.0149 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:55:42.8203055Z triton_mm_367 0.0150 ms 90.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:42.8205009Z triton_mm_373 0.0156 ms 87.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:42.8206105Z triton_mm_366 0.0166 ms 81.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:55:42.8207189Z triton_mm_370 0.0170 ms 79.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:55:42.8208152Z 
triton_mm_368 0.0171 ms 79.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:55:42.8208765Z addmm 0.0173 ms 78.5% 2025-09-07T10:55:42.8209345Z triton_mm_364 0.0184 ms 73.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:55:42.8210197Z SingleProcess AUTOTUNE benchmarking takes 0.2765 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T10:55:43.1568411Z Autotune Choices Stats: 2025-09-07T10:55:43.1569716Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.015424000099301338, "best_triton_pos": 1, "best_triton_time": 0.017503999173641205, "best_triton_kernel": "triton_mm_592", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T10:55:43.1666982Z AUTOTUNE addmm(6272x256, 6272x1152, 1152x256) 2025-09-07T10:55:43.1667357Z strides: [0, 1], [1152, 1], [1, 1152] 2025-09-07T10:55:43.1667703Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:55:43.1668051Z bias_addmm 0.0154 ms 100.0% 2025-09-07T10:55:43.1669246Z triton_mm_592 0.0175 ms 88.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:55:43.1670284Z triton_mm_585 0.0187 ms 82.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:43.1671262Z triton_mm_581 0.0191 ms 80.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:55:43.1671877Z addmm 
0.0193 ms 79.9% 2025-09-07T10:55:43.1672460Z triton_mm_591 0.0203 ms 75.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:43.1673442Z triton_mm_586 0.0207 ms 74.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:55:43.1674928Z triton_mm_584 0.0215 ms 71.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:55:43.1675930Z triton_mm_588 0.0221 ms 69.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:55:43.1676782Z triton_mm_582 0.0224 ms 68.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:55:43.1677612Z SingleProcess AUTOTUNE benchmarking takes 0.2765 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T10:55:43.4503201Z Autotune Choices Stats: 2025-09-07T10:55:43.4504822Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.010432000271975994, "best_triton_pos": 1, "best_triton_time": 0.010463999584317207, "best_triton_kernel": "triton_mm_600", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8"} 2025-09-07T10:55:43.4603445Z AUTOTUNE addmm(6272x256, 6272x256, 256x256) 2025-09-07T10:55:43.4604356Z strides: [0, 1], [256, 1], [1, 256] 2025-09-07T10:55:43.4604701Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:55:43.4605058Z bias_addmm 0.0104 ms 100.0% 2025-09-07T10:55:43.4605813Z triton_mm_600 0.0105 
ms 99.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:55:43.4606774Z triton_mm_604 0.0107 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:43.4607747Z triton_mm_611 0.0111 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:55:43.4608718Z triton_mm_603 0.0111 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:55:43.4609673Z triton_mm_607 0.0112 ms 92.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:55:43.4610626Z triton_mm_602 0.0114 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:43.4611575Z triton_mm_606 0.0114 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:43.4612754Z triton_mm_610 0.0114 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:43.4613869Z triton_mm_609 0.0118 ms 88.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:43.4614724Z SingleProcess AUTOTUNE benchmarking takes 0.2701 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T10:55:43.8434313Z 
Autotune Choices Stats: 2025-09-07T10:55:43.8435400Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_209", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.009855999611318111, "best_triton_pos": 0} 2025-09-07T10:55:43.8536986Z AUTOTUNE addmm(6272x128, 6272x256, 256x128) 2025-09-07T10:55:43.8537290Z strides: [0, 1], [256, 1], [1, 256] 2025-09-07T10:55:43.8537667Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:55:43.8538407Z triton_mm_209 0.0099 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:55:43.8539071Z bias_addmm 0.0101 ms 97.2% 2025-09-07T10:55:43.8539700Z triton_mm_214 0.0104 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:55:43.8540688Z triton_mm_205 0.0106 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:55:43.8541651Z triton_mm_213 0.0107 ms 92.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:43.8542972Z triton_mm_203 0.0108 ms 91.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:55:43.8544165Z triton_mm_204 0.0109 ms 90.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:55:43.8545294Z triton_mm_212 0.0109 ms 90.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, 
BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:55:43.8546224Z triton_mm_216 0.0110 ms 89.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:55:43.8547055Z triton_mm_210 0.0111 ms 88.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:55:43.8547798Z SingleProcess AUTOTUNE benchmarking takes 0.2727 seconds and 0.0003 seconds precompiling for 21 choices 2025-09-07T10:55:44.4637538Z Autotune Choices Stats: 2025-09-07T10:55:44.4638612Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_650", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4", "best_time": 0.008960000239312649, "best_triton_pos": 0} 2025-09-07T10:55:44.4742110Z AUTOTUNE addmm(1568x512, 1568x256, 256x512) 2025-09-07T10:55:44.4742493Z strides: [0, 1], [256, 1], [1, 256] 2025-09-07T10:55:44.4742839Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:55:44.4743587Z triton_mm_650 0.0090 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:55:44.4745867Z triton_mm_652 0.0090 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:55:44.4746879Z triton_mm_645 0.0093 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:55:44.4747429Z bias_addmm 0.0096 ms 93.6% 2025-09-07T10:55:44.4747946Z triton_mm_649 
0.0097 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:44.4748797Z triton_mm_647 0.0103 ms 87.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:44.4749629Z triton_mm_648 0.0103 ms 87.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:55:44.4750468Z triton_mm_651 0.0103 ms 87.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:44.4751315Z triton_mm_655 0.0103 ms 86.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:44.4752155Z triton_mm_639 0.0104 ms 86.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:55:44.4752896Z SingleProcess AUTOTUNE benchmarking takes 0.2767 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T10:55:45.0186572Z Autotune Choices Stats: 2025-09-07T10:55:45.0187906Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.011071999557316303, "best_triton_pos": 1, "best_triton_time": 0.012191999703645706, "best_triton_kernel": "triton_mm_714", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4"} 2025-09-07T10:55:45.0289152Z AUTOTUNE addmm(1568x512, 1568x1024, 1024x512) 2025-09-07T10:55:45.0289675Z strides: [0, 1], [1024, 1], [1, 1024] 2025-09-07T10:55:45.0289992Z dtypes: 
torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:55:45.0290322Z bias_addmm 0.0111 ms 100.0% 2025-09-07T10:55:45.0290934Z triton_mm_714 0.0122 ms 90.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:55:45.0291901Z triton_mm_709 0.0137 ms 80.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:55:45.0292864Z triton_mm_710 0.0140 ms 78.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:55:45.0294108Z triton_mm_713 0.0140 ms 78.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:45.0295083Z triton_mm_720 0.0141 ms 78.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:55:45.0295695Z addmm 0.0156 ms 71.0% 2025-09-07T10:55:45.0296275Z triton_mm_719 0.0156 ms 70.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:45.0297447Z triton_mm_716 0.0158 ms 70.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:55:45.0298288Z triton_mm_712 0.0158 ms 69.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:55:45.0299016Z SingleProcess AUTOTUNE benchmarking takes 0.2776 seconds and 0.0003 seconds precompiling for 21 choices 2025-09-07T10:55:45.6859668Z Autotune 
Choices Stats: 2025-09-07T10:55:45.6861018Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.012575999833643436, "best_triton_pos": 1, "best_triton_time": 0.013344000093638897, "best_triton_kernel": "triton_mm_823", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4"} 2025-09-07T10:55:45.6961187Z AUTOTUNE addmm(1568x512, 1568x1536, 1536x512) 2025-09-07T10:55:45.6961544Z strides: [0, 1], [1536, 1], [1, 1536] 2025-09-07T10:55:45.6961951Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:55:45.6962296Z bias_addmm 0.0126 ms 100.0% 2025-09-07T10:55:45.6962946Z triton_mm_823 0.0133 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:55:45.6964413Z triton_mm_829 0.0164 ms 76.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:55:45.6965413Z triton_mm_819 0.0166 ms 75.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:55:45.6966520Z triton_mm_818 0.0167 ms 75.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:55:45.6967127Z addmm 0.0179 ms 70.3% 2025-09-07T10:55:45.6968219Z triton_mm_822 0.0180 ms 69.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:45.6969179Z triton_mm_828 0.0198 ms 63.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:45.6970272Z triton_mm_821 0.0200 ms 63.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:55:45.6971224Z triton_mm_825 0.0200 ms 62.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:55:45.6972061Z SingleProcess AUTOTUNE benchmarking takes 0.2802 seconds and 0.0003 seconds precompiling for 21 choices 2025-09-07T10:55:46.0406099Z Autotune Choices Stats: 2025-09-07T10:55:46.0407576Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.01500799972563982, "best_triton_pos": 1, "best_triton_time": 0.015072000212967396, "best_triton_kernel": "triton_mm_1041", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4"} 2025-09-07T10:55:46.0509802Z AUTOTUNE addmm(1568x512, 1568x2048, 2048x512) 2025-09-07T10:55:46.0510136Z strides: [0, 1], [2048, 1], [1, 2048] 2025-09-07T10:55:46.0510493Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:55:46.0510829Z bias_addmm 0.0150 ms 100.0% 2025-09-07T10:55:46.0511468Z triton_mm_1041 0.0151 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:55:46.0512911Z triton_mm_1047 0.0188 ms 80.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:55:46.0514325Z triton_mm_1037 0.0199 ms 75.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 
2025-09-07T10:55:46.0514974Z addmm 0.0205 ms 73.1% 2025-09-07T10:55:46.0515562Z triton_mm_1040 0.0211 ms 71.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:46.0516525Z triton_mm_1036 0.0211 ms 71.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:55:46.0517460Z triton_mm_1046 0.0229 ms 65.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:46.0518279Z triton_mm_1039 0.0248 ms 60.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:55:46.0519068Z triton_mm_1043 0.0248 ms 60.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:55:46.0519770Z SingleProcess AUTOTUNE benchmarking takes 0.2833 seconds and 0.0003 seconds precompiling for 21 choices 2025-09-07T10:55:46.4745786Z Autotune Choices Stats: 2025-09-07T10:55:46.4747140Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.01696000061929226, "best_triton_pos": 1, "best_triton_time": 0.017920000478625298, "best_triton_kernel": "triton_mm_1477", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4"} 2025-09-07T10:55:46.4851634Z AUTOTUNE addmm(1568x512, 1568x2816, 2816x512) 2025-09-07T10:55:46.4851947Z strides: [0, 1], [2816, 1], [1, 2816] 2025-09-07T10:55:46.4852262Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:55:46.4852585Z bias_addmm 0.0170 ms 100.0% 
2025-09-07T10:55:46.4853204Z triton_mm_1477 0.0179 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:55:46.4854341Z addmm 0.0216 ms 78.6% 2025-09-07T10:55:46.4854940Z triton_mm_1483 0.0229 ms 74.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:55:46.4855939Z triton_mm_1473 0.0230 ms 73.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:55:46.4856914Z triton_mm_1472 0.0268 ms 63.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:55:46.4857861Z triton_mm_1476 0.0268 ms 63.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:46.4858706Z triton_mm_1482 0.0292 ms 58.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:46.4859543Z triton_mm_1475 0.0322 ms 52.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:55:46.4860413Z triton_mm_1479 0.0325 ms 52.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:55:46.4861398Z SingleProcess AUTOTUNE benchmarking takes 0.2990 seconds and 0.0003 seconds precompiling for 21 choices 2025-09-07T10:55:46.7727924Z Autotune Choices Stats: 2025-09-07T10:55:46.7729201Z {"num_choices": 21, "num_triton_choices": 19, 
"best_kernel": "bias_addmm", "best_time": 0.00979200005531311, "best_triton_pos": 1, "best_triton_time": 0.009920000098645687, "best_triton_kernel": "triton_mm_1496", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4"} 2025-09-07T10:55:46.7833323Z AUTOTUNE addmm(1568x512, 1568x512, 512x512) 2025-09-07T10:55:46.7833586Z strides: [0, 1], [512, 1], [1, 512] 2025-09-07T10:55:46.7834203Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:55:46.7834487Z bias_addmm 0.0098 ms 100.0% 2025-09-07T10:55:46.7835037Z triton_mm_1496 0.0099 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:55:46.7835895Z triton_mm_1491 0.0107 ms 91.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:55:46.7836728Z triton_mm_1495 0.0107 ms 91.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:46.7837664Z triton_mm_1492 0.0115 ms 85.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:55:46.7838495Z triton_mm_1494 0.0116 ms 84.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:55:46.7839330Z triton_mm_1502 0.0116 ms 84.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:55:46.7840460Z triton_mm_1498 0.0117 ms 83.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, 
BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:55:46.7841423Z triton_mm_1501 0.0117 ms 83.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:46.7842257Z triton_mm_1485 0.0124 ms 78.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:55:46.7842998Z SingleProcess AUTOTUNE benchmarking takes 0.2739 seconds and 0.0003 seconds precompiling for 21 choices 2025-09-07T10:55:47.1711673Z Autotune Choices Stats: 2025-09-07T10:55:47.1712734Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_665", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.008832000195980072, "best_triton_pos": 0} 2025-09-07T10:55:47.1819680Z AUTOTUNE addmm(1568x256, 1568x512, 512x256) 2025-09-07T10:55:47.1820147Z strides: [0, 1], [512, 1], [1, 512] 2025-09-07T10:55:47.1820498Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:55:47.1821213Z triton_mm_665 0.0088 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:55:47.1821853Z bias_addmm 0.0093 ms 94.5% 2025-09-07T10:55:47.1822461Z triton_mm_669 0.0098 ms 90.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:55:47.1824021Z triton_mm_664 0.0100 ms 87.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:55:47.1825023Z 
triton_mm_668 0.0105 ms 83.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:47.1825996Z triton_mm_661 0.0107 ms 82.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:55:47.1827001Z triton_mm_658 0.0108 ms 81.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:55:47.1827902Z triton_mm_660 0.0109 ms 81.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:55:47.1828731Z triton_mm_671 0.0112 ms 78.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:55:47.1829562Z triton_mm_667 0.0113 ms 78.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:55:47.1830302Z SingleProcess AUTOTUNE benchmarking takes 0.2767 seconds and 0.0003 seconds precompiling for 21 choices 2025-09-07T10:55:47.9005980Z Autotune Choices Stats: 2025-09-07T10:55:47.9007158Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_1537", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.008671999908983707, "best_triton_pos": 0} 2025-09-07T10:55:47.9110784Z AUTOTUNE addmm(392x1024, 392x512, 512x1024) 2025-09-07T10:55:47.9111151Z strides: [0, 1], [512, 1], [1, 512] 2025-09-07T10:55:47.9112114Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:55:47.9112886Z 
triton_mm_1537 0.0087 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:55:47.9113543Z bias_addmm 0.0093 ms 93.4% 2025-09-07T10:55:47.9114756Z triton_mm_1541 0.0093 ms 93.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:55:47.9115779Z triton_mm_1536 0.0097 ms 89.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:55:47.9116760Z triton_mm_1540 0.0101 ms 85.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:47.9117776Z triton_mm_1532 0.0106 ms 81.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:55:47.9118571Z triton_mm_1533 0.0106 ms 81.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:55:47.9119366Z triton_mm_1530 0.0108 ms 80.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:55:47.9120153Z triton_mm_1543 0.0108 ms 79.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:55:47.9120980Z triton_mm_1539 0.0111 ms 78.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:55:47.9121940Z SingleProcess AUTOTUNE benchmarking takes 0.2748 
seconds and 0.0003 seconds precompiling for 21 choices 2025-09-07T10:55:48.9569105Z Autotune Choices Stats: 2025-09-07T10:55:48.9570058Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_1552", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.009344000369310379, "best_triton_pos": 0} 2025-09-07T10:55:48.9672013Z AUTOTUNE addmm(392x512, 392x1024, 1024x512) 2025-09-07T10:55:48.9672490Z strides: [0, 1], [1024, 1], [1, 1024] 2025-09-07T10:55:48.9672958Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:55:48.9674364Z triton_mm_1552 0.0093 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:55:48.9675864Z triton_mm_1556 0.0098 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:55:48.9676767Z bias_addmm 0.0101 ms 92.1% 2025-09-07T10:55:48.9677798Z triton_mm_1560 0.0111 ms 84.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:55:48.9679219Z triton_mm_1555 0.0123 ms 75.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:55:48.9680611Z triton_mm_1551 0.0124 ms 75.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:55:48.9681994Z triton_mm_1550 0.0127 ms 73.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 
2025-09-07T10:55:48.9684196Z triton_mm_1566 0.0131 ms 71.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:55:48.9685605Z triton_mm_1559 0.0132 ms 71.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:48.9687231Z triton_mm_1549 0.0135 ms 69.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:55:48.9688462Z SingleProcess AUTOTUNE benchmarking takes 0.2807 seconds and 0.0003 seconds precompiling for 21 choices 2025-09-07T10:55:49.9740770Z Autotune Choices Stats: 2025-09-07T10:55:49.9742473Z {"num_choices": 6, "num_triton_choices": 5, "best_kernel": "triton_convolution2d_4", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=16, GROUPS=1, KERNEL_H=7, KERNEL_W=7, PADDING_H=3, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8", "best_time": 0.10515200346708298, "best_triton_pos": 0} 2025-09-07T10:55:49.9858428Z AUTOTUNE convolution(8x3x224x224, 16x3x7x7) 2025-09-07T10:55:49.9858925Z strides: [150528, 1, 672, 3], [147, 1, 21, 3] 2025-09-07T10:55:49.9859415Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:55:49.9860588Z triton_convolution2d_4 0.1052 ms 100.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=16, GROUPS=1, KERNEL_H=7, KERNEL_W=7, PADDING_H=3, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:55:49.9862434Z triton_convolution2d_1 0.1147 ms 91.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=16, GROUPS=1, KERNEL_H=7, KERNEL_W=7, PADDING_H=3, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:55:49.9865108Z triton_convolution2d_3 0.1161 ms 90.6% ALLOW_TF32=False, BLOCK_K=16, 
BLOCK_M=128, BLOCK_N=16, GROUPS=1, KERNEL_H=7, KERNEL_W=7, PADDING_H=3, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:55:49.9866990Z triton_convolution2d_0 0.1344 ms 78.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=16, GROUPS=1, KERNEL_H=7, KERNEL_W=7, PADDING_H=3, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:55:49.9868855Z triton_convolution2d_2 0.1648 ms 63.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=7, KERNEL_W=7, PADDING_H=3, PADDING_W=3, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T10:55:49.9869995Z convolution 0.2140 ms 49.1% 2025-09-07T10:55:49.9870715Z SingleProcess AUTOTUNE benchmarking takes 0.1920 seconds and 0.0002 seconds precompiling for 6 choices 2025-09-07T10:55:50.0644566Z Autotune Choices Stats: 2025-09-07T10:55:50.0646208Z {"num_choices": 6, "num_triton_choices": 5, "best_kernel": "triton_convolution2d_6", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4", "best_time": 0.02879999950528145, "best_triton_pos": 0} 2025-09-07T10:55:50.0747334Z AUTOTUNE convolution(8x16x224x224, 16x16x3x3) 2025-09-07T10:55:50.0747843Z strides: [802816, 1, 3584, 16], [144, 1, 48, 16] 2025-09-07T10:55:50.0748288Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:55:50.0749446Z triton_convolution2d_6 0.0288 ms 100.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:55:50.0751314Z triton_convolution2d_5 0.0300 ms 95.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 
2025-09-07T10:55:50.0753163Z triton_convolution2d_9 0.0339 ms 84.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:55:50.0755708Z triton_convolution2d_8 0.0365 ms 78.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:55:50.0757886Z triton_convolution2d_7 0.0380 ms 75.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T10:55:50.0759037Z convolution 0.0539 ms 53.4% 2025-09-07T10:55:50.0759735Z SingleProcess AUTOTUNE benchmarking takes 0.0883 seconds and 0.0002 seconds precompiling for 6 choices 2025-09-07T10:55:50.1600760Z Autotune Choices Stats: 2025-09-07T10:55:50.1602388Z {"num_choices": 7, "num_triton_choices": 6, "best_kernel": "triton_convolution2d_15", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8", "best_time": 0.018400000408291817, "best_triton_pos": 0} 2025-09-07T10:55:50.1702275Z AUTOTUNE convolution(8x16x224x224, 32x16x3x3) 2025-09-07T10:55:50.1702798Z strides: [802816, 1, 3584, 16], [144, 1, 48, 16] 2025-09-07T10:55:50.1703265Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:55:50.1704624Z triton_convolution2d_15 0.0184 ms 100.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:55:50.1706504Z triton_convolution2d_11 0.0188 ms 98.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, 
PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:55:50.1708806Z triton_convolution2d_10 0.0205 ms 89.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:55:50.1710660Z triton_convolution2d_14 0.0209 ms 88.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:55:50.1712525Z triton_convolution2d_13 0.0213 ms 86.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:55:50.1713672Z convolution 0.0277 ms 66.5% 2025-09-07T10:55:50.1714979Z triton_convolution2d_12 0.0304 ms 60.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T10:55:50.1716497Z SingleProcess AUTOTUNE benchmarking takes 0.0950 seconds and 0.0002 seconds precompiling for 7 choices 2025-09-07T10:55:50.2705619Z Autotune Choices Stats: 2025-09-07T10:55:50.2707597Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.017216000705957413, "best_triton_pos": 1, "best_triton_time": 0.020640000700950623, "best_triton_kernel": "triton_convolution2d_36", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T10:55:50.2807654Z AUTOTUNE convolution(8x64x112x112, 64x64x3x3) 2025-09-07T10:55:50.2808337Z strides: [802816, 1, 7168, 64], [576, 1, 192, 64] 2025-09-07T10:55:50.2808831Z dtypes: torch.bfloat16, torch.bfloat16 
2025-09-07T10:55:50.2809253Z convolution 0.0172 ms 100.0% 2025-09-07T10:55:50.2810361Z triton_convolution2d_36 0.0206 ms 83.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:55:50.2812503Z triton_convolution2d_37 0.0208 ms 82.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:55:50.2814964Z triton_convolution2d_35 0.0214 ms 80.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:55:50.2816854Z triton_convolution2d_38 0.0259 ms 66.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:55:50.2818734Z triton_convolution2d_32 0.0289 ms 59.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:55:50.2820572Z triton_convolution2d_33 0.0332 ms 51.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:55:50.2822454Z triton_convolution2d_34 0.0574 ms 30.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T10:55:50.2824130Z SingleProcess AUTOTUNE benchmarking takes 0.1100 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T10:55:50.5352047Z Autotune Choices Stats: 
2025-09-07T10:55:50.5354265Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_45", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4", "best_time": 0.010239999741315842, "best_triton_pos": 0} 2025-09-07T10:55:50.5457748Z AUTOTUNE addmm(25088x128, 25088x64, 64x128) 2025-09-07T10:55:50.5458108Z strides: [0, 1], [64, 1], [1, 64] 2025-09-07T10:55:50.5458477Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:55:50.5459277Z triton_mm_45 0.0102 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:55:50.5460243Z triton_mm_47 0.0102 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:55:50.5461258Z triton_mm_51 0.0103 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:55:50.5462286Z triton_mm_50 0.0104 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:50.5463261Z triton_mm_53 0.0104 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:55:50.5464415Z triton_mm_48 0.0104 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:50.5465381Z triton_mm_52 0.0104 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, 
num_warps=4 2025-09-07T10:55:50.5466326Z triton_mm_49 0.0105 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:55:50.5467589Z triton_mm_46 0.0106 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:55:50.5468496Z triton_mm_56 0.0108 ms 95.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:50.5469296Z SingleProcess AUTOTUNE benchmarking takes 0.2639 seconds and 0.0003 seconds precompiling for 21 choices 2025-09-07T10:55:50.6450270Z Autotune Choices Stats: 2025-09-07T10:55:50.6451545Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.015936000272631645, "best_triton_pos": 1, "best_triton_time": 0.018400000408291817, "best_triton_kernel": "triton_convolution2d_98", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8"} 2025-09-07T10:55:50.6555444Z AUTOTUNE convolution(8x64x56x56, 64x64x3x3) 2025-09-07T10:55:50.6555731Z strides: [200704, 1, 3584, 64], [576, 1, 192, 64] 2025-09-07T10:55:50.6555996Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:55:50.6556230Z convolution 0.0159 ms 100.0% 2025-09-07T10:55:50.6556960Z triton_convolution2d_98 0.0184 ms 86.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:55:50.6558035Z triton_convolution2d_97 0.0185 ms 86.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, 
STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:55:50.6559391Z triton_convolution2d_96 0.0196 ms 81.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:55:50.6560455Z triton_convolution2d_93 0.0236 ms 67.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:55:50.6561508Z triton_convolution2d_99 0.0250 ms 63.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:55:50.6562557Z triton_convolution2d_94 0.0314 ms 50.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:55:50.6563608Z triton_convolution2d_95 0.0511 ms 31.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T10:55:50.6564750Z SingleProcess AUTOTUNE benchmarking takes 0.1093 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T10:55:50.7710180Z Autotune Choices Stats: 2025-09-07T10:55:50.7711542Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.015072000212967396, "best_triton_pos": 1, "best_triton_time": 0.03110400028526783, "best_triton_kernel": "triton_convolution2d_161", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T10:55:50.7819345Z AUTOTUNE 
convolution(8x128x56x56, 128x128x3x3) 2025-09-07T10:55:50.7819881Z strides: [401408, 1, 7168, 128], [1152, 1, 384, 128] 2025-09-07T10:55:50.7820356Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:55:50.7820774Z convolution 0.0151 ms 100.0% 2025-09-07T10:55:50.7821884Z triton_convolution2d_161 0.0311 ms 48.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:55:50.7824372Z triton_convolution2d_162 0.0351 ms 42.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:55:50.7826445Z triton_convolution2d_160 0.0380 ms 39.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:55:50.7828330Z triton_convolution2d_163 0.0388 ms 38.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:55:50.7830195Z triton_convolution2d_157 0.0482 ms 31.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:55:50.7832064Z triton_convolution2d_158 0.0521 ms 28.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:55:50.7834143Z triton_convolution2d_159 0.1020 ms 14.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 
2025-09-07T10:55:50.7835604Z SingleProcess AUTOTUNE benchmarking takes 0.1224 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T10:55:50.8963164Z Autotune Choices Stats: 2025-09-07T10:55:50.8965201Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.014816000126302242, "best_triton_pos": 1, "best_triton_time": 0.029888000339269638, "best_triton_kernel": "triton_convolution2d_225", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T10:55:50.9072637Z AUTOTUNE convolution(8x128x28x28, 128x128x3x3) 2025-09-07T10:55:50.9073000Z strides: [100352, 1, 3584, 128], [1152, 1, 384, 128] 2025-09-07T10:55:50.9073327Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:55:50.9073607Z convolution 0.0148 ms 100.0% 2025-09-07T10:55:50.9074549Z triton_convolution2d_225 0.0299 ms 49.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:55:50.9075831Z triton_convolution2d_226 0.0325 ms 45.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:55:50.9077172Z triton_convolution2d_224 0.0380 ms 39.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:55:50.9078423Z triton_convolution2d_227 0.0387 ms 38.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:55:50.9079445Z triton_convolution2d_221 0.0466 ms 31.8% 
ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:55:50.9080441Z triton_convolution2d_222 0.0490 ms 30.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:55:50.9081617Z triton_convolution2d_223 0.0972 ms 15.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T10:55:50.9082416Z SingleProcess AUTOTUNE benchmarking takes 0.1213 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T10:55:51.0861184Z Autotune Choices Stats: 2025-09-07T10:55:51.0862607Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.0144640002399683, "best_triton_pos": 1, "best_triton_time": 0.054016001522541046, "best_triton_kernel": "triton_convolution2d_616", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T10:55:51.0972665Z AUTOTUNE convolution(8x256x28x28, 256x256x3x3) 2025-09-07T10:55:51.0973091Z strides: [200704, 1, 7168, 256], [2304, 1, 768, 256] 2025-09-07T10:55:51.0973405Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:55:51.0973654Z convolution 0.0145 ms 100.0% 2025-09-07T10:55:51.0975343Z triton_convolution2d_616 0.0540 ms 26.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:55:51.0976527Z triton_convolution2d_615 0.0679 ms 21.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, 
KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:55:51.0977689Z triton_convolution2d_618 0.0680 ms 21.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:55:51.0979146Z triton_convolution2d_617 0.0756 ms 19.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:55:51.0980285Z triton_convolution2d_613 0.0948 ms 15.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:55:51.0981358Z triton_convolution2d_612 0.1021 ms 14.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:55:51.0982421Z triton_convolution2d_614 0.2029 ms 7.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T10:55:51.0983260Z SingleProcess AUTOTUNE benchmarking takes 0.1562 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T10:55:51.2455023Z Autotune Choices Stats: 2025-09-07T10:55:51.2456361Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.014015999622642994, "best_triton_pos": 1, "best_triton_time": 0.05344000086188316, "best_triton_kernel": "triton_convolution2d_680", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, 
num_warps=4"} 2025-09-07T10:55:51.2559751Z AUTOTUNE convolution(8x256x14x14, 256x256x3x3) 2025-09-07T10:55:51.2560116Z strides: [50176, 1, 3584, 256], [2304, 1, 768, 256] 2025-09-07T10:55:51.2560447Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:55:51.2560742Z convolution 0.0140 ms 100.0% 2025-09-07T10:55:51.2561509Z triton_convolution2d_680 0.0534 ms 26.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:55:51.2563042Z triton_convolution2d_679 0.0687 ms 20.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:55:51.2564587Z triton_convolution2d_682 0.0713 ms 19.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:55:51.2565953Z triton_convolution2d_681 0.0746 ms 18.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:55:51.2567188Z triton_convolution2d_677 0.0881 ms 15.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:55:51.2568517Z triton_convolution2d_676 0.1044 ms 13.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:55:51.2569663Z triton_convolution2d_678 0.1900 ms 7.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, 
UNROLL=False, num_stages=1, num_warps=8 2025-09-07T10:55:51.2570560Z SingleProcess AUTOTUNE benchmarking takes 0.1547 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T10:55:51.5270275Z Autotune Choices Stats: 2025-09-07T10:55:51.5271985Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.017216000705957413, "best_triton_pos": 1, "best_triton_time": 0.10761599987745285, "best_triton_kernel": "triton_convolution2d_1507", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T10:55:51.5374913Z AUTOTUNE convolution(8x512x14x14, 512x512x3x3) 2025-09-07T10:55:51.5375341Z strides: [100352, 1, 7168, 512], [4608, 1, 1536, 512] 2025-09-07T10:55:51.5375654Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:55:51.5375924Z convolution 0.0172 ms 100.0% 2025-09-07T10:55:51.5376620Z triton_convolution2d_1507 0.1076 ms 16.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:55:51.5377783Z triton_convolution2d_1506 0.1311 ms 13.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:55:51.5378949Z triton_convolution2d_1509 0.1340 ms 12.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:55:51.5380066Z triton_convolution2d_1508 0.1411 ms 12.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:55:51.5381135Z 
triton_convolution2d_1504 0.1966 ms 8.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:55:51.5382205Z triton_convolution2d_1503 0.2017 ms 8.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:55:51.5383270Z triton_convolution2d_1505 0.2778 ms 6.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T10:55:51.5384554Z SingleProcess AUTOTUNE benchmarking takes 0.2088 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T10:55:51.7371479Z Autotune Choices Stats: 2025-09-07T10:55:51.7372906Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.016863999888300896, "best_triton_pos": 1, "best_triton_time": 0.10684800148010254, "best_triton_kernel": "triton_convolution2d_1571", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T10:55:51.7476115Z AUTOTUNE convolution(8x512x7x7, 512x512x3x3) 2025-09-07T10:55:51.7476472Z strides: [25088, 1, 3584, 512], [4608, 1, 1536, 512] 2025-09-07T10:55:51.7476799Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:55:51.7477210Z convolution 0.0169 ms 100.0% 2025-09-07T10:55:51.7477960Z triton_convolution2d_1571 0.1068 ms 15.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:55:51.7479144Z triton_convolution2d_1570 0.1332 ms 12.7% ALLOW_TF32=False, BLOCK_K=32, 
BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:55:51.7480168Z triton_convolution2d_1572 0.1397 ms 12.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:55:51.7481402Z triton_convolution2d_1573 0.1428 ms 11.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:55:51.7482421Z triton_convolution2d_1568 0.1891 ms 8.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:55:51.7483434Z triton_convolution2d_1567 0.2000 ms 8.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:55:51.7484592Z triton_convolution2d_1569 0.2264 ms 7.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=512, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T10:55:51.7485385Z SingleProcess AUTOTUNE benchmarking takes 0.2060 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T10:55:52.0311319Z Autotune Choices Stats: 2025-09-07T10:55:52.0312562Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.013407999649643898, "best_triton_pos": 1, "best_triton_time": 0.013439999893307686, "best_triton_kernel": "triton_mm_1601", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, 
num_warps=4"} 2025-09-07T10:55:52.0417663Z AUTOTUNE addmm(392x1024, 392x2560, 2560x1024) 2025-09-07T10:55:52.0417947Z strides: [0, 1], [2560, 1], [1, 2560] 2025-09-07T10:55:52.0418249Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:55:52.0418620Z bias_addmm 0.0134 ms 100.0% 2025-09-07T10:55:52.0419341Z triton_mm_1601 0.0134 ms 99.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:55:52.0420390Z triton_mm_1605 0.0157 ms 85.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:55:52.0421204Z addmm 0.0176 ms 76.3% 2025-09-07T10:55:52.0421786Z triton_mm_1597 0.0186 ms 72.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:55:52.0422839Z triton_mm_1611 0.0208 ms 64.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:55:52.0424016Z triton_mm_1600 0.0234 ms 57.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:55:52.0424982Z triton_mm_1604 0.0243 ms 55.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:55:52.0425945Z triton_mm_1594 0.0243 ms 55.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:55:52.0426917Z triton_mm_1596 0.0250 ms 53.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:55:52.0427760Z SingleProcess AUTOTUNE benchmarking takes 0.2900 seconds and 0.0003 seconds precompiling for 21 choices 2025-09-07T10:55:57.1560457Z pass 2025-09-07T10:56:01.7481811Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:56:01.7483108Z import pynvml # type: ignore[import] 2025-09-07T10:56:04.7623060Z 2025-09-07T10:56:07.0718849Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:56:07.0720104Z loading model: 0it [00:02, ?it/s] 2025-09-07T10:56:07.0786694Z cuda eval dm_nfnet_f0 2025-09-07T10:56:20.8801962Z Autotune Choices Stats: 2025-09-07T10:56:20.8803439Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.014592000283300877, "best_triton_pos": 1, "best_triton_time": 0.01894400082528591, "best_triton_kernel": "triton_convolution2d_68", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T10:56:20.8914609Z AUTOTUNE convolution(8x128x64x64, 256x128x1x1) 2025-09-07T10:56:20.8917513Z strides: [524288, 4096, 64, 1], [128, 1, 1, 1] 2025-09-07T10:56:20.8917885Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:56:20.8918190Z convolution 0.0146 ms 100.0% 2025-09-07T10:56:20.8918946Z triton_convolution2d_68 0.0189 ms 77.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:20.8919997Z triton_convolution2d_67 0.0202 ms 72.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, 
PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:56:20.8921009Z triton_convolution2d_70 0.0212 ms 68.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:56:20.8922003Z triton_convolution2d_64 0.0240 ms 60.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:20.8922996Z triton_convolution2d_65 0.0253 ms 57.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:20.8924491Z triton_convolution2d_69 0.0277 ms 52.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:56:20.8925649Z triton_convolution2d_66 0.0311 ms 47.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T10:56:20.8926284Z conv1x1_via_mm 0.1099 ms 13.3% 2025-09-07T10:56:20.8926690Z SingleProcess AUTOTUNE benchmarking takes 0.1506 seconds and 0.0003 seconds precompiling for 9 choices 2025-09-07T10:56:21.3486356Z Autotune Choices Stats: 2025-09-07T10:56:21.3487581Z {"num_choices": 7, "num_triton_choices": 6, "best_kernel": "triton_convolution2d_6", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4", "best_time": 0.01990400068461895, "best_triton_pos": 0} 2025-09-07T10:56:21.3593009Z AUTOTUNE 
convolution(8x16x128x128, 32x16x3x3) 2025-09-07T10:56:21.3593382Z strides: [262144, 16384, 128, 1], [144, 9, 3, 1] 2025-09-07T10:56:21.3593897Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:56:21.3594696Z triton_convolution2d_6 0.0199 ms 100.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:56:21.3595992Z triton_convolution2d_10 0.0203 ms 98.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:56:21.3597734Z triton_convolution2d_8 0.0211 ms 94.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:56:21.3599004Z triton_convolution2d_5 0.0231 ms 86.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:56:21.3600016Z triton_convolution2d_9 0.0256 ms 77.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:56:21.3601020Z triton_convolution2d_7 0.0408 ms 48.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T10:56:21.3601639Z convolution 0.0462 ms 43.1% 2025-09-07T10:56:21.3602028Z SingleProcess AUTOTUNE benchmarking takes 0.1032 seconds and 0.0002 seconds precompiling for 7 choices 2025-09-07T10:56:21.8079375Z Autotune Choices Stats: 2025-09-07T10:56:21.8080576Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": 
"triton_convolution2d_21", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8", "best_time": 0.0488319993019104, "best_triton_pos": 0} 2025-09-07T10:56:21.8186519Z AUTOTUNE convolution(8x64x129x129, 128x64x3x3) 2025-09-07T10:56:21.8186878Z strides: [1065024, 16641, 129, 1], [576, 9, 3, 1] 2025-09-07T10:56:21.8187192Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:56:21.8188037Z triton_convolution2d_21 0.0488 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:56:21.8189326Z triton_convolution2d_19 0.0493 ms 99.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:56:21.8190677Z triton_convolution2d_24 0.0511 ms 95.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:56:21.8191859Z triton_convolution2d_23 0.0529 ms 92.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:56:21.8192540Z convolution 0.0545 ms 89.6% 2025-09-07T10:56:21.8193167Z triton_convolution2d_22 0.0573 ms 85.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:56:21.8194642Z triton_convolution2d_18 0.0703 ms 69.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, 
STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:56:21.8195703Z triton_convolution2d_20 0.1745 ms 28.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T10:56:21.8196547Z SingleProcess AUTOTUNE benchmarking takes 0.1426 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T10:56:22.2842856Z Autotune Choices Stats: 2025-09-07T10:56:22.2844467Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_29", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4", "best_time": 0.013151999562978745, "best_triton_pos": 0} 2025-09-07T10:56:22.2951359Z AUTOTUNE convolution(8x128x64x64, 128x128x1x1) 2025-09-07T10:56:22.2951740Z strides: [524288, 4096, 64, 1], [128, 1, 1, 1] 2025-09-07T10:56:22.2952075Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:56:22.2952891Z triton_convolution2d_29 0.0132 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:22.2953679Z convolution 0.0132 ms 99.3% 2025-09-07T10:56:22.2954668Z triton_convolution2d_28 0.0141 ms 93.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:56:22.2955895Z triton_convolution2d_25 0.0144 ms 91.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:22.2957260Z triton_convolution2d_31 0.0147 ms 89.3% ALLOW_TF32=False, BLOCK_K=32, 
BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:56:22.2958483Z triton_convolution2d_30 0.0161 ms 81.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:56:22.2959680Z triton_convolution2d_26 0.0167 ms 78.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:22.2960733Z triton_convolution2d_27 0.0193 ms 68.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T10:56:22.2961551Z conv1x1_via_mm 0.0780 ms 16.9% 2025-09-07T10:56:22.2961964Z SingleProcess AUTOTUNE benchmarking takes 0.1462 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T10:56:22.7687760Z Autotune Choices Stats: 2025-09-07T10:56:22.7689408Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.053408000618219376, "best_triton_pos": 1, "best_triton_time": 0.08150400221347809, "best_triton_kernel": "triton_convolution2d_33", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4"} 2025-09-07T10:56:22.7791353Z AUTOTUNE convolution(8x128x64x64, 128x128x3x3) 2025-09-07T10:56:22.7791702Z strides: [524288, 4096, 64, 1], [1152, 9, 3, 1] 2025-09-07T10:56:22.7792022Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:56:22.7792323Z convolution 0.0534 ms 100.0% 2025-09-07T10:56:22.7793128Z triton_convolution2d_33 0.0815 ms 65.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, 
GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:56:22.7794806Z triton_convolution2d_38 0.0848 ms 63.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:56:22.7796054Z triton_convolution2d_35 0.0935 ms 57.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:56:22.7797396Z triton_convolution2d_37 0.1007 ms 53.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:56:22.7798920Z triton_convolution2d_36 0.1030 ms 51.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:56:22.7800107Z triton_convolution2d_32 0.1521 ms 35.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:56:22.7801180Z triton_convolution2d_34 0.2797 ms 19.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T10:56:22.7802049Z SingleProcess AUTOTUNE benchmarking takes 0.1825 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T10:56:22.9386628Z Autotune Choices Stats: 2025-09-07T10:56:22.9388110Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.012992000207304955, "best_triton_pos": 1, "best_triton_time": 0.015584000386297703, 
"best_triton_kernel": "triton_convolution2d_101", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T10:56:22.9488933Z AUTOTUNE convolution(8x256x32x32, 512x256x1x1) 2025-09-07T10:56:22.9489338Z strides: [262144, 1024, 32, 1], [256, 1, 1, 1] 2025-09-07T10:56:22.9489658Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:56:22.9489925Z convolution 0.0130 ms 100.0% 2025-09-07T10:56:22.9490606Z triton_convolution2d_101 0.0156 ms 83.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:22.9491779Z triton_convolution2d_100 0.0169 ms 77.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:56:22.9493281Z triton_convolution2d_103 0.0180 ms 72.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:56:22.9494883Z triton_convolution2d_97 0.0204 ms 63.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:22.9496012Z triton_convolution2d_102 0.0212 ms 61.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:56:22.9497148Z triton_convolution2d_98 0.0222 ms 58.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, 
num_stages=2, num_warps=4 2025-09-07T10:56:22.9498308Z triton_convolution2d_99 0.0264 ms 49.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T10:56:22.9499005Z conv1x1_via_mm 0.0692 ms 18.8% 2025-09-07T10:56:22.9499453Z SingleProcess AUTOTUNE benchmarking takes 0.1452 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T10:56:23.4736031Z Autotune Choices Stats: 2025-09-07T10:56:23.4737458Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.012480000033974648, "best_triton_pos": 1, "best_triton_time": 0.020287999883294106, "best_triton_kernel": "triton_convolution2d_160", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T10:56:23.4840777Z AUTOTUNE convolution(8x512x16x16, 1536x512x1x1) 2025-09-07T10:56:23.4841178Z strides: [131072, 256, 16, 1], [512, 1, 1, 1] 2025-09-07T10:56:23.4841491Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:56:23.4841794Z convolution 0.0125 ms 100.0% 2025-09-07T10:56:23.4842560Z triton_convolution2d_160 0.0203 ms 61.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:23.4844183Z triton_convolution2d_159 0.0228 ms 54.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:56:23.4845422Z triton_convolution2d_162 0.0233 ms 53.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 
2025-09-07T10:56:23.4846656Z triton_convolution2d_161 0.0238 ms 52.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:56:23.4847867Z triton_convolution2d_156 0.0313 ms 39.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:23.4849129Z triton_convolution2d_157 0.0330 ms 37.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:23.4850494Z triton_convolution2d_158 0.0427 ms 29.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T10:56:23.4851271Z conv1x1_via_mm 0.0632 ms 19.7% 2025-09-07T10:56:23.4851897Z SingleProcess AUTOTUNE benchmarking takes 0.1484 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T10:56:23.9414373Z Autotune Choices Stats: 2025-09-07T10:56:23.9415522Z {"num_choices": 6, "num_triton_choices": 5, "best_kernel": "triton_convolution2d_1", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4", "best_time": 0.014655999839305878, "best_triton_pos": 0} 2025-09-07T10:56:23.9517840Z AUTOTUNE convolution(8x3x257x257, 16x3x3x3) 2025-09-07T10:56:23.9518184Z strides: [198147, 66049, 257, 1], [27, 9, 3, 1] 2025-09-07T10:56:23.9518501Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:56:23.9519301Z triton_convolution2d_1 0.0147 ms 100.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, 
PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:56:23.9520584Z triton_convolution2d_4 0.0155 ms 94.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:56:23.9521811Z triton_convolution2d_2 0.0161 ms 90.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T10:56:23.9523035Z triton_convolution2d_0 0.0185 ms 79.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:56:23.9524441Z triton_convolution2d_3 0.0186 ms 78.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:56:23.9525544Z convolution 0.0254 ms 57.7% 2025-09-07T10:56:23.9526064Z SingleProcess AUTOTUNE benchmarking takes 0.0863 seconds and 0.0002 seconds precompiling for 6 choices 2025-09-07T10:56:24.4065601Z Autotune Choices Stats: 2025-09-07T10:56:24.4067036Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.012512000277638435, "best_triton_pos": 1, "best_triton_time": 0.01974399946630001, "best_triton_kernel": "triton_convolution2d_108", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T10:56:24.4175814Z AUTOTUNE convolution(8x512x32x32, 256x512x1x1) 2025-09-07T10:56:24.4176180Z strides: [524288, 1024, 32, 1], [512, 1, 1, 1] 2025-09-07T10:56:24.4176490Z dtypes: torch.bfloat16, torch.bfloat16 
2025-09-07T10:56:24.4176767Z convolution 0.0125 ms 100.0% 2025-09-07T10:56:24.4177558Z triton_convolution2d_108 0.0197 ms 63.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:24.4178807Z triton_convolution2d_107 0.0235 ms 53.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:56:24.4180044Z triton_convolution2d_110 0.0239 ms 52.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:56:24.4181215Z triton_convolution2d_109 0.0253 ms 49.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:56:24.4182341Z triton_convolution2d_104 0.0331 ms 37.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:24.4184048Z triton_convolution2d_105 0.0333 ms 37.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:24.4185358Z triton_convolution2d_106 0.0395 ms 31.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T10:56:24.4186055Z conv1x1_via_mm 0.0699 ms 17.9% 2025-09-07T10:56:24.4186495Z SingleProcess AUTOTUNE benchmarking takes 0.1484 seconds and 0.0002 seconds precompiling for 9 choices 
2025-09-07T10:56:24.8709677Z Autotune Choices Stats: 2025-09-07T10:56:24.8711688Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.013856000266969204, "best_triton_pos": 1, "best_triton_time": 0.04182400181889534, "best_triton_kernel": "triton_convolution2d_167", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T10:56:24.8818152Z AUTOTUNE convolution(8x1536x16x16, 768x1536x1x1) 2025-09-07T10:56:24.8818624Z strides: [393216, 256, 16, 1], [1536, 1, 1, 1] 2025-09-07T10:56:24.8819034Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:56:24.8819416Z convolution 0.0139 ms 100.0% 2025-09-07T10:56:24.8820459Z triton_convolution2d_167 0.0418 ms 33.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:24.8822658Z triton_convolution2d_169 0.0517 ms 26.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:56:24.8824555Z triton_convolution2d_166 0.0518 ms 26.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:56:24.8826264Z triton_convolution2d_168 0.0535 ms 25.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:56:24.8827294Z conv1x1_via_mm 0.0703 ms 19.7% 2025-09-07T10:56:24.8828306Z triton_convolution2d_163 0.0761 ms 18.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, 
KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:24.8830003Z triton_convolution2d_164 0.0802 ms 17.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:24.8831024Z triton_convolution2d_165 0.1071 ms 12.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T10:56:24.8831836Z SingleProcess AUTOTUNE benchmarking takes 0.1678 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T10:56:25.3854848Z Autotune Choices Stats: 2025-09-07T10:56:25.3856247Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.014368000440299511, "best_triton_pos": 1, "best_triton_time": 0.041120000183582306, "best_triton_kernel": "triton_convolution2d_323", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T10:56:25.3960780Z AUTOTUNE convolution(8x1536x8x8, 1536x1536x1x1) 2025-09-07T10:56:25.3961356Z strides: [98304, 64, 8, 1], [1536, 1, 1, 1] 2025-09-07T10:56:25.3961854Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:56:25.3962335Z convolution 0.0144 ms 100.0% 2025-09-07T10:56:25.3963542Z triton_convolution2d_323 0.0411 ms 34.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:25.3965516Z conv1x1_via_mm 0.0466 ms 30.9% 2025-09-07T10:56:25.3966749Z triton_convolution2d_322 0.0498 ms 28.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, 
PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:56:25.3968767Z triton_convolution2d_324 0.0521 ms 27.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:56:25.3970926Z triton_convolution2d_325 0.0530 ms 27.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:56:25.3972140Z triton_convolution2d_319 0.0745 ms 19.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:25.3973363Z triton_convolution2d_321 0.0782 ms 18.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=512, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T10:56:25.3974950Z triton_convolution2d_320 0.0800 ms 18.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:25.3975938Z SingleProcess AUTOTUNE benchmarking takes 0.1663 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T10:56:26.2625503Z Autotune Choices Stats: 2025-09-07T10:56:26.2627755Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.011839999817311764, "best_triton_pos": 2, "best_triton_time": 0.04012800008058548, "best_triton_kernel": "triton_convolution2d_330", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T10:56:26.2735431Z AUTOTUNE 
convolution(8x1536x8x8, 768x1536x1x1) 2025-09-07T10:56:26.2735745Z strides: [98304, 64, 8, 1], [1536, 1, 1, 1] 2025-09-07T10:56:26.2736019Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:56:26.2736278Z convolution 0.0118 ms 100.0% 2025-09-07T10:56:26.2736533Z conv1x1_via_mm 0.0326 ms 36.3% 2025-09-07T10:56:26.2737229Z triton_convolution2d_330 0.0401 ms 29.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:26.2738379Z triton_convolution2d_329 0.0511 ms 23.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:56:26.2739538Z triton_convolution2d_331 0.0524 ms 22.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:56:26.2740699Z triton_convolution2d_332 0.0539 ms 22.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:56:26.2742109Z triton_convolution2d_326 0.0752 ms 15.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:26.2743164Z triton_convolution2d_328 0.0782 ms 15.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=512, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T10:56:26.2744856Z triton_convolution2d_327 0.0816 ms 14.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, 
UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:26.2745708Z SingleProcess AUTOTUNE benchmarking takes 0.5225 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T10:56:26.7229267Z Autotune Choices Stats: 2025-09-07T10:56:26.7231328Z {"num_choices": 8, "num_triton_choices": 6, "best_kernel": "convolution", "best_time": 0.01017600018531084, "best_triton_pos": 2, "best_triton_time": 0.037248000502586365, "best_triton_kernel": "triton_convolution2d_148", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T10:56:26.7337833Z AUTOTUNE convolution(8x1536x1x1, 768x1536x1x1) 2025-09-07T10:56:26.7338185Z strides: [1536, 1, 1, 1], [1536, 1, 1, 1] 2025-09-07T10:56:26.7338484Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:56:26.7338783Z convolution 0.0102 ms 100.0% 2025-09-07T10:56:26.7339064Z conv1x1_via_mm 0.0124 ms 82.4% 2025-09-07T10:56:26.7339836Z triton_convolution2d_148 0.0372 ms 27.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:26.7341339Z triton_convolution2d_147 0.0410 ms 24.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:56:26.7342419Z triton_convolution2d_149 0.0485 ms 21.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:56:26.7343474Z triton_convolution2d_146 0.0533 ms 19.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, 
num_warps=1 2025-09-07T10:56:26.7344854Z triton_convolution2d_145 0.0588 ms 17.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:26.7345922Z triton_convolution2d_144 0.0703 ms 14.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:26.7346761Z SingleProcess AUTOTUNE benchmarking takes 0.1466 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T10:56:27.2000548Z Autotune Choices Stats: 2025-09-07T10:56:27.2001853Z {"num_choices": 8, "num_triton_choices": 6, "best_kernel": "convolution", "best_time": 0.008063999935984612, "best_triton_pos": 2, "best_triton_time": 0.015200000256299973, "best_triton_kernel": "triton_convolution2d_89", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T10:56:27.2110209Z AUTOTUNE convolution(8x512x1x1, 256x512x1x1) 2025-09-07T10:56:27.2110555Z strides: [512, 1, 1, 1], [512, 1, 1, 1] 2025-09-07T10:56:27.2110846Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:56:27.2111506Z convolution 0.0081 ms 100.0% 2025-09-07T10:56:27.2111718Z conv1x1_via_mm 0.0117 ms 68.7% 2025-09-07T10:56:27.2112346Z triton_convolution2d_89 0.0152 ms 53.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:27.2113510Z triton_convolution2d_88 0.0171 ms 47.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 
2025-09-07T10:56:27.2114742Z triton_convolution2d_87 0.0191 ms 42.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=1 2025-09-07T10:56:27.2115750Z triton_convolution2d_86 0.0203 ms 39.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:27.2116754Z triton_convolution2d_90 0.0207 ms 38.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:56:27.2117851Z triton_convolution2d_85 0.0281 ms 28.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:27.2118662Z SingleProcess AUTOTUNE benchmarking takes 0.1321 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T10:56:27.5905346Z Autotune Choices Stats: 2025-09-07T10:56:27.5906953Z {"num_choices": 7, "num_triton_choices": 5, "best_kernel": "convolution", "best_time": 0.0072639998979866505, "best_triton_pos": 2, "best_triton_time": 0.010463999584317207, "best_triton_kernel": "triton_convolution2d_57", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T10:56:27.6015264Z AUTOTUNE convolution(8x256x1x1, 128x256x1x1) 2025-09-07T10:56:27.6015824Z strides: [256, 1, 1, 1], [256, 1, 1, 1] 2025-09-07T10:56:27.6016340Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:56:27.6016794Z convolution 0.0073 ms 100.0% 2025-09-07T10:56:27.6017206Z conv1x1_via_mm 0.0099 ms 73.7% 2025-09-07T10:56:27.6018439Z 
triton_convolution2d_57 0.0105 ms 69.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:27.6020412Z triton_convolution2d_56 0.0109 ms 66.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:56:27.6022105Z triton_convolution2d_55 0.0111 ms 65.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=1 2025-09-07T10:56:27.6023142Z triton_convolution2d_54 0.0121 ms 60.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:27.6024350Z triton_convolution2d_53 0.0131 ms 55.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:27.6025179Z SingleProcess AUTOTUNE benchmarking takes 0.1205 seconds and 0.0002 seconds precompiling for 7 choices 2025-09-07T10:56:28.5715315Z Autotune Choices Stats: 2025-09-07T10:56:28.5716538Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_12", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4", "best_time": 0.04227200150489807, "best_triton_pos": 0} 2025-09-07T10:56:28.5827024Z AUTOTUNE convolution(8x32x128x128, 64x32x3x3) 2025-09-07T10:56:28.5827610Z strides: [524288, 16384, 128, 1], [288, 9, 3, 1] 2025-09-07T10:56:28.5827931Z dtypes: torch.bfloat16, torch.bfloat16 
2025-09-07T10:56:28.5828748Z triton_convolution2d_12 0.0423 ms 100.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:56:28.5830040Z triton_convolution2d_17 0.0438 ms 96.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:56:28.5831312Z triton_convolution2d_14 0.0442 ms 95.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:56:28.5832389Z triton_convolution2d_15 0.0489 ms 86.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:56:28.5833425Z triton_convolution2d_11 0.0524 ms 80.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:56:28.5834615Z triton_convolution2d_16 0.0524 ms 80.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:56:28.5835243Z convolution 0.0567 ms 74.5% 2025-09-07T10:56:28.5836098Z triton_convolution2d_13 0.1328 ms 31.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T10:56:28.5837013Z SingleProcess AUTOTUNE benchmarking takes 0.1330 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T10:56:28.7086331Z Autotune Choices Stats: 
2025-09-07T10:56:28.7087633Z {"num_choices": 8, "num_triton_choices": 6, "best_kernel": "convolution", "best_time": 0.007071999832987785, "best_triton_pos": 1, "best_triton_time": 0.007712000049650669, "best_triton_kernel": "triton_convolution2d_62", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T10:56:28.7189916Z AUTOTUNE convolution(8x128x1x1, 256x128x1x1) 2025-09-07T10:56:28.7190284Z strides: [128, 1, 1, 1], [128, 1, 1, 1] 2025-09-07T10:56:28.7190576Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:56:28.7190870Z convolution 0.0071 ms 100.0% 2025-09-07T10:56:28.7191614Z triton_convolution2d_62 0.0077 ms 91.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:28.7192862Z triton_convolution2d_61 0.0081 ms 87.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:56:28.7194445Z triton_convolution2d_60 0.0084 ms 84.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=1 2025-09-07T10:56:28.7195659Z triton_convolution2d_59 0.0088 ms 80.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:28.7197432Z triton_convolution2d_63 0.0091 ms 78.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:56:28.7198306Z 
conv1x1_via_mm 0.0096 ms 73.7% 2025-09-07T10:56:28.7199043Z triton_convolution2d_58 0.0108 ms 65.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:28.7200064Z SingleProcess AUTOTUNE benchmarking takes 0.1316 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T10:56:28.8548105Z Autotune Choices Stats: 2025-09-07T10:56:28.8549575Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.01820800080895424, "best_triton_pos": 1, "best_triton_time": 0.028511999174952507, "best_triton_kernel": "triton_convolution2d_74", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8"} 2025-09-07T10:56:28.8653938Z AUTOTUNE convolution(8x256x64x64, 256x256x1x1) 2025-09-07T10:56:28.8654340Z strides: [1048576, 4096, 64, 1], [256, 1, 1, 1] 2025-09-07T10:56:28.8654664Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:56:28.8654948Z convolution 0.0182 ms 100.0% 2025-09-07T10:56:28.8655738Z triton_convolution2d_74 0.0285 ms 63.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:56:28.8657402Z triton_convolution2d_75 0.0293 ms 62.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:28.8658722Z triton_convolution2d_77 0.0308 ms 59.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:56:28.8660012Z triton_convolution2d_71 
0.0363 ms 50.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:28.8661316Z triton_convolution2d_76 0.0397 ms 45.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:56:28.8662497Z triton_convolution2d_72 0.0406 ms 44.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:28.8663599Z triton_convolution2d_73 0.0517 ms 35.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T10:56:28.8664420Z conv1x1_via_mm 0.1482 ms 12.3% 2025-09-07T10:56:28.8664836Z SingleProcess AUTOTUNE benchmarking takes 0.1454 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T10:56:28.9917113Z Autotune Choices Stats: 2025-09-07T10:56:28.9918466Z {"num_choices": 8, "num_triton_choices": 6, "best_kernel": "convolution", "best_time": 0.0077760000713169575, "best_triton_pos": 2, "best_triton_time": 0.010400000028312206, "best_triton_kernel": "triton_convolution2d_95", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T10:56:29.0025197Z AUTOTUNE convolution(8x256x1x1, 512x256x1x1) 2025-09-07T10:56:29.0025570Z strides: [256, 1, 1, 1], [256, 1, 1, 1] 2025-09-07T10:56:29.0025885Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:56:29.0026185Z convolution 0.0078 ms 100.0% 2025-09-07T10:56:29.0026446Z conv1x1_via_mm 0.0098 ms 79.7% 2025-09-07T10:56:29.0027547Z 
triton_convolution2d_95 0.0104 ms 74.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:29.0028810Z triton_convolution2d_94 0.0110 ms 70.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:56:29.0030045Z triton_convolution2d_93 0.0120 ms 64.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=1 2025-09-07T10:56:29.0031266Z triton_convolution2d_92 0.0128 ms 60.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:29.0032360Z triton_convolution2d_96 0.0133 ms 58.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:56:29.0033367Z triton_convolution2d_91 0.0163 ms 47.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:29.0034446Z SingleProcess AUTOTUNE benchmarking takes 0.1324 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T10:56:29.1517585Z Autotune Choices Stats: 2025-09-07T10:56:29.1519203Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.018112000077962875, "best_triton_pos": 1, "best_triton_time": 0.03417599946260452, "best_triton_kernel": "triton_convolution2d_134", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, 
KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T10:56:29.1623186Z AUTOTUNE convolution(8x512x32x32, 768x512x1x1) 2025-09-07T10:56:29.1624004Z strides: [524288, 1024, 32, 1], [512, 1, 1, 1] 2025-09-07T10:56:29.1624376Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:56:29.1624689Z convolution 0.0181 ms 100.0% 2025-09-07T10:56:29.1625469Z triton_convolution2d_134 0.0342 ms 53.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:29.1626738Z triton_convolution2d_130 0.0354 ms 51.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:29.1627971Z triton_convolution2d_131 0.0367 ms 49.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:29.1629206Z triton_convolution2d_135 0.0374 ms 48.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:56:29.1630437Z triton_convolution2d_133 0.0388 ms 46.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:56:29.1631706Z triton_convolution2d_136 0.0414 ms 43.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:56:29.1633081Z triton_convolution2d_132 0.0711 ms 25.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, 
GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T10:56:29.1634002Z conv1x1_via_mm 0.1061 ms 17.1% 2025-09-07T10:56:29.1634442Z SingleProcess AUTOTUNE benchmarking takes 0.1512 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T10:56:29.3006175Z Autotune Choices Stats: 2025-09-07T10:56:29.3007441Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.01462399959564209, "best_triton_pos": 1, "best_triton_time": 0.02611199952661991, "best_triton_kernel": "triton_convolution2d_141", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T10:56:29.3112386Z AUTOTUNE convolution(8x768x16x16, 1536x768x1x1) 2025-09-07T10:56:29.3112737Z strides: [196608, 256, 16, 1], [768, 1, 1, 1] 2025-09-07T10:56:29.3113045Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:56:29.3113332Z convolution 0.0146 ms 100.0% 2025-09-07T10:56:29.3114380Z triton_convolution2d_141 0.0261 ms 56.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:29.3115641Z triton_convolution2d_140 0.0301 ms 48.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:56:29.3117288Z triton_convolution2d_143 0.0304 ms 48.1% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:56:29.3118606Z triton_convolution2d_142 0.0309 ms 47.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, 
KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:56:29.3119824Z triton_convolution2d_137 0.0415 ms 35.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:29.3121053Z triton_convolution2d_138 0.0445 ms 32.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:29.3122368Z triton_convolution2d_139 0.0597 ms 24.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T10:56:29.3123072Z conv1x1_via_mm 0.0699 ms 20.9% 2025-09-07T10:56:29.3123513Z SingleProcess AUTOTUNE benchmarking takes 0.1481 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T10:56:29.4348508Z Autotune Choices Stats: 2025-09-07T10:56:29.4349965Z {"num_choices": 8, "num_triton_choices": 6, "best_kernel": "convolution", "best_time": 0.00902399979531765, "best_triton_pos": 2, "best_triton_time": 0.021344000473618507, "best_triton_kernel": "triton_convolution2d_154", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T10:56:29.4454423Z AUTOTUNE convolution(8x768x1x1, 1536x768x1x1) 2025-09-07T10:56:29.4454802Z strides: [768, 1, 1, 1], [768, 1, 1, 1] 2025-09-07T10:56:29.4455113Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:56:29.4455406Z convolution 0.0090 ms 100.0% 2025-09-07T10:56:29.4455985Z conv1x1_via_mm 0.0132 ms 68.4% 2025-09-07T10:56:29.4456774Z triton_convolution2d_154 0.0213 ms 42.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, 
BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:29.4458149Z triton_convolution2d_153 0.0236 ms 38.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:56:29.4459369Z triton_convolution2d_155 0.0271 ms 33.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:56:29.4460598Z triton_convolution2d_152 0.0272 ms 33.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=1 2025-09-07T10:56:29.4461868Z triton_convolution2d_151 0.0300 ms 30.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:29.4462914Z triton_convolution2d_150 0.0380 ms 23.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:29.4463911Z SingleProcess AUTOTUNE benchmarking takes 0.1331 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T10:56:29.6241361Z Autotune Choices Stats: 2025-09-07T10:56:29.6242977Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.010623999871313572, "best_triton_pos": 1, "best_triton_time": 0.023840000852942467, "best_triton_kernel": "triton_convolution2d_304", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, 
num_stages=2, num_warps=4"} 2025-09-07T10:56:29.6347983Z AUTOTUNE convolution(8x768x8x8, 1536x768x1x1) 2025-09-07T10:56:29.6348428Z strides: [49152, 64, 8, 1], [768, 1, 1, 1] 2025-09-07T10:56:29.6348743Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:56:29.6349032Z convolution 0.0106 ms 100.0% 2025-09-07T10:56:29.6349804Z triton_convolution2d_304 0.0238 ms 44.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:29.6351059Z triton_convolution2d_303 0.0285 ms 37.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:56:29.6352263Z triton_convolution2d_305 0.0297 ms 35.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:56:29.6353266Z triton_convolution2d_306 0.0299 ms 35.6% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:56:29.6354015Z conv1x1_via_mm 0.0330 ms 32.2% 2025-09-07T10:56:29.6354610Z triton_convolution2d_300 0.0403 ms 26.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:29.6355606Z triton_convolution2d_302 0.0433 ms 24.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=512, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T10:56:29.6356599Z triton_convolution2d_301 0.0436 ms 24.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, 
KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:29.6357649Z SingleProcess AUTOTUNE benchmarking takes 0.1459 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T10:56:29.8112116Z Autotune Choices Stats: 2025-09-07T10:56:29.8113486Z {"num_choices": 9, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.01849599927663803, "best_triton_pos": 1, "best_triton_time": 0.04364800080657005, "best_triton_kernel": "triton_convolution2d_382", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T10:56:29.8219775Z AUTOTUNE convolution(8x1536x8x8, 3072x1536x1x1) 2025-09-07T10:56:29.8220147Z strides: [98304, 64, 8, 1], [1536, 1, 1, 1] 2025-09-07T10:56:29.8220472Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:56:29.8220781Z convolution 0.0185 ms 100.0% 2025-09-07T10:56:29.8221592Z triton_convolution2d_382 0.0436 ms 42.4% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:29.8222761Z triton_convolution2d_381 0.0516 ms 35.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:56:29.8223989Z triton_convolution2d_384 0.0535 ms 34.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:56:29.8225362Z triton_convolution2d_383 0.0544 ms 34.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, 
num_warps=8 2025-09-07T10:56:29.8226473Z triton_convolution2d_378 0.0750 ms 24.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:29.8227535Z triton_convolution2d_380 0.0790 ms 23.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=512, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T10:56:29.8228592Z triton_convolution2d_379 0.0813 ms 22.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=1, STRIDE_W=1, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:56:29.8229240Z conv1x1_via_mm 0.0819 ms 22.6% 2025-09-07T10:56:29.8229659Z SingleProcess AUTOTUNE benchmarking takes 0.1660 seconds and 0.0002 seconds precompiling for 9 choices 2025-09-07T10:56:30.0921327Z Autotune Choices Stats: 2025-09-07T10:56:30.0922670Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "bias_addmm", "best_time": 0.013887999579310417, "best_triton_pos": 1, "best_triton_time": 0.014911999925971031, "best_triton_kernel": "triton_mm_389", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2"} 2025-09-07T10:56:30.1037665Z AUTOTUNE addmm(8x1000, 8x3072, 3072x1000) 2025-09-07T10:56:30.1038072Z strides: [0, 1], [3072, 1], [1, 3072] 2025-09-07T10:56:30.1038422Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:56:30.1038771Z bias_addmm 0.0139 ms 100.0% 2025-09-07T10:56:30.1039393Z triton_mm_389 0.0149 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T10:56:30.1040383Z triton_mm_393 0.0158 ms 87.9% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:56:30.1041673Z triton_mm_397 0.0173 ms 80.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:56:30.1042457Z addmm 0.0183 ms 76.0% 2025-09-07T10:56:30.1042985Z triton_mm_401 0.0201 ms 69.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:56:30.1044025Z triton_mm_388 0.0230 ms 60.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T10:56:30.1044912Z triton_mm_387 0.0245 ms 56.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:56:30.1045817Z triton_mm_392 0.0255 ms 54.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:56:30.1046704Z triton_mm_386 0.0260 ms 53.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T10:56:30.1047488Z SingleProcess AUTOTUNE benchmarking takes 0.2808 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T10:56:34.0661070Z pass 2025-09-07T10:56:38.0046131Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T10:56:38.0047282Z import pynvml # type: ignore[import] 2025-09-07T10:56:41.0471852Z 2025-09-07T10:56:42.4871456Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:56:42.4871866Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:56:42.5015885Z cuda eval dpn107 2025-09-07T10:57:18.4638825Z Autotune Choices Stats: 2025-09-07T10:57:18.4641778Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.04025600105524063, "best_triton_pos": 2, "best_triton_time": 0.045791998505592346, "best_triton_kernel": "triton_mm_202", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T10:57:18.4840874Z AUTOTUNE addmm(25088x400, 25088x376, 376x400) 2025-09-07T10:57:18.4842030Z strides: [0, 1], [376, 1], [1, 376] 2025-09-07T10:57:18.4842443Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:57:18.4842740Z bias_addmm 0.0403 ms 100.0% 2025-09-07T10:57:18.4842940Z addmm 0.0456 ms 88.3% 2025-09-07T10:57:18.4843511Z triton_mm_202 0.0458 ms 87.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:18.4844849Z triton_mm_203 0.0512 ms 78.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:57:18.4845666Z triton_mm_201 0.0517 ms 77.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:18.4846494Z triton_mm_200 0.0544 ms 74.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T10:57:18.4847306Z triton_mm_196 0.0564 ms 71.4% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:18.4848445Z triton_mm_192 0.0610 ms 66.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:57:18.4849252Z triton_mm_198 0.0617 ms 65.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:18.4850197Z triton_mm_199 0.0619 ms 65.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:18.4850913Z SingleProcess AUTOTUNE benchmarking takes 0.4696 seconds and 0.0005 seconds precompiling for 21 choices 2025-09-07T10:57:19.2132874Z Autotune Choices Stats: 2025-09-07T10:57:19.2134457Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_37", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.01692800037562847, "best_triton_pos": 0} 2025-09-07T10:57:19.2265884Z AUTOTUNE addmm(25088x200, 25088x128, 128x200) 2025-09-07T10:57:19.2266248Z strides: [0, 1], [128, 1], [1, 128] 2025-09-07T10:57:19.2266530Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:57:19.2267161Z triton_mm_37 0.0169 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:19.2267976Z triton_mm_35 0.0177 ms 95.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:19.2268758Z triton_mm_36 0.0178 ms 95.0% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:19.2270008Z triton_mm_38 0.0179 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:57:19.2270819Z triton_mm_39 0.0183 ms 92.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:19.2271590Z triton_mm_32 0.0184 ms 92.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:57:19.2272394Z triton_mm_42 0.0185 ms 91.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:19.2273175Z triton_mm_43 0.0186 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:19.2274193Z triton_mm_40 0.0186 ms 90.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:19.2274989Z triton_mm_34 0.0195 ms 86.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:57:19.2275672Z SingleProcess AUTOTUNE benchmarking takes 0.3698 seconds and 0.0004 seconds precompiling for 21 choices 2025-09-07T10:57:19.9304189Z Autotune Choices Stats: 2025-09-07T10:57:19.9305252Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_80", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, 
EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.0297279991209507, "best_triton_pos": 0} 2025-09-07T10:57:19.9438476Z AUTOTUNE addmm(25088x200, 25088x316, 316x200) 2025-09-07T10:57:19.9439186Z strides: [0, 1], [316, 1], [1, 316] 2025-09-07T10:57:19.9439466Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:57:19.9440089Z triton_mm_80 0.0297 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:19.9441080Z triton_mm_73 0.0310 ms 95.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:19.9441883Z triton_mm_74 0.0317 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:19.9442677Z triton_mm_79 0.0321 ms 92.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T10:57:19.9443477Z triton_mm_78 0.0322 ms 92.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:19.9444513Z triton_mm_75 0.0347 ms 85.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:19.9445306Z triton_mm_81 0.0356 ms 83.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:19.9446092Z triton_mm_70 0.0361 ms 82.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:57:19.9446871Z triton_mm_77 0.0365 ms 81.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:19.9447876Z triton_mm_82 0.0379 ms 78.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:57:19.9448585Z SingleProcess AUTOTUNE benchmarking takes 0.3737 seconds and 0.0004 seconds precompiling for 21 choices 2025-09-07T10:57:20.3138082Z Autotune Choices Stats: 2025-09-07T10:57:20.3139006Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_113", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.026464000344276428, "best_triton_pos": 0} 2025-09-07T10:57:20.3261346Z AUTOTUNE addmm(25088x200, 25088x336, 336x200) 2025-09-07T10:57:20.3263037Z strides: [0, 1], [336, 1], [1, 336] 2025-09-07T10:57:20.3263352Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:57:20.3264467Z triton_mm_113 0.0265 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:20.3265340Z triton_mm_119 0.0275 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:20.3266157Z triton_mm_118 0.0279 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:20.3266954Z triton_mm_112 0.0280 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:20.3267737Z triton_mm_116 0.0280 ms 94.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:20.3268979Z triton_mm_120 0.0281 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:57:20.3269499Z bias_addmm 0.0287 ms 92.1% 2025-09-07T10:57:20.3269697Z addmm 0.0296 ms 89.4% 2025-09-07T10:57:20.3270319Z triton_mm_115 0.0303 ms 87.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:20.3271111Z triton_mm_111 0.0304 ms 87.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:20.3271798Z SingleProcess AUTOTUNE benchmarking takes 0.3572 seconds and 0.0004 seconds precompiling for 21 choices 2025-09-07T10:57:20.6965735Z Autotune Choices Stats: 2025-09-07T10:57:20.6967368Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_156", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.031808000057935715, "best_triton_pos": 0} 2025-09-07T10:57:20.7091793Z AUTOTUNE addmm(25088x200, 25088x356, 356x200) 2025-09-07T10:57:20.7092310Z strides: [0, 1], [356, 1], [1, 356] 2025-09-07T10:57:20.7092622Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:57:20.7093341Z triton_mm_156 0.0318 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:20.7094790Z 
triton_mm_149 0.0335 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:20.7096197Z triton_mm_150 0.0348 ms 91.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:20.7097149Z triton_mm_154 0.0350 ms 90.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:20.7098009Z triton_mm_153 0.0362 ms 88.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:20.7098872Z triton_mm_155 0.0363 ms 87.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T10:57:20.7099733Z triton_mm_151 0.0385 ms 82.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:20.7100602Z triton_mm_157 0.0389 ms 81.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:20.7101461Z triton_mm_146 0.0407 ms 78.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:57:20.7102320Z triton_mm_158 0.0412 ms 77.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:57:20.7103081Z SingleProcess AUTOTUNE benchmarking takes 0.3587 seconds and 0.0004 seconds precompiling for 21 choices 
2025-09-07T10:57:21.1534992Z Autotune Choices Stats: 2025-09-07T10:57:21.1536161Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.027008000761270523, "best_triton_pos": 1, "best_triton_time": 0.03385600075125694, "best_triton_kernel": "triton_mm_513", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T10:57:21.1664509Z AUTOTUNE addmm(6272x800, 6272x1152, 1152x800) 2025-09-07T10:57:21.1683963Z strides: [0, 1], [1152, 1], [1, 1152] 2025-09-07T10:57:21.1685974Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:57:21.1686885Z bias_addmm 0.0270 ms 100.0% 2025-09-07T10:57:21.1687787Z triton_mm_513 0.0339 ms 79.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:21.1688710Z addmm 0.0347 ms 77.9% 2025-09-07T10:57:21.1689564Z triton_mm_514 0.0351 ms 76.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:57:21.1691050Z triton_mm_507 0.0352 ms 76.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:21.1692468Z triton_mm_510 0.0445 ms 60.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:21.1694190Z triton_mm_506 0.0448 ms 60.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:21.1695590Z triton_mm_503 0.0449 ms 60.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:57:21.1696987Z triton_mm_508 0.0452 ms 59.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:57:21.1698809Z triton_mm_512 0.0456 ms 59.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:21.1700068Z SingleProcess AUTOTUNE benchmarking takes 0.4317 seconds and 0.0004 seconds precompiling for 21 choices 2025-09-07T10:57:21.6729806Z Autotune Choices Stats: 2025-09-07T10:57:21.6730936Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.01648000068962574, "best_triton_pos": 1, "best_triton_time": 0.016736000776290894, "best_triton_kernel": "triton_mm_234", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T10:57:21.6860805Z AUTOTUNE addmm(6272x400, 6272x704, 704x400) 2025-09-07T10:57:21.6862799Z strides: [0, 1], [704, 1], [1, 704] 2025-09-07T10:57:21.6863284Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:57:21.6863982Z bias_addmm 0.0165 ms 100.0% 2025-09-07T10:57:21.6864937Z triton_mm_234 0.0167 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:21.6866392Z triton_mm_240 0.0173 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:21.6867821Z triton_mm_236 0.0197 ms 83.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, 
num_warps=4 2025-09-07T10:57:21.6869215Z triton_mm_232 0.0203 ms 81.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:21.6870618Z triton_mm_239 0.0204 ms 80.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:21.6872732Z triton_mm_237 0.0206 ms 80.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:21.6874432Z triton_mm_233 0.0210 ms 78.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:21.6875573Z addmm 0.0212 ms 77.6% 2025-09-07T10:57:21.6876414Z triton_mm_241 0.0216 ms 76.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:57:21.6877782Z SingleProcess AUTOTUNE benchmarking takes 0.3806 seconds and 0.0004 seconds precompiling for 21 choices 2025-09-07T10:57:22.9522244Z Autotune Choices Stats: 2025-09-07T10:57:22.9523403Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.016767999157309532, "best_triton_pos": 1, "best_triton_time": 0.017376000061631203, "best_triton_kernel": "triton_mm_272", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T10:57:22.9650028Z AUTOTUNE addmm(6272x400, 6272x768, 768x400) 2025-09-07T10:57:22.9658133Z strides: [0, 1], [768, 1], [1, 768] 2025-09-07T10:57:22.9658518Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:57:22.9658857Z bias_addmm 0.0168 ms 100.0% 
2025-09-07T10:57:22.9659468Z triton_mm_272 0.0174 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:22.9660449Z triton_mm_278 0.0182 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:22.9661910Z triton_mm_274 0.0209 ms 80.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:22.9662456Z addmm 0.0210 ms 80.0% 2025-09-07T10:57:22.9662971Z triton_mm_277 0.0213 ms 78.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:22.9664043Z triton_mm_279 0.0216 ms 77.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:57:22.9664842Z triton_mm_275 0.0217 ms 77.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:22.9665625Z triton_mm_271 0.0222 ms 75.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:22.9666400Z triton_mm_270 0.0223 ms 75.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:22.9667082Z SingleProcess AUTOTUNE benchmarking takes 0.9208 seconds and 0.0005 seconds precompiling for 21 choices 2025-09-07T10:57:23.2791508Z Autotune Choices Stats: 2025-09-07T10:57:23.2792769Z {"num_choices": 21, "num_triton_choices": 19, 
"best_kernel": "bias_addmm", "best_time": 0.017152000218629837, "best_triton_pos": 1, "best_triton_time": 0.017696000635623932, "best_triton_kernel": "triton_mm_310", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T10:57:23.2910649Z AUTOTUNE addmm(6272x400, 6272x832, 832x400) 2025-09-07T10:57:23.2910941Z strides: [0, 1], [832, 1], [1, 832] 2025-09-07T10:57:23.2911668Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:57:23.2911950Z bias_addmm 0.0172 ms 100.0% 2025-09-07T10:57:23.2912481Z triton_mm_310 0.0177 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:23.2913503Z triton_mm_316 0.0186 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:23.2914268Z addmm 0.0213 ms 80.5% 2025-09-07T10:57:23.2914782Z triton_mm_317 0.0215 ms 79.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:57:23.2915615Z triton_mm_312 0.0219 ms 78.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:23.2916465Z triton_mm_315 0.0221 ms 77.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:23.2917396Z triton_mm_313 0.0227 ms 75.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:23.2918226Z triton_mm_309 0.0237 ms 72.4% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:23.2919051Z triton_mm_306 0.0237 ms 72.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:57:23.2919776Z SingleProcess AUTOTUNE benchmarking takes 0.3024 seconds and 0.0003 seconds precompiling for 21 choices 2025-09-07T10:57:23.5957128Z Autotune Choices Stats: 2025-09-07T10:57:23.5958806Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.017472000792622566, "best_triton_pos": 1, "best_triton_time": 0.018400000408291817, "best_triton_kernel": "triton_mm_348", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T10:57:23.6076573Z AUTOTUNE addmm(6272x400, 6272x896, 896x400) 2025-09-07T10:57:23.6076960Z strides: [0, 1], [896, 1], [1, 896] 2025-09-07T10:57:23.6077305Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:57:23.6077638Z bias_addmm 0.0175 ms 100.0% 2025-09-07T10:57:23.6078299Z triton_mm_348 0.0184 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:23.6079317Z triton_mm_354 0.0195 ms 89.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:23.6079957Z addmm 0.0223 ms 78.2% 2025-09-07T10:57:23.6080636Z triton_mm_355 0.0224 ms 78.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:57:23.6081544Z triton_mm_350 0.0230 ms 76.0% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:23.6082437Z triton_mm_353 0.0231 ms 75.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:23.6083335Z triton_mm_351 0.0243 ms 71.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:23.6084727Z triton_mm_347 0.0245 ms 71.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:23.6085629Z triton_mm_349 0.0245 ms 71.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:57:23.6086550Z SingleProcess AUTOTUNE benchmarking takes 0.2941 seconds and 0.0003 seconds precompiling for 21 choices 2025-09-07T10:57:23.9079061Z Autotune Choices Stats: 2025-09-07T10:57:23.9080478Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.018303999677300453, "best_triton_pos": 1, "best_triton_time": 0.0191040001809597, "best_triton_kernel": "triton_mm_386", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T10:57:23.9198138Z AUTOTUNE addmm(6272x400, 6272x960, 960x400) 2025-09-07T10:57:23.9198475Z strides: [0, 1], [960, 1], [1, 960] 2025-09-07T10:57:23.9198757Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:57:23.9199061Z bias_addmm 0.0183 ms 100.0% 2025-09-07T10:57:23.9199616Z triton_mm_386 0.0191 ms 95.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, 
BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:23.9200526Z triton_mm_392 0.0202 ms 90.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:23.9201120Z addmm 0.0229 ms 79.9% 2025-09-07T10:57:23.9201693Z triton_mm_393 0.0235 ms 77.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:57:23.9202989Z triton_mm_388 0.0241 ms 76.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:23.9204339Z triton_mm_391 0.0244 ms 74.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:23.9205283Z triton_mm_389 0.0254 ms 72.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:23.9206186Z triton_mm_382 0.0263 ms 69.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:57:23.9207079Z triton_mm_384 0.0264 ms 69.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:23.9207887Z SingleProcess AUTOTUNE benchmarking takes 0.2910 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T10:57:24.2224483Z Autotune Choices Stats: 2025-09-07T10:57:24.2225861Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.01865600049495697, "best_triton_pos": 1, "best_triton_time": 0.01961600035429001, 
"best_triton_kernel": "triton_mm_424", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T10:57:24.2346851Z AUTOTUNE addmm(6272x400, 6272x1024, 1024x400) 2025-09-07T10:57:24.2347133Z strides: [0, 1], [1024, 1], [1, 1024] 2025-09-07T10:57:24.2347412Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:57:24.2347703Z bias_addmm 0.0187 ms 100.0% 2025-09-07T10:57:24.2348261Z triton_mm_424 0.0196 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:24.2349522Z triton_mm_430 0.0210 ms 88.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:24.2350082Z addmm 0.0232 ms 80.3% 2025-09-07T10:57:24.2350726Z triton_mm_431 0.0240 ms 77.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:57:24.2351596Z triton_mm_426 0.0253 ms 73.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:24.2352479Z triton_mm_429 0.0259 ms 72.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:24.2353345Z triton_mm_425 0.0263 ms 71.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:57:24.2354385Z triton_mm_420 0.0266 ms 70.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=3, num_warps=8 2025-09-07T10:57:24.2355226Z triton_mm_427 0.0266 ms 70.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:24.2355968Z SingleProcess AUTOTUNE benchmarking takes 0.2938 seconds and 0.0003 seconds precompiling for 21 choices 2025-09-07T10:57:24.5402411Z Autotune Choices Stats: 2025-09-07T10:57:24.5404655Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.019392000511288643, "best_triton_pos": 1, "best_triton_time": 0.02035200037062168, "best_triton_kernel": "triton_mm_462", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T10:57:24.5525986Z AUTOTUNE addmm(6272x400, 6272x1088, 1088x400) 2025-09-07T10:57:24.5526299Z strides: [0, 1], [1088, 1], [1, 1088] 2025-09-07T10:57:24.5526612Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:57:24.5526944Z bias_addmm 0.0194 ms 100.0% 2025-09-07T10:57:24.5527547Z triton_mm_462 0.0204 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:24.5528531Z triton_mm_468 0.0214 ms 90.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:24.5529149Z addmm 0.0239 ms 81.0% 2025-09-07T10:57:24.5529751Z triton_mm_469 0.0246 ms 78.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:57:24.5530757Z triton_mm_464 0.0263 ms 73.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=3, num_warps=4 2025-09-07T10:57:24.5531736Z triton_mm_467 0.0270 ms 71.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:24.5532650Z triton_mm_465 0.0276 ms 70.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:24.5533549Z triton_mm_458 0.0281 ms 69.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:57:24.5534629Z triton_mm_463 0.0283 ms 68.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:57:24.5535610Z SingleProcess AUTOTUNE benchmarking takes 0.2966 seconds and 0.0003 seconds precompiling for 21 choices 2025-09-07T10:57:24.9151110Z Autotune Choices Stats: 2025-09-07T10:57:24.9152754Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.02412799932062626, "best_triton_pos": 2, "best_triton_time": 0.030559999868273735, "best_triton_kernel": "triton_mm_1274", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T10:57:24.9278153Z AUTOTUNE addmm(1568x1600, 1568x2432, 2432x1600) 2025-09-07T10:57:24.9278516Z strides: [0, 1], [2432, 1], [1, 2432] 2025-09-07T10:57:24.9278866Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:57:24.9279219Z bias_addmm 0.0241 ms 100.0% 2025-09-07T10:57:24.9279506Z addmm 0.0299 ms 80.6% 2025-09-07T10:57:24.9280146Z triton_mm_1274 0.0306 ms 79.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=3, num_warps=4 2025-09-07T10:57:24.9281335Z triton_mm_1280 0.0324 ms 74.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:24.9282325Z triton_mm_1275 0.0377 ms 64.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:57:24.9283321Z triton_mm_1281 0.0380 ms 63.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:57:24.9284967Z triton_mm_1272 0.0422 ms 57.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:24.9285978Z triton_mm_1276 0.0427 ms 56.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:24.9286970Z triton_mm_1279 0.0438 ms 55.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:24.9287954Z triton_mm_1273 0.0451 ms 53.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:24.9288803Z SingleProcess AUTOTUNE benchmarking takes 0.3537 seconds and 0.0003 seconds precompiling for 21 choices 2025-09-07T10:57:25.2931292Z Autotune Choices Stats: 2025-09-07T10:57:25.2932776Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.013952000066637993, "best_triton_pos": 1, "best_triton_time": 0.014911999925971031, "best_triton_kernel": "triton_mm_552", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, 
BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T10:57:25.3054628Z AUTOTUNE addmm(1568x800, 1568x1216, 1216x800) 2025-09-07T10:57:25.3054952Z strides: [0, 1], [1216, 1], [1, 1216] 2025-09-07T10:57:25.3055287Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:57:25.3055620Z bias_addmm 0.0140 ms 100.0% 2025-09-07T10:57:25.3056279Z triton_mm_552 0.0149 ms 93.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:57:25.3057308Z triton_mm_541 0.0164 ms 85.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:57:25.3058536Z triton_mm_545 0.0166 ms 84.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:25.3059499Z triton_mm_551 0.0178 ms 78.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:25.3060595Z triton_mm_544 0.0186 ms 75.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:25.3061568Z triton_mm_548 0.0188 ms 74.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:25.3062098Z addmm 0.0188 ms 74.1% 2025-09-07T10:57:25.3062616Z triton_mm_546 0.0199 ms 70.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:57:25.3063473Z triton_mm_542 0.0209 ms 66.8% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:57:25.3064392Z SingleProcess AUTOTUNE benchmarking takes 0.2838 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T10:57:25.8507962Z Autotune Choices Stats: 2025-09-07T10:57:25.8509285Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.013728000223636627, "best_triton_pos": 1, "best_triton_time": 0.015519999898970127, "best_triton_kernel": "triton_mm_590", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T10:57:25.8627231Z AUTOTUNE addmm(1568x800, 1568x1280, 1280x800) 2025-09-07T10:57:25.8627580Z strides: [0, 1], [1280, 1], [1, 1280] 2025-09-07T10:57:25.8628338Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:57:25.8628697Z bias_addmm 0.0137 ms 100.0% 2025-09-07T10:57:25.8629356Z triton_mm_590 0.0155 ms 88.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:57:25.8630396Z triton_mm_579 0.0170 ms 80.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:57:25.8631474Z triton_mm_583 0.0171 ms 80.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:25.8632045Z addmm 0.0181 ms 75.8% 2025-09-07T10:57:25.8632605Z triton_mm_589 0.0182 ms 75.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:25.8633521Z triton_mm_582 0.0190 ms 72.2% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:25.8634891Z triton_mm_586 0.0193 ms 71.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:25.8635799Z triton_mm_584 0.0195 ms 70.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:57:25.8636703Z triton_mm_580 0.0209 ms 65.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:57:25.8637579Z SingleProcess AUTOTUNE benchmarking takes 0.2838 seconds and 0.0003 seconds precompiling for 21 choices 2025-09-07T10:57:26.1544216Z Autotune Choices Stats: 2025-09-07T10:57:26.1545404Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.013952000066637993, "best_triton_pos": 1, "best_triton_time": 0.015744000673294067, "best_triton_kernel": "triton_mm_628", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T10:57:26.1664126Z AUTOTUNE addmm(1568x800, 1568x1344, 1344x800) 2025-09-07T10:57:26.1664455Z strides: [0, 1], [1344, 1], [1, 1344] 2025-09-07T10:57:26.1664803Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:57:26.1665149Z bias_addmm 0.0140 ms 100.0% 2025-09-07T10:57:26.1665785Z triton_mm_628 0.0157 ms 88.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:57:26.1666809Z triton_mm_621 0.0174 ms 80.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:26.1667788Z triton_mm_617 0.0174 ms 80.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:57:26.1668748Z triton_mm_627 0.0186 ms 75.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:26.1669360Z addmm 0.0189 ms 73.6% 2025-09-07T10:57:26.1669947Z triton_mm_624 0.0196 ms 71.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:26.1671227Z triton_mm_620 0.0198 ms 70.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:26.1672166Z triton_mm_622 0.0207 ms 67.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:57:26.1673020Z triton_mm_618 0.0222 ms 62.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:57:26.1673980Z SingleProcess AUTOTUNE benchmarking takes 0.2827 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T10:57:26.4589772Z Autotune Choices Stats: 2025-09-07T10:57:26.4591134Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.015359999611973763, "best_triton_pos": 1, "best_triton_time": 0.015936000272631645, "best_triton_kernel": "triton_mm_666", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 
2025-09-07T10:57:26.4710510Z AUTOTUNE addmm(1568x800, 1568x1408, 1408x800) 2025-09-07T10:57:26.4710839Z strides: [0, 1], [1408, 1], [1, 1408] 2025-09-07T10:57:26.4711180Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:57:26.4711524Z bias_addmm 0.0154 ms 100.0% 2025-09-07T10:57:26.4712168Z triton_mm_666 0.0159 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:57:26.4713220Z triton_mm_655 0.0177 ms 86.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:57:26.4714434Z triton_mm_659 0.0179 ms 85.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:26.4715057Z addmm 0.0192 ms 79.9% 2025-09-07T10:57:26.4716079Z triton_mm_665 0.0192 ms 79.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:26.4717174Z triton_mm_658 0.0201 ms 76.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:26.4718310Z triton_mm_662 0.0203 ms 75.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:26.4719290Z triton_mm_660 0.0204 ms 75.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:57:26.4720271Z triton_mm_656 0.0217 ms 70.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, 
num_warps=4 2025-09-07T10:57:26.4721169Z SingleProcess AUTOTUNE benchmarking takes 0.2838 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T10:57:26.7644088Z Autotune Choices Stats: 2025-09-07T10:57:26.7645394Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.014271999709308147, "best_triton_pos": 1, "best_triton_time": 0.0163199994713068, "best_triton_kernel": "triton_mm_704", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T10:57:26.7772783Z AUTOTUNE addmm(1568x800, 1568x1472, 1472x800) 2025-09-07T10:57:26.7773099Z strides: [0, 1], [1472, 1], [1, 1472] 2025-09-07T10:57:26.7773436Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:57:26.7773948Z bias_addmm 0.0143 ms 100.0% 2025-09-07T10:57:26.7774992Z triton_mm_704 0.0163 ms 87.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:57:26.7776071Z triton_mm_693 0.0182 ms 78.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:57:26.7777055Z triton_mm_697 0.0184 ms 77.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:26.7778034Z triton_mm_703 0.0195 ms 73.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:26.7778651Z addmm 0.0196 ms 72.9% 2025-09-07T10:57:26.7779252Z triton_mm_696 0.0208 ms 68.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 
2025-09-07T10:57:26.7780237Z triton_mm_700 0.0209 ms 68.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:26.7781220Z triton_mm_698 0.0214 ms 66.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:57:26.7782212Z triton_mm_694 0.0231 ms 61.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:57:26.7782949Z SingleProcess AUTOTUNE benchmarking takes 0.2855 seconds and 0.0003 seconds precompiling for 21 choices 2025-09-07T10:57:27.0734753Z Autotune Choices Stats: 2025-09-07T10:57:27.0736158Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.015200000256299973, "best_triton_pos": 1, "best_triton_time": 0.016416000202298164, "best_triton_kernel": "triton_mm_742", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T10:57:27.0867419Z AUTOTUNE addmm(1568x800, 1568x1536, 1536x800) 2025-09-07T10:57:27.0867744Z strides: [0, 1], [1536, 1], [1, 1536] 2025-09-07T10:57:27.0868328Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:57:27.0868664Z bias_addmm 0.0152 ms 100.0% 2025-09-07T10:57:27.0869301Z triton_mm_742 0.0164 ms 92.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:57:27.0870305Z triton_mm_731 0.0188 ms 80.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:57:27.0871346Z triton_mm_735 0.0191 ms 79.4% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:27.0872331Z triton_mm_741 0.0200 ms 76.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:27.0872877Z addmm 0.0200 ms 75.9% 2025-09-07T10:57:27.0873376Z triton_mm_736 0.0212 ms 71.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:57:27.0874443Z triton_mm_734 0.0213 ms 71.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:27.0875304Z triton_mm_738 0.0214 ms 71.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:27.0876398Z triton_mm_732 0.0226 ms 67.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:57:27.0877228Z SingleProcess AUTOTUNE benchmarking takes 0.2882 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T10:57:27.3857248Z Autotune Choices Stats: 2025-09-07T10:57:27.3858616Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.01548799965530634, "best_triton_pos": 1, "best_triton_time": 0.017184000462293625, "best_triton_kernel": "triton_mm_780", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T10:57:27.3997790Z AUTOTUNE addmm(1568x800, 1568x1600, 1600x800) 2025-09-07T10:57:27.3998121Z strides: [0, 1], [1600, 1], [1, 1600] 
2025-09-07T10:57:27.3998468Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:57:27.3998848Z bias_addmm 0.0155 ms 100.0% 2025-09-07T10:57:27.3999472Z triton_mm_780 0.0172 ms 90.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:57:27.4000452Z triton_mm_769 0.0193 ms 80.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:57:27.4001453Z triton_mm_773 0.0196 ms 78.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:27.4002137Z addmm 0.0203 ms 76.2% 2025-09-07T10:57:27.4002707Z triton_mm_779 0.0208 ms 74.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:27.4003668Z triton_mm_772 0.0221 ms 69.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:27.4005446Z triton_mm_774 0.0225 ms 68.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:57:27.4006585Z triton_mm_776 0.0226 ms 68.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:27.4007579Z triton_mm_770 0.0242 ms 63.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:57:27.4008429Z SingleProcess AUTOTUNE benchmarking takes 0.2917 seconds and 0.0003 seconds precompiling for 21 
choices 2025-09-07T10:57:27.7024939Z Autotune Choices Stats: 2025-09-07T10:57:27.7026329Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.01568000018596649, "best_triton_pos": 1, "best_triton_time": 0.018144000321626663, "best_triton_kernel": "triton_mm_818", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T10:57:27.7160339Z AUTOTUNE addmm(1568x800, 1568x1664, 1664x800) 2025-09-07T10:57:27.7160662Z strides: [0, 1], [1664, 1], [1, 1664] 2025-09-07T10:57:27.7161009Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:57:27.7161352Z bias_addmm 0.0157 ms 100.0% 2025-09-07T10:57:27.7162138Z triton_mm_818 0.0181 ms 86.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:57:27.7163554Z triton_mm_807 0.0192 ms 81.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:57:27.7164735Z triton_mm_811 0.0201 ms 78.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:27.7165362Z addmm 0.0207 ms 75.6% 2025-09-07T10:57:27.7165953Z triton_mm_812 0.0217 ms 72.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:57:27.7166923Z triton_mm_817 0.0218 ms 72.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:27.7167881Z triton_mm_814 0.0223 ms 70.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, 
BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:27.7168848Z triton_mm_810 0.0224 ms 70.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:27.7169803Z triton_mm_808 0.0236 ms 66.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:57:27.7170646Z SingleProcess AUTOTUNE benchmarking takes 0.2951 seconds and 0.0003 seconds precompiling for 21 choices 2025-09-07T10:57:28.0178295Z Autotune Choices Stats: 2025-09-07T10:57:28.0179655Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.01692800037562847, "best_triton_pos": 1, "best_triton_time": 0.018112000077962875, "best_triton_kernel": "triton_mm_856", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T10:57:28.0305848Z AUTOTUNE addmm(1568x800, 1568x1728, 1728x800) 2025-09-07T10:57:28.0306773Z strides: [0, 1], [1728, 1], [1, 1728] 2025-09-07T10:57:28.0307130Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:57:28.0307477Z bias_addmm 0.0169 ms 100.0% 2025-09-07T10:57:28.0308114Z triton_mm_856 0.0181 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:57:28.0309263Z triton_mm_845 0.0201 ms 84.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:57:28.0310253Z triton_mm_849 0.0206 ms 82.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=3, num_warps=4 2025-09-07T10:57:28.0310873Z addmm 0.0215 ms 78.7% 2025-09-07T10:57:28.0311470Z triton_mm_855 0.0218 ms 77.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:28.0312493Z triton_mm_850 0.0235 ms 72.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:57:28.0313355Z triton_mm_852 0.0240 ms 70.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:28.0314555Z triton_mm_848 0.0241 ms 70.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:28.0315413Z triton_mm_846 0.0246 ms 68.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:57:28.0316377Z SingleProcess AUTOTUNE benchmarking takes 0.2924 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T10:57:28.3320784Z Autotune Choices Stats: 2025-09-07T10:57:28.3322247Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.01635199971497059, "best_triton_pos": 1, "best_triton_time": 0.018624000251293182, "best_triton_kernel": "triton_mm_894", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T10:57:28.3447739Z AUTOTUNE addmm(1568x800, 1568x1792, 1792x800) 2025-09-07T10:57:28.3448050Z strides: [0, 1], [1792, 1], [1, 1792] 2025-09-07T10:57:28.3448367Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:57:28.3448684Z bias_addmm 
0.0164 ms 100.0% 2025-09-07T10:57:28.3449316Z triton_mm_894 0.0186 ms 87.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:57:28.3450324Z triton_mm_883 0.0207 ms 79.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:57:28.3451276Z triton_mm_887 0.0210 ms 77.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:28.3451881Z addmm 0.0211 ms 77.7% 2025-09-07T10:57:28.3452464Z triton_mm_893 0.0221 ms 74.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:28.3453402Z triton_mm_888 0.0228 ms 71.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:57:28.3454449Z triton_mm_886 0.0246 ms 66.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:28.3455599Z triton_mm_890 0.0246 ms 66.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:28.3456478Z triton_mm_884 0.0250 ms 65.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:57:28.3457387Z SingleProcess AUTOTUNE benchmarking takes 0.2928 seconds and 0.0003 seconds precompiling for 21 choices 2025-09-07T10:57:28.6482286Z Autotune Choices Stats: 2025-09-07T10:57:28.6483565Z {"num_choices": 21, "num_triton_choices": 
19, "best_kernel": "bias_addmm", "best_time": 0.0163199994713068, "best_triton_pos": 1, "best_triton_time": 0.018848000094294548, "best_triton_kernel": "triton_mm_932", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T10:57:28.6611985Z AUTOTUNE addmm(1568x800, 1568x1856, 1856x800) 2025-09-07T10:57:28.6612411Z strides: [0, 1], [1856, 1], [1, 1856] 2025-09-07T10:57:28.6612817Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:57:28.6613244Z bias_addmm 0.0163 ms 100.0% 2025-09-07T10:57:28.6614114Z triton_mm_932 0.0188 ms 86.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:57:28.6615152Z triton_mm_921 0.0207 ms 78.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:57:28.6615810Z addmm 0.0215 ms 75.8% 2025-09-07T10:57:28.6616869Z triton_mm_925 0.0220 ms 74.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:28.6617926Z triton_mm_931 0.0227 ms 71.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:28.6618923Z triton_mm_926 0.0245 ms 66.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:57:28.6619913Z triton_mm_928 0.0255 ms 64.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:28.6620874Z triton_mm_924 0.0256 ms 63.7% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:28.6621854Z triton_mm_922 0.0257 ms 63.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:57:28.6622704Z SingleProcess AUTOTUNE benchmarking takes 0.2956 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T10:57:28.9478016Z Autotune Choices Stats: 2025-09-07T10:57:28.9479313Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.016607999801635742, "best_triton_pos": 1, "best_triton_time": 0.019360000267624855, "best_triton_kernel": "triton_mm_970", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T10:57:28.9605829Z AUTOTUNE addmm(1568x800, 1568x1920, 1920x800) 2025-09-07T10:57:28.9606412Z strides: [0, 1], [1920, 1], [1, 1920] 2025-09-07T10:57:28.9606773Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:57:28.9607106Z bias_addmm 0.0166 ms 100.0% 2025-09-07T10:57:28.9607744Z triton_mm_970 0.0194 ms 85.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:57:28.9608667Z addmm 0.0216 ms 76.8% 2025-09-07T10:57:28.9609263Z triton_mm_963 0.0220 ms 75.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:28.9610349Z triton_mm_959 0.0228 ms 73.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:57:28.9611339Z triton_mm_969 0.0235 ms 70.8% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:28.9612316Z triton_mm_964 0.0236 ms 70.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:57:28.9613296Z triton_mm_960 0.0259 ms 64.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:57:28.9614474Z triton_mm_962 0.0260 ms 63.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:28.9615380Z triton_mm_966 0.0260 ms 63.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:28.9616182Z SingleProcess AUTOTUNE benchmarking takes 0.2937 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T10:57:29.2472966Z Autotune Choices Stats: 2025-09-07T10:57:29.2474924Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.017472000792622566, "best_triton_pos": 1, "best_triton_time": 0.019487999379634857, "best_triton_kernel": "triton_mm_1008", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T10:57:29.2596371Z AUTOTUNE addmm(1568x800, 1568x1984, 1984x800) 2025-09-07T10:57:29.2596757Z strides: [0, 1], [1984, 1], [1, 1984] 2025-09-07T10:57:29.2597211Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:57:29.2597570Z bias_addmm 0.0175 ms 100.0% 2025-09-07T10:57:29.2598254Z triton_mm_1008 0.0195 ms 89.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, 
BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:57:29.2598960Z addmm 0.0221 ms 78.9% 2025-09-07T10:57:29.2599578Z triton_mm_1001 0.0223 ms 78.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:29.2600599Z triton_mm_997 0.0228 ms 76.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:57:29.2601604Z triton_mm_1007 0.0237 ms 73.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:29.2602771Z triton_mm_1002 0.0248 ms 70.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:57:29.2605168Z triton_mm_1000 0.0261 ms 66.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:29.2606202Z triton_mm_1004 0.0264 ms 66.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:29.2607487Z triton_mm_998 0.0266 ms 65.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:57:29.2608353Z SingleProcess AUTOTUNE benchmarking takes 0.2937 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T10:57:29.5631057Z Autotune Choices Stats: 2025-09-07T10:57:29.5632438Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.017632000148296356, "best_triton_pos": 1, "best_triton_time": 
0.0197759997099638, "best_triton_kernel": "triton_mm_1046", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T10:57:29.5764584Z AUTOTUNE addmm(1568x800, 1568x2048, 2048x800) 2025-09-07T10:57:29.5765071Z strides: [0, 1], [2048, 1], [1, 2048] 2025-09-07T10:57:29.5765423Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:57:29.5765756Z bias_addmm 0.0176 ms 100.0% 2025-09-07T10:57:29.5766379Z triton_mm_1046 0.0198 ms 89.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:57:29.5767027Z addmm 0.0223 ms 78.9% 2025-09-07T10:57:29.5767604Z triton_mm_1039 0.0230 ms 76.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:29.5768598Z triton_mm_1035 0.0233 ms 75.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:57:29.5769592Z triton_mm_1040 0.0241 ms 73.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:57:29.5771013Z triton_mm_1045 0.0241 ms 73.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:29.5772001Z triton_mm_1038 0.0269 ms 65.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:29.5772971Z triton_mm_1042 0.0269 ms 65.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:29.5774106Z triton_mm_1036 0.0272 ms 64.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:57:29.5774888Z SingleProcess AUTOTUNE benchmarking takes 0.2965 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T10:57:29.8856420Z Autotune Choices Stats: 2025-09-07T10:57:29.8857785Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.018303999677300453, "best_triton_pos": 1, "best_triton_time": 0.020255999639630318, "best_triton_kernel": "triton_mm_1084", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T10:57:29.8995540Z AUTOTUNE addmm(1568x800, 1568x2112, 2112x800) 2025-09-07T10:57:29.8996049Z strides: [0, 1], [2112, 1], [1, 2112] 2025-09-07T10:57:29.8996550Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:57:29.8997169Z bias_addmm 0.0183 ms 100.0% 2025-09-07T10:57:29.8998126Z triton_mm_1084 0.0203 ms 90.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:57:29.8999105Z addmm 0.0234 ms 78.1% 2025-09-07T10:57:29.9000039Z triton_mm_1077 0.0234 ms 78.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:29.9002247Z triton_mm_1073 0.0242 ms 75.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:57:29.9004510Z triton_mm_1083 0.0249 ms 73.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:29.9006112Z triton_mm_1078 0.0259 ms 70.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:57:29.9007644Z triton_mm_1076 0.0275 ms 66.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:29.9009172Z triton_mm_1074 0.0277 ms 66.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:57:29.9010664Z triton_mm_1080 0.0277 ms 66.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:29.9011972Z SingleProcess AUTOTUNE benchmarking takes 0.3016 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T10:57:30.2063330Z Autotune Choices Stats: 2025-09-07T10:57:30.2065474Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.018112000077962875, "best_triton_pos": 1, "best_triton_time": 0.020576000213623047, "best_triton_kernel": "triton_mm_1122", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T10:57:30.2196254Z AUTOTUNE addmm(1568x800, 1568x2176, 2176x800) 2025-09-07T10:57:30.2196568Z strides: [0, 1], [2176, 1], [1, 2176] 2025-09-07T10:57:30.2196864Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:57:30.2197262Z bias_addmm 0.0181 ms 100.0% 2025-09-07T10:57:30.2197795Z triton_mm_1122 0.0206 ms 88.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 
2025-09-07T10:57:30.2198355Z addmm 0.0228 ms 79.6% 2025-09-07T10:57:30.2198847Z triton_mm_1115 0.0238 ms 76.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:30.2199680Z triton_mm_1111 0.0244 ms 74.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:57:30.2200508Z triton_mm_1116 0.0250 ms 72.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:57:30.2201325Z triton_mm_1121 0.0256 ms 70.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:30.2202148Z triton_mm_1112 0.0278 ms 65.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:57:30.2203104Z triton_mm_1114 0.0282 ms 64.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:30.2204238Z triton_mm_1118 0.0282 ms 64.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:30.2205228Z SingleProcess AUTOTUNE benchmarking takes 0.2993 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T10:57:30.5309968Z Autotune Choices Stats: 2025-09-07T10:57:30.5311296Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.019039999693632126, "best_triton_pos": 1, "best_triton_time": 0.020864000543951988, "best_triton_kernel": "triton_mm_1160", "best_triton_kernel_desc": 
"ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T10:57:30.5440005Z AUTOTUNE addmm(1568x800, 1568x2240, 2240x800) 2025-09-07T10:57:30.5440328Z strides: [0, 1], [2240, 1], [1, 2240] 2025-09-07T10:57:30.5440669Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:57:30.5441002Z bias_addmm 0.0190 ms 100.0% 2025-09-07T10:57:30.5441662Z triton_mm_1160 0.0209 ms 91.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:57:30.5442305Z addmm 0.0241 ms 79.0% 2025-09-07T10:57:30.5443027Z triton_mm_1153 0.0243 ms 78.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:30.5444355Z triton_mm_1149 0.0247 ms 77.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:57:30.5445323Z triton_mm_1159 0.0257 ms 74.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:30.5446306Z triton_mm_1154 0.0265 ms 71.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:57:30.5447557Z triton_mm_1150 0.0283 ms 67.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:57:30.5448542Z triton_mm_1152 0.0287 ms 66.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:30.5449511Z 
triton_mm_1156 0.0289 ms 66.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:30.5450359Z SingleProcess AUTOTUNE benchmarking takes 0.3039 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T10:57:30.8554209Z Autotune Choices Stats: 2025-09-07T10:57:30.8555592Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.01897599920630455, "best_triton_pos": 1, "best_triton_time": 0.02147199958562851, "best_triton_kernel": "triton_mm_1198", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T10:57:30.8685930Z AUTOTUNE addmm(1568x800, 1568x2304, 2304x800) 2025-09-07T10:57:30.8686450Z strides: [0, 1], [2304, 1], [1, 2304] 2025-09-07T10:57:30.8686953Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:57:30.8687443Z bias_addmm 0.0190 ms 100.0% 2025-09-07T10:57:30.8688379Z triton_mm_1198 0.0215 ms 88.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:57:30.8689367Z addmm 0.0234 ms 81.0% 2025-09-07T10:57:30.8690266Z triton_mm_1191 0.0247 ms 76.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:30.8691788Z triton_mm_1187 0.0252 ms 75.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:57:30.8693984Z triton_mm_1192 0.0259 ms 73.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:57:30.8695746Z 
triton_mm_1197 0.0267 ms 71.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:30.8697203Z triton_mm_1188 0.0286 ms 66.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:57:30.8698681Z triton_mm_1190 0.0292 ms 64.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:30.8700178Z triton_mm_1194 0.0295 ms 64.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:30.8701472Z SingleProcess AUTOTUNE benchmarking takes 0.3038 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T10:57:31.1647688Z Autotune Choices Stats: 2025-09-07T10:57:31.1649621Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.019007999449968338, "best_triton_pos": 1, "best_triton_time": 0.02143999934196472, "best_triton_kernel": "triton_mm_1236", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T10:57:31.1771735Z AUTOTUNE addmm(1568x800, 1568x2368, 2368x800) 2025-09-07T10:57:31.1772203Z strides: [0, 1], [2368, 1], [1, 2368] 2025-09-07T10:57:31.1772696Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:57:31.1773668Z bias_addmm 0.0190 ms 100.0% 2025-09-07T10:57:31.1774835Z triton_mm_1236 0.0214 ms 88.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:57:31.1775787Z addmm 0.0239 ms 79.4% 2025-09-07T10:57:31.1776677Z 
triton_mm_1229 0.0251 ms 75.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:31.1778203Z triton_mm_1225 0.0257 ms 73.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:57:31.1779665Z triton_mm_1235 0.0268 ms 70.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:31.1781170Z triton_mm_1230 0.0275 ms 69.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:57:31.1782665Z triton_mm_1226 0.0294 ms 64.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:57:31.1784358Z triton_mm_1232 0.0299 ms 63.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:31.1785854Z triton_mm_1228 0.0301 ms 63.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:31.1787157Z SingleProcess AUTOTUNE benchmarking takes 0.3029 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T10:57:31.4831644Z Autotune Choices Stats: 2025-09-07T10:57:31.4833536Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.015135999768972397, "best_triton_pos": 1, "best_triton_time": 0.01648000068962574, "best_triton_kernel": "triton_mm_1313", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4"} 2025-09-07T10:57:31.4959903Z AUTOTUNE addmm(392x1600, 392x2432, 2432x1600) 2025-09-07T10:57:31.4960353Z strides: [0, 1], [2432, 1], [1, 2432] 2025-09-07T10:57:31.4960718Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:57:31.4961067Z bias_addmm 0.0151 ms 100.0% 2025-09-07T10:57:31.4961727Z triton_mm_1313 0.0165 ms 91.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:57:31.4962400Z addmm 0.0188 ms 80.3% 2025-09-07T10:57:31.4963153Z triton_mm_1319 0.0211 ms 71.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:57:31.4964420Z triton_mm_1309 0.0215 ms 70.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:57:31.4965412Z triton_mm_1308 0.0244 ms 62.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:57:31.4966390Z triton_mm_1312 0.0247 ms 61.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:31.4967366Z triton_mm_1318 0.0268 ms 56.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:31.4968694Z triton_mm_1305 0.0272 ms 55.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:57:31.4969685Z triton_mm_1302 0.0278 ms 54.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, 
BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:57:31.4970547Z SingleProcess AUTOTUNE benchmarking takes 0.2988 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T10:57:32.0577161Z Autotune Choices Stats: 2025-09-07T10:57:32.0578526Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.016224000602960587, "best_triton_pos": 1, "best_triton_time": 0.01696000061929226, "best_triton_kernel": "triton_mm_1351", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4"} 2025-09-07T10:57:32.0703077Z AUTOTUNE addmm(392x1600, 392x2560, 2560x1600) 2025-09-07T10:57:32.0703461Z strides: [0, 1], [2560, 1], [1, 2560] 2025-09-07T10:57:32.0703965Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:57:32.0704337Z bias_addmm 0.0162 ms 100.0% 2025-09-07T10:57:32.0704984Z triton_mm_1351 0.0170 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:57:32.0705631Z addmm 0.0197 ms 82.2% 2025-09-07T10:57:32.0706228Z triton_mm_1357 0.0217 ms 74.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:57:32.0707201Z triton_mm_1347 0.0220 ms 73.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:57:32.0708185Z triton_mm_1350 0.0253 ms 64.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:32.0709747Z triton_mm_1346 0.0254 ms 63.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, 
BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:57:32.0710724Z triton_mm_1343 0.0258 ms 62.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:57:32.0711897Z triton_mm_1356 0.0277 ms 58.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:32.0712886Z triton_mm_1340 0.0289 ms 56.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:57:32.0713971Z SingleProcess AUTOTUNE benchmarking takes 0.3006 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T10:57:33.0520095Z Autotune Choices Stats: 2025-09-07T10:57:33.0521543Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.12108799815177917, "best_triton_pos": 1, "best_triton_time": 0.20214399695396423, "best_triton_kernel": "triton_convolution2d_5", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=7, KERNEL_W=7, PADDING_H=3, PADDING_W=3, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8"} 2025-09-07T10:57:33.0661680Z AUTOTUNE convolution(8x3x224x224, 128x3x7x7) 2025-09-07T10:57:33.0662028Z strides: [150528, 1, 672, 3], [147, 1, 21, 3] 2025-09-07T10:57:33.0662345Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:57:33.0662645Z convolution 0.1211 ms 100.0% 2025-09-07T10:57:33.0664228Z triton_convolution2d_5 0.2021 ms 59.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=7, KERNEL_W=7, PADDING_H=3, PADDING_W=3, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:57:33.0665327Z triton_convolution2d_3 0.2106 ms 57.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, 
BLOCK_N=128, GROUPS=1, KERNEL_H=7, KERNEL_W=7, PADDING_H=3, PADDING_W=3, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:57:33.0666356Z triton_convolution2d_0 0.2201 ms 55.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=7, KERNEL_W=7, PADDING_H=3, PADDING_W=3, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:57:33.0667358Z triton_convolution2d_4 0.2434 ms 49.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=7, KERNEL_W=7, PADDING_H=3, PADDING_W=3, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:57:33.0668368Z triton_convolution2d_6 0.3310 ms 36.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=7, KERNEL_W=7, PADDING_H=3, PADDING_W=3, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:57:33.0669369Z triton_convolution2d_2 0.3323 ms 36.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=7, KERNEL_W=7, PADDING_H=3, PADDING_W=3, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T10:57:33.0670388Z triton_convolution2d_1 0.7410 ms 16.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=7, KERNEL_W=7, PADDING_H=3, PADDING_W=3, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:57:33.0671184Z SingleProcess AUTOTUNE benchmarking takes 0.3127 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T10:57:33.3070060Z Autotune Choices Stats: 2025-09-07T10:57:33.3071334Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.016383999958634377, "best_triton_pos": 1, "best_triton_time": 0.01788800023496151, "best_triton_kernel": "triton_mm_23", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T10:57:33.3200036Z 
AUTOTUNE mm(25088x128, 128x296) 2025-09-07T10:57:33.3200329Z strides: [128, 1], [1, 128] 2025-09-07T10:57:33.3200605Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:57:33.3200886Z mm 0.0164 ms 100.0% 2025-09-07T10:57:33.3201691Z triton_mm_23 0.0179 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:33.3202697Z triton_mm_20 0.0180 ms 90.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:33.3203913Z triton_mm_21 0.0185 ms 88.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:33.3204900Z triton_mm_13 0.0188 ms 87.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:57:33.3205798Z triton_mm_16 0.0188 ms 86.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:33.3206676Z triton_mm_18 0.0193 ms 84.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:33.3207568Z triton_mm_17 0.0200 ms 81.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:33.3208465Z triton_mm_24 0.0204 ms 80.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:33.3209604Z triton_mm_14 0.0210 ms 78.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, 
BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:57:33.3210408Z SingleProcess AUTOTUNE benchmarking takes 0.2527 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T10:57:33.5731969Z Autotune Choices Stats: 2025-09-07T10:57:33.5732954Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_62", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.028511999174952507, "best_triton_pos": 0} 2025-09-07T10:57:33.5862851Z AUTOTUNE mm(25088x200, 200x276) 2025-09-07T10:57:33.5863231Z strides: [200, 1], [1, 200] 2025-09-07T10:57:33.5863509Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:57:33.5864385Z triton_mm_62 0.0285 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:33.5865395Z triton_mm_61 0.0293 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:33.5866399Z triton_mm_63 0.0303 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:57:33.5867373Z triton_mm_59 0.0319 ms 89.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:33.5868341Z triton_mm_58 0.0320 ms 89.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:33.5869314Z triton_mm_56 0.0325 ms 87.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:33.5871141Z triton_mm_60 0.0334 ms 85.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T10:57:33.5872305Z triton_mm_52 0.0341 ms 83.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:57:33.5873303Z triton_mm_54 0.0391 ms 72.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:33.5874406Z triton_mm_55 0.0395 ms 72.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:33.5875113Z SingleProcess AUTOTUNE benchmarking takes 0.2654 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T10:57:33.7053497Z Autotune Choices Stats: 2025-09-07T10:57:33.7055167Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.020608000457286835, "best_triton_pos": 1, "best_triton_time": 0.020767999812960625, "best_triton_kernel": "triton_convolution2d_181", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=8"} 2025-09-07T10:57:33.7189269Z AUTOTUNE convolution(8x376x56x56, 640x376x1x1) 2025-09-07T10:57:33.7189741Z strides: [1179136, 1, 21056, 376], [376, 1, 1, 1] 2025-09-07T10:57:33.7190060Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:57:33.7190350Z convolution 0.0206 ms 100.0% 2025-09-07T10:57:33.7191707Z triton_convolution2d_181 0.0208 ms 99.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, 
PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:57:33.7193047Z triton_convolution2d_184 0.0210 ms 98.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:57:33.7194457Z triton_convolution2d_182 0.0267 ms 77.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:57:33.7195513Z triton_convolution2d_179 0.0275 ms 75.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:57:33.7196588Z triton_convolution2d_183 0.0288 ms 71.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:57:33.7197772Z triton_convolution2d_178 0.0310 ms 66.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:57:33.7198838Z triton_convolution2d_180 0.0887 ms 23.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T10:57:33.7199680Z SingleProcess AUTOTUNE benchmarking takes 0.1206 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T10:57:33.9746193Z Autotune Choices Stats: 2025-09-07T10:57:33.9747823Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.014911999925971031, "best_triton_pos": 1, "best_triton_time": 0.014911999925971031, "best_triton_kernel": "triton_mm_221", 
"best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T10:57:33.9887682Z AUTOTUNE mm(6272x400, 400x576) 2025-09-07T10:57:33.9888065Z strides: [400, 1], [1, 400] 2025-09-07T10:57:33.9888364Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:57:33.9889353Z mm 0.0149 ms 100.0% 2025-09-07T10:57:33.9890112Z triton_mm_221 0.0149 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:33.9891368Z triton_mm_220 0.0156 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:33.9892643Z triton_mm_222 0.0173 ms 86.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:57:33.9894468Z triton_mm_210 0.0181 ms 82.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:57:33.9895780Z triton_mm_217 0.0184 ms 81.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:33.9897048Z triton_mm_219 0.0187 ms 79.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T10:57:33.9898340Z triton_mm_214 0.0189 ms 79.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:33.9899956Z triton_mm_213 0.0192 ms 77.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, 
BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:33.9901258Z triton_mm_218 0.0194 ms 77.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:33.9902380Z SingleProcess AUTOTUNE benchmarking takes 0.2684 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T10:57:34.1468807Z Autotune Choices Stats: 2025-09-07T10:57:34.1470158Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.030400000512599945, "best_triton_pos": 1, "best_triton_time": 0.0315839983522892, "best_triton_kernel": "triton_convolution2d_493", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T10:57:34.1609709Z AUTOTUNE convolution(8x1152x28x28, 1152x1152x1x1) 2025-09-07T10:57:34.1610092Z strides: [903168, 1, 32256, 1152], [1152, 1, 1, 1] 2025-09-07T10:57:34.1610473Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:57:34.1610765Z convolution 0.0304 ms 100.0% 2025-09-07T10:57:34.1614951Z triton_convolution2d_493 0.0316 ms 96.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:57:34.1616302Z triton_convolution2d_492 0.0365 ms 83.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:57:34.1617577Z triton_convolution2d_495 0.0367 ms 82.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=8 
2025-09-07T10:57:34.1618857Z triton_convolution2d_494 0.0373 ms 81.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:57:34.1620314Z triton_convolution2d_490 0.0498 ms 61.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:57:34.1621696Z triton_convolution2d_489 0.0507 ms 60.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:57:34.1622919Z triton_convolution2d_491 0.1704 ms 17.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T10:57:34.1624131Z SingleProcess AUTOTUNE benchmarking takes 0.1422 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T10:57:34.4145566Z Autotune Choices Stats: 2025-09-07T10:57:34.4146907Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.012896000407636166, "best_triton_pos": 1, "best_triton_time": 0.01375999953597784, "best_triton_kernel": "triton_mm_533", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T10:57:34.4287568Z AUTOTUNE mm(1568x800, 800x1088) 2025-09-07T10:57:34.4287922Z strides: [800, 1], [1, 800] 2025-09-07T10:57:34.4288202Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:57:34.4288499Z mm 0.0129 ms 100.0% 2025-09-07T10:57:34.4289169Z triton_mm_533 0.0138 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:57:34.4290739Z triton_mm_526 0.0144 ms 89.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:34.4291855Z triton_mm_525 0.0149 ms 86.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:34.4292926Z triton_mm_532 0.0151 ms 85.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:34.4294250Z triton_mm_529 0.0152 ms 85.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:34.4295285Z triton_mm_528 0.0170 ms 75.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:34.4296242Z triton_mm_522 0.0171 ms 75.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:57:34.4297212Z triton_mm_524 0.0171 ms 75.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:34.4298410Z triton_mm_527 0.0173 ms 74.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:57:34.4299277Z SingleProcess AUTOTUNE benchmarking takes 0.2664 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T10:57:34.6701832Z Autotune Choices Stats: 2025-09-07T10:57:34.6703310Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": 
"convolution", "best_time": 0.036031998693943024, "best_triton_pos": 1, "best_triton_time": 0.060127999633550644, "best_triton_kernel": "triton_convolution2d_1260", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T10:57:34.6836318Z AUTOTUNE convolution(8x2432x14x14, 2304x2432x1x1) 2025-09-07T10:57:34.6836677Z strides: [476672, 1, 34048, 2432], [2432, 1, 1, 1] 2025-09-07T10:57:34.6837368Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:57:34.6837645Z convolution 0.0360 ms 100.0% 2025-09-07T10:57:34.6838330Z triton_convolution2d_1260 0.0601 ms 59.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:57:34.6839482Z triton_convolution2d_1262 0.0695 ms 51.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:57:34.6840654Z triton_convolution2d_1259 0.0709 ms 50.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:57:34.6841810Z triton_convolution2d_1261 0.0756 ms 47.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:57:34.6842950Z triton_convolution2d_1257 0.0961 ms 37.5% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:57:34.6844346Z triton_convolution2d_1256 0.1013 ms 35.6% ALLOW_TF32=False, 
BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:57:34.6845738Z triton_convolution2d_1258 0.2914 ms 12.4% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T10:57:34.6846737Z SingleProcess AUTOTUNE benchmarking takes 0.1693 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T10:57:34.9393152Z Autotune Choices Stats: 2025-09-07T10:57:34.9394821Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_1294", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4", "best_time": 0.014368000440299511, "best_triton_pos": 0} 2025-09-07T10:57:34.9533368Z AUTOTUNE mm(392x1600, 1600x2176) 2025-09-07T10:57:34.9534046Z strides: [1600, 1], [1, 1600] 2025-09-07T10:57:34.9534414Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:57:34.9535314Z triton_mm_1294 0.0144 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:57:34.9536165Z mm 0.0144 ms 99.6% 2025-09-07T10:57:34.9536945Z triton_mm_1300 0.0161 ms 89.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:57:34.9538945Z triton_mm_1289 0.0175 ms 82.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:57:34.9540296Z triton_mm_1290 0.0177 ms 81.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=5, num_warps=4 2025-09-07T10:57:34.9541598Z triton_mm_1293 0.0187 ms 76.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:34.9542922Z triton_mm_1299 0.0201 ms 71.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:34.9544750Z triton_mm_1292 0.0209 ms 68.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:34.9546276Z triton_mm_1296 0.0214 ms 67.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:57:34.9547579Z triton_mm_1286 0.0260 ms 55.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:57:34.9548683Z SingleProcess AUTOTUNE benchmarking takes 0.2678 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T10:57:35.2588958Z Autotune Choices Stats: 2025-09-07T10:57:35.2590646Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_1381", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.014240000396966934, "best_triton_pos": 0} 2025-09-07T10:57:35.2739670Z AUTOTUNE addmm(8x1000, 8x2688, 2688x1000) 2025-09-07T10:57:35.2740282Z strides: [0, 1], [2688, 1], [1, 2688] 2025-09-07T10:57:35.2740789Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:57:35.2741880Z triton_mm_1381 0.0142 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T10:57:35.2742865Z bias_addmm 0.0145 ms 98.2% 2025-09-07T10:57:35.2744415Z triton_mm_1385 0.0150 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:57:35.2746620Z triton_mm_1389 0.0162 ms 87.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:57:35.2748189Z triton_mm_1393 0.0179 ms 79.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:57:35.2749160Z addmm 0.0186 ms 76.5% 2025-09-07T10:57:35.2750068Z triton_mm_1380 0.0209 ms 68.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T10:57:35.2751559Z triton_mm_1379 0.0221 ms 64.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:57:35.2753044Z triton_mm_1384 0.0222 ms 64.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:57:35.2754798Z triton_mm_1378 0.0225 ms 63.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T10:57:35.2756421Z SingleProcess AUTOTUNE benchmarking takes 0.3109 seconds and 0.0003 seconds precompiling for 19 choices 2025-09-07T10:57:49.8198490Z pass 2025-09-07T10:57:54.6523643Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. 
If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:57:54.6525109Z import pynvml # type: ignore[import] 2025-09-07T10:57:57.6514816Z 2025-09-07T10:57:58.9342215Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:57:58.9342532Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:57:58.9389650Z cuda eval eca_botnext26ts_256 2025-09-07T10:58:18.0177523Z Autotune Choices Stats: 2025-09-07T10:58:18.0178508Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_65", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.01568000018596649, "best_triton_pos": 0} 2025-09-07T10:58:18.0309753Z AUTOTUNE addmm(32768x256, 32768x64, 64x256) 2025-09-07T10:58:18.0310055Z strides: [0, 1], [64, 1], [1, 64] 2025-09-07T10:58:18.0310417Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:58:18.0311124Z triton_mm_65 0.0157 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:58:18.0312112Z triton_mm_69 0.0157 ms 99.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:58:18.0313070Z triton_mm_68 0.0157 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:18.0317247Z triton_mm_75 0.0159 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:58:18.0318258Z triton_mm_64 0.0161 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:58:18.0319219Z triton_mm_66 0.0163 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:18.0320356Z triton_mm_70 0.0164 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:18.0321176Z triton_mm_74 0.0166 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:18.0321965Z triton_mm_67 0.0167 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:58:18.0322751Z triton_mm_71 0.0171 ms 91.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:58:18.0323466Z SingleProcess AUTOTUNE benchmarking takes 0.2904 seconds and 0.0003 seconds precompiling for 21 choices 2025-09-07T10:58:18.9841533Z Autotune Choices Stats: 2025-09-07T10:58:18.9842614Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_130", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.01727999933063984, "best_triton_pos": 0} 2025-09-07T10:58:18.9970897Z AUTOTUNE addmm(32768x128, 32768x256, 256x128) 2025-09-07T10:58:18.9971175Z strides: [0, 1], [256, 1], [1, 256] 2025-09-07T10:58:18.9971792Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:58:18.9972450Z triton_mm_130 0.0173 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:18.9973365Z triton_mm_129 0.0184 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:18.9974462Z triton_mm_124 0.0189 ms 91.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:18.9975522Z triton_mm_119 0.0198 ms 87.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:58:18.9976421Z triton_mm_123 0.0198 ms 87.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:58:18.9977460Z triton_mm_126 0.0198 ms 87.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:18.9978358Z triton_mm_128 0.0199 ms 87.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T10:58:18.9979253Z triton_mm_122 0.0200 ms 86.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:18.9980146Z triton_mm_127 0.0200 ms 86.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:58:18.9980716Z bias_addmm 0.0203 ms 85.0% 2025-09-07T10:58:18.9981140Z SingleProcess AUTOTUNE benchmarking takes 0.2867 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T10:58:19.5533688Z Autotune Choices Stats: 
2025-09-07T10:58:19.5535005Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_186", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.01196799986064434, "best_triton_pos": 0} 2025-09-07T10:58:19.5705809Z AUTOTUNE addmm(8192x512, 8192x128, 128x512) 2025-09-07T10:58:19.5706173Z strides: [0, 1], [128, 1], [1, 128] 2025-09-07T10:58:19.5707025Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:58:19.5707834Z triton_mm_186 0.0120 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:19.5708627Z bias_addmm 0.0123 ms 97.7% 2025-09-07T10:58:19.5709388Z triton_mm_190 0.0123 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:19.5710624Z triton_mm_187 0.0123 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:58:19.5711806Z triton_mm_183 0.0125 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:58:19.5712950Z triton_mm_191 0.0125 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:58:19.5714367Z triton_mm_192 0.0129 ms 93.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T10:58:19.5715808Z triton_mm_188 0.0144 ms 83.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, 
BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:19.5717031Z triton_mm_184 0.0146 ms 82.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:58:19.5718156Z triton_mm_189 0.0146 ms 82.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:58:19.5719420Z SingleProcess AUTOTUNE benchmarking takes 0.2954 seconds and 0.0003 seconds precompiling for 21 choices 2025-09-07T10:58:20.6847486Z Autotune Choices Stats: 2025-09-07T10:58:20.6848605Z {"num_choices": 20, "num_triton_choices": 18, "best_kernel": "triton_mm_87", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.01539199985563755, "best_triton_pos": 0} 2025-09-07T10:58:20.6975844Z AUTOTUNE addmm(32768x64, 32768x256, 256x64) 2025-09-07T10:58:20.6976177Z strides: [0, 1], [256, 1], [1, 256] 2025-09-07T10:58:20.6976475Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:58:20.6977117Z triton_mm_87 0.0154 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:20.6978074Z triton_mm_92 0.0157 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:20.6978989Z triton_mm_83 0.0157 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:58:20.6979566Z bias_addmm 0.0167 ms 92.3% 2025-09-07T10:58:20.6980129Z triton_mm_89 0.0173 ms 89.1% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:20.6981028Z triton_mm_90 0.0173 ms 89.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:58:20.6981868Z triton_mm_86 0.0175 ms 87.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:58:20.6982942Z triton_mm_85 0.0176 ms 87.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:20.6984154Z triton_mm_93 0.0180 ms 85.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:58:20.6985002Z triton_mm_82 0.0182 ms 84.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:58:20.6985740Z SingleProcess AUTOTUNE benchmarking takes 0.2715 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T10:58:20.9866728Z Autotune Choices Stats: 2025-09-07T10:58:20.9868095Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.013120000250637531, "best_triton_pos": 1, "best_triton_time": 0.013728000223636627, "best_triton_kernel": "triton_mm_207", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T10:58:20.9998626Z AUTOTUNE addmm(8192x256, 8192x512, 512x256) 2025-09-07T10:58:20.9998940Z strides: [0, 1], [512, 1], [1, 512] 2025-09-07T10:58:20.9999676Z dtypes: torch.bfloat16, 
torch.bfloat16, torch.bfloat16 2025-09-07T10:58:21.0000049Z bias_addmm 0.0131 ms 100.0% 2025-09-07T10:58:21.0000717Z triton_mm_207 0.0137 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:21.0001578Z triton_mm_214 0.0140 ms 93.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:58:21.0002421Z triton_mm_203 0.0142 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:58:21.0003431Z triton_mm_213 0.0143 ms 91.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:21.0004826Z triton_mm_210 0.0152 ms 86.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:58:21.0005841Z triton_mm_206 0.0156 ms 84.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:58:21.0006685Z triton_mm_208 0.0157 ms 83.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:58:21.0007534Z triton_mm_209 0.0159 ms 82.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:21.0008379Z triton_mm_205 0.0160 ms 82.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:21.0009127Z 
SingleProcess AUTOTUNE benchmarking takes 0.2813 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T10:58:21.3687027Z Autotune Choices Stats: 2025-09-07T10:58:21.3687967Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_352", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.010495999827980995, "best_triton_pos": 0} 2025-09-07T10:58:21.3817434Z AUTOTUNE addmm(2048x1024, 2048x256, 256x1024) 2025-09-07T10:58:21.3817811Z strides: [0, 1], [256, 1], [1, 256] 2025-09-07T10:58:21.3818385Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:58:21.3819147Z triton_mm_352 0.0105 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:21.3819808Z bias_addmm 0.0106 ms 99.1% 2025-09-07T10:58:21.3820445Z triton_mm_358 0.0107 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:21.3821287Z triton_mm_359 0.0109 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:58:21.3822146Z triton_mm_348 0.0110 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:58:21.3822988Z triton_mm_355 0.0112 ms 93.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:58:21.3823999Z triton_mm_351 0.0113 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:58:21.3824961Z triton_mm_350 0.0114 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:21.3825818Z triton_mm_354 0.0114 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:21.3826673Z triton_mm_357 0.0117 ms 89.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:21.3827528Z SingleProcess AUTOTUNE benchmarking takes 0.2743 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T10:58:22.2151437Z Autotune Choices Stats: 2025-09-07T10:58:22.2152579Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_170", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4", "best_time": 0.011935999616980553, "best_triton_pos": 0} 2025-09-07T10:58:22.2282304Z AUTOTUNE addmm(8192x128, 8192x512, 512x128) 2025-09-07T10:58:22.2282598Z strides: [0, 1], [512, 1], [1, 512] 2025-09-07T10:58:22.2282901Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:58:22.2283567Z triton_mm_170 0.0119 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:58:22.2284689Z triton_mm_165 0.0125 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:58:22.2285594Z triton_mm_169 0.0125 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:22.2286173Z bias_addmm 0.0129 ms 92.3% 2025-09-07T10:58:22.2286724Z triton_mm_176 0.0134 ms 89.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:58:22.2287639Z triton_mm_168 0.0138 ms 86.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:58:22.2288534Z triton_mm_166 0.0140 ms 85.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:58:22.2289636Z triton_mm_175 0.0140 ms 85.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:22.2290569Z triton_mm_172 0.0141 ms 84.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:58:22.2291483Z triton_mm_167 0.0152 ms 78.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:22.2292262Z SingleProcess AUTOTUNE benchmarking takes 0.2804 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T10:58:22.8410008Z Autotune Choices Stats: 2025-09-07T10:58:22.8411412Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.011648000217974186, "best_triton_pos": 1, "best_triton_time": 0.012480000033974648, "best_triton_kernel": "triton_mm_372", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4"} 
2025-09-07T10:58:22.8559639Z AUTOTUNE addmm(2048x512, 2048x1024, 1024x512) 2025-09-07T10:58:22.8559992Z strides: [0, 1], [1024, 1], [1, 1024] 2025-09-07T10:58:22.8560742Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:58:22.8561171Z bias_addmm 0.0116 ms 100.0% 2025-09-07T10:58:22.8561830Z triton_mm_372 0.0125 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:58:22.8562828Z triton_mm_378 0.0145 ms 80.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:58:22.8564128Z triton_mm_371 0.0150 ms 77.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:22.8565248Z triton_mm_367 0.0150 ms 77.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:58:22.8566313Z triton_mm_368 0.0158 ms 73.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:58:22.8567276Z triton_mm_377 0.0162 ms 71.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:22.8568240Z triton_mm_374 0.0165 ms 70.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:58:22.8569195Z triton_mm_370 0.0167 ms 69.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:58:22.8569805Z addmm 0.0176 
ms 66.2% 2025-09-07T10:58:22.8570256Z SingleProcess AUTOTUNE benchmarking takes 0.3526 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T10:58:23.2353258Z Autotune Choices Stats: 2025-09-07T10:58:23.2354765Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "triton_mm_473", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4", "best_time": 0.010463999584317207, "best_triton_pos": 0} 2025-09-07T10:58:23.2489164Z AUTOTUNE addmm(512x2048, 512x512, 512x2048) 2025-09-07T10:58:23.2489513Z strides: [0, 1], [512, 1], [1, 512] 2025-09-07T10:58:23.2489842Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:58:23.2490911Z triton_mm_473 0.0105 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:58:23.2491605Z bias_addmm 0.0111 ms 94.5% 2025-09-07T10:58:23.2492206Z triton_mm_468 0.0115 ms 91.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:58:23.2493084Z triton_mm_472 0.0116 ms 90.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:23.2494338Z triton_mm_479 0.0121 ms 86.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:58:23.2495205Z triton_mm_478 0.0121 ms 86.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:23.2496044Z triton_mm_471 0.0124 ms 84.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:58:23.2497002Z triton_mm_475 0.0124 ms 84.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:58:23.2497849Z triton_mm_469 0.0127 ms 82.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:58:23.2498679Z triton_mm_474 0.0137 ms 76.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:23.2499412Z SingleProcess AUTOTUNE benchmarking takes 0.2824 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T10:58:23.8153010Z Autotune Choices Stats: 2025-09-07T10:58:23.8154871Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.01056000031530857, "best_triton_pos": 1, "best_triton_time": 0.010751999914646149, "best_triton_kernel": "triton_mm_249", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T10:58:23.8287587Z AUTOTUNE addmm(2048x256, 2048x1024, 1024x256) 2025-09-07T10:58:23.8287922Z strides: [0, 1], [1024, 1], [1, 1024] 2025-09-07T10:58:23.8288244Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:58:23.8288564Z bias_addmm 0.0106 ms 100.0% 2025-09-07T10:58:23.8289180Z triton_mm_249 0.0108 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:58:23.8290211Z triton_mm_253 0.0119 ms 88.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, 
num_warps=4 2025-09-07T10:58:23.8291198Z triton_mm_248 0.0134 ms 78.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:58:23.8292165Z triton_mm_245 0.0137 ms 76.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:58:23.8293093Z triton_mm_259 0.0139 ms 75.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:58:23.8294142Z triton_mm_252 0.0142 ms 74.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:23.8294725Z addmm 0.0148 ms 71.3% 2025-09-07T10:58:23.8295426Z triton_mm_244 0.0149 ms 70.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:58:23.8296349Z triton_mm_242 0.0154 ms 68.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:58:23.8297143Z SingleProcess AUTOTUNE benchmarking takes 0.2840 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T10:58:24.3793468Z Autotune Choices Stats: 2025-09-07T10:58:24.3795057Z {"num_choices": 21, "num_triton_choices": 19, "best_kernel": "bias_addmm", "best_time": 0.012480000033974648, "best_triton_pos": 1, "best_triton_time": 0.012575999833643436, "best_triton_kernel": "triton_mm_491", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T10:58:24.3928158Z AUTOTUNE addmm(512x512, 512x2048, 2048x512) 
2025-09-07T10:58:24.3928503Z strides: [0, 1], [2048, 1], [1, 2048] 2025-09-07T10:58:24.3928826Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:58:24.3929154Z bias_addmm 0.0125 ms 100.0% 2025-09-07T10:58:24.3930157Z triton_mm_491 0.0126 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:58:24.3931154Z triton_mm_495 0.0131 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:58:24.3932120Z triton_mm_499 0.0153 ms 81.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:58:24.3932756Z addmm 0.0160 ms 78.2% 2025-09-07T10:58:24.3933320Z triton_mm_505 0.0190 ms 65.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:58:24.3934576Z triton_mm_490 0.0191 ms 65.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:58:24.3935604Z triton_mm_489 0.0196 ms 63.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:58:24.3936495Z triton_mm_494 0.0200 ms 62.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:58:24.3937380Z triton_mm_488 0.0204 ms 61.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:58:24.3938167Z SingleProcess AUTOTUNE benchmarking 
takes 0.2966 seconds and 0.0002 seconds precompiling for 21 choices 2025-09-07T10:58:24.8385759Z Autotune Choices Stats: 2025-09-07T10:58:24.8386865Z {"num_choices": 15, "num_triton_choices": 14, "best_kernel": "triton_mm_295", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2", "best_time": 0.006912000011652708, "best_triton_pos": 0} 2025-09-07T10:58:24.8516991Z AUTOTUNE mm(8192x16, 16x31) 2025-09-07T10:58:24.8517500Z strides: [16, 1], [1, 16] 2025-09-07T10:58:24.8517810Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:58:24.8518532Z triton_mm_295 0.0069 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T10:58:24.8519886Z triton_mm_303 0.0070 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:58:24.8520872Z triton_mm_296 0.0071 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:58:24.8521947Z triton_mm_297 0.0071 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:58:24.8522916Z triton_mm_298 0.0071 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:58:24.8524277Z triton_mm_301 0.0073 ms 95.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:58:24.8525251Z triton_mm_299 0.0073 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, 
BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:58:24.8526213Z triton_mm_302 0.0073 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:24.8527302Z triton_mm_304 0.0073 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:58:24.8528271Z triton_mm_300 0.0074 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:58:24.8529117Z SingleProcess AUTOTUNE benchmarking takes 0.1928 seconds and 0.0002 seconds precompiling for 15 choices 2025-09-07T10:58:26.2103937Z Autotune Choices Stats: 2025-09-07T10:58:26.2105061Z {"num_choices": 12, "num_triton_choices": 11, "best_kernel": "triton_mm_540", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.006399999838322401, "best_triton_pos": 0} 2025-09-07T10:58:26.2232069Z AUTOTUNE mm(2048x16, 16x15) 2025-09-07T10:58:26.2232332Z strides: [16, 1], [1, 16] 2025-09-07T10:58:26.2232812Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:58:26.2233519Z triton_mm_540 0.0064 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T10:58:26.2234711Z triton_mm_538 0.0064 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T10:58:26.2235665Z triton_mm_543 0.0064 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:26.2236612Z triton_mm_539 0.0065 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T10:58:26.2237613Z triton_mm_541 0.0065 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:58:26.2238574Z triton_mm_544 0.0065 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:58:26.2239534Z triton_mm_542 0.0066 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:58:26.2240650Z triton_mm_547 0.0067 ms 95.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T10:58:26.2241615Z triton_mm_546 0.0068 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:58:26.2242767Z triton_mm_548 0.0068 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:58:26.2243606Z SingleProcess AUTOTUNE benchmarking takes 0.1565 seconds and 0.0002 seconds precompiling for 12 choices 2025-09-07T10:58:27.9882315Z Autotune Choices Stats: 2025-09-07T10:58:27.9883574Z {"num_choices": 7, "num_triton_choices": 6, "best_kernel": "triton_convolution2d_2", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, 
num_warps=8", "best_time": 0.026976000517606735, "best_triton_pos": 0} 2025-09-07T10:58:28.0020922Z AUTOTUNE convolution(8x3x256x256, 24x3x3x3) 2025-09-07T10:58:28.0021297Z strides: [196608, 1, 768, 3], [27, 1, 9, 3] 2025-09-07T10:58:28.0021598Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:58:28.0022880Z triton_convolution2d_2 0.0270 ms 100.0% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T10:58:28.0024339Z triton_convolution2d_4 0.0289 ms 93.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:58:28.0025551Z triton_convolution2d_0 0.0332 ms 81.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:58:28.0026288Z convolution 0.0342 ms 78.8% 2025-09-07T10:58:28.0027203Z triton_convolution2d_3 0.0364 ms 74.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:58:28.0028417Z triton_convolution2d_5 0.0409 ms 65.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:58:28.0029745Z triton_convolution2d_1 0.0422 ms 63.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=2, STRIDE_W=2, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:58:28.0030710Z SingleProcess AUTOTUNE benchmarking takes 0.1087 seconds and 0.0002 seconds precompiling for 7 choices 2025-09-07T10:58:28.1075335Z Autotune 
Choices Stats: 2025-09-07T10:58:28.1076712Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.02816000021994114, "best_triton_pos": 1, "best_triton_time": 0.029664000496268272, "best_triton_kernel": "triton_convolution2d_9", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8"} 2025-09-07T10:58:28.1203218Z AUTOTUNE convolution(8x24x128x128, 32x24x3x3) 2025-09-07T10:58:28.1212705Z strides: [393216, 1, 3072, 24], [216, 1, 72, 24] 2025-09-07T10:58:28.1213093Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:58:28.1213369Z convolution 0.0282 ms 100.0% 2025-09-07T10:58:28.1214275Z triton_convolution2d_9 0.0297 ms 94.9% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:58:28.1215610Z triton_convolution2d_12 0.0304 ms 92.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:58:28.1216716Z triton_convolution2d_11 0.0327 ms 86.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=8 2025-09-07T10:58:28.1217797Z triton_convolution2d_7 0.0337 ms 83.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:58:28.1218951Z triton_convolution2d_6 0.0353 ms 79.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 
2025-09-07T10:58:28.1220021Z triton_convolution2d_8 0.0421 ms 66.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=1, num_warps=8 2025-09-07T10:58:28.1221083Z triton_convolution2d_10 0.0618 ms 45.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, GROUPS=1, KERNEL_H=3, KERNEL_W=3, PADDING_H=1, PADDING_W=1, STRIDE_H=1, STRIDE_W=1, UNROLL=False, num_stages=2, num_warps=4 2025-09-07T10:58:28.1222034Z SingleProcess AUTOTUNE benchmarking takes 0.1176 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T10:58:28.2320068Z Autotune Choices Stats: 2025-09-07T10:58:28.2321156Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "triton_convolution2d_155", "best_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4", "best_time": 0.01548799965530634, "best_triton_pos": 0} 2025-09-07T10:58:28.2445698Z AUTOTUNE convolution(8x256x64x64, 512x256x1x1) 2025-09-07T10:58:28.2446014Z strides: [1048576, 1, 16384, 256], [256, 1, 1, 1] 2025-09-07T10:58:28.2446478Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:58:28.2447189Z triton_convolution2d_155 0.0155 ms 100.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:58:28.2448041Z convolution 0.0158 ms 98.2% 2025-09-07T10:58:28.2448742Z triton_convolution2d_154 0.0158 ms 97.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:58:28.2449891Z triton_convolution2d_156 0.0165 ms 93.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, 
PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:58:28.2451031Z triton_convolution2d_157 0.0165 ms 93.8% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:58:28.2452171Z triton_convolution2d_151 0.0195 ms 79.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:58:28.2453314Z triton_convolution2d_152 0.0204 ms 76.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:58:28.2454745Z triton_convolution2d_153 0.0522 ms 29.7% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T10:58:28.2455611Z SingleProcess AUTOTUNE benchmarking takes 0.1152 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T10:58:28.3465143Z Autotune Choices Stats: 2025-09-07T10:58:28.3466635Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.014751999638974667, "best_triton_pos": 1, "best_triton_time": 0.01817600056529045, "best_triton_kernel": "triton_convolution2d_238", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T10:58:28.3591736Z AUTOTUNE convolution(8x512x32x32, 1024x512x1x1) 2025-09-07T10:58:28.3592107Z strides: [524288, 1, 16384, 512], [512, 1, 1, 1] 2025-09-07T10:58:28.3592437Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:58:28.3592738Z convolution 0.0148 ms 
100.0% 2025-09-07T10:58:28.3593481Z triton_convolution2d_238 0.0182 ms 81.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:58:28.3594893Z triton_convolution2d_237 0.0204 ms 72.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:58:28.3596273Z triton_convolution2d_239 0.0212 ms 69.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:58:28.3597576Z triton_convolution2d_240 0.0213 ms 69.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:58:28.3598789Z triton_convolution2d_234 0.0273 ms 54.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:58:28.3600187Z triton_convolution2d_235 0.0274 ms 53.8% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:58:28.3601410Z triton_convolution2d_236 0.0506 ms 29.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T10:58:28.3602498Z SingleProcess AUTOTUNE benchmarking takes 0.1140 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T10:58:28.5946038Z Autotune Choices Stats: 2025-09-07T10:58:28.5947010Z {"num_choices": 20, "num_triton_choices": 
19, "best_kernel": "triton_mm_271", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.008671999908983707, "best_triton_pos": 0} 2025-09-07T10:58:28.6076140Z AUTOTUNE mm(2048x256, 256x384) 2025-09-07T10:58:28.6076431Z strides: [256, 1], [1, 256] 2025-09-07T10:58:28.6076712Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:58:28.6077428Z triton_mm_271 0.0087 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:28.6078066Z mm 0.0087 ms 99.6% 2025-09-07T10:58:28.6078647Z triton_mm_267 0.0088 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:58:28.6079617Z triton_mm_270 0.0088 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:58:28.6080780Z triton_mm_272 0.0088 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:58:28.6081766Z triton_mm_274 0.0092 ms 94.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:58:28.6082763Z triton_mm_278 0.0095 ms 91.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:58:28.6084106Z triton_mm_277 0.0096 ms 90.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:28.6085026Z 
triton_mm_273 0.0097 ms 89.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:28.6085923Z triton_mm_269 0.0100 ms 86.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:28.6086707Z SingleProcess AUTOTUNE benchmarking takes 0.2476 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T10:58:28.8059036Z Autotune Choices Stats: 2025-09-07T10:58:28.8060190Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_bmm_286", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007391999941319227, "best_triton_pos": 0} 2025-09-07T10:58:28.8190967Z AUTOTUNE bmm(32x256x16, 32x16x256) 2025-09-07T10:58:28.8191257Z strides: [4096, 1, 256], [4096, 256, 1] 2025-09-07T10:58:28.8191573Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:58:28.8192239Z triton_bmm_286 0.0074 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:58:28.8193446Z triton_bmm_291 0.0076 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:58:28.8194792Z triton_bmm_284 0.0076 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:58:28.8195909Z triton_bmm_285 0.0076 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:58:28.8196976Z triton_bmm_287 
0.0077 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:28.8197952Z triton_bmm_283 0.0077 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:58:28.8198934Z triton_bmm_290 0.0078 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:28.8199918Z triton_bmm_289 0.0078 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:58:28.8200896Z triton_bmm_288 0.0080 ms 92.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:58:28.8201882Z triton_bmm_292 0.0081 ms 91.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T10:58:28.8202739Z SingleProcess AUTOTUNE benchmarking takes 0.2109 seconds and 0.0002 seconds precompiling for 17 choices 2025-09-07T10:58:29.0399399Z Autotune Choices Stats: 2025-09-07T10:58:29.0400407Z {"num_choices": 19, "num_triton_choices": 18, "best_kernel": "triton_bmm_331", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.009440000168979168, "best_triton_pos": 0} 2025-09-07T10:58:29.0538266Z AUTOTUNE bmm(32x256x256, 32x256x64) 2025-09-07T10:58:29.0538577Z strides: [65536, 256, 1], [16384, 1, 256] 2025-09-07T10:58:29.0538883Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:58:29.0539545Z triton_bmm_331 0.0094 ms 
100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:58:29.0540537Z triton_bmm_335 0.0096 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:58:29.0541523Z triton_bmm_330 0.0098 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:58:29.0542617Z triton_bmm_334 0.0098 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:29.0543637Z triton_bmm_340 0.0102 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:58:29.0544506Z bmm 0.0102 ms 92.5% 2025-09-07T10:58:29.0545007Z triton_bmm_324 0.0104 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:58:29.0545854Z triton_bmm_325 0.0104 ms 90.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:58:29.0546797Z triton_bmm_326 0.0104 ms 90.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:58:29.0547635Z triton_bmm_339 0.0104 ms 90.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:29.0548448Z SingleProcess AUTOTUNE benchmarking takes 0.2341 seconds and 0.0002 seconds precompiling 
for 19 choices 2025-09-07T10:58:29.2880789Z Autotune Choices Stats: 2025-09-07T10:58:29.2882029Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.010463999584317207, "best_triton_pos": 1, "best_triton_time": 0.011327999643981457, "best_triton_kernel": "triton_mm_397", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T10:58:29.3017587Z AUTOTUNE mm(2048x512, 512x640) 2025-09-07T10:58:29.3017874Z strides: [512, 1], [1, 512] 2025-09-07T10:58:29.3018144Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:58:29.3018440Z mm 0.0105 ms 100.0% 2025-09-07T10:58:29.3019054Z triton_mm_397 0.0113 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:58:29.3020053Z triton_mm_390 0.0116 ms 90.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:29.3021011Z triton_mm_386 0.0118 ms 88.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:58:29.3022185Z triton_mm_396 0.0118 ms 88.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:29.3023194Z triton_mm_389 0.0123 ms 85.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:58:29.3024296Z triton_mm_393 0.0125 ms 83.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 
2025-09-07T10:58:29.3025130Z triton_mm_391 0.0129 ms 80.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:58:29.3025963Z triton_mm_392 0.0138 ms 76.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:29.3026791Z triton_mm_388 0.0138 ms 75.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:29.3027516Z SingleProcess AUTOTUNE benchmarking takes 0.2474 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T10:58:29.5380922Z Autotune Choices Stats: 2025-09-07T10:58:29.5382303Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_bmm_449", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.010400000028312206, "best_triton_pos": 0} 2025-09-07T10:58:29.5518104Z AUTOTUNE bmm(32x256x256, 32x256x128) 2025-09-07T10:58:29.5518407Z strides: [65536, 256, 1], [32768, 1, 256] 2025-09-07T10:58:29.5518718Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:58:29.5519407Z triton_bmm_449 0.0104 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:58:29.5520652Z triton_bmm_454 0.0105 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:58:29.5521279Z bmm 0.0106 ms 98.5% 2025-09-07T10:58:29.5521983Z triton_bmm_453 0.0107 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:29.5522958Z triton_bmm_452 0.0108 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:58:29.5524188Z triton_bmm_456 0.0108 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:58:29.5525110Z triton_bmm_460 0.0111 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:58:29.5526046Z triton_bmm_455 0.0116 ms 89.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:29.5526957Z triton_bmm_459 0.0117 ms 88.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:29.5527865Z triton_bmm_451 0.0118 ms 88.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:29.5528649Z SingleProcess AUTOTUNE benchmarking takes 0.2451 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T10:58:29.6619404Z Autotune Choices Stats: 2025-09-07T10:58:29.6620961Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.01679999940097332, "best_triton_pos": 1, "best_triton_time": 0.028831999748945236, "best_triton_kernel": "triton_convolution2d_484", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T10:58:29.6753239Z AUTOTUNE 
convolution(8x1024x16x16, 2048x1024x1x1) 2025-09-07T10:58:29.6753597Z strides: [262144, 1, 16384, 1024], [1024, 1, 1, 1] 2025-09-07T10:58:29.6754255Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:58:29.6754548Z convolution 0.0168 ms 100.0% 2025-09-07T10:58:29.6755291Z triton_convolution2d_484 0.0288 ms 58.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:58:29.6756548Z triton_convolution2d_483 0.0336 ms 50.0% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:58:29.6758041Z triton_convolution2d_485 0.0338 ms 49.7% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:58:29.6759277Z triton_convolution2d_486 0.0340 ms 49.5% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:58:29.6760475Z triton_convolution2d_481 0.0472 ms 35.6% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:58:29.6761680Z triton_convolution2d_480 0.0477 ms 35.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=256, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:58:29.6763025Z triton_convolution2d_482 0.0729 ms 23.1% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=1, num_warps=8 
2025-09-07T10:58:29.6764295Z SingleProcess AUTOTUNE benchmarking takes 0.1230 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T10:58:29.9102180Z Autotune Choices Stats: 2025-09-07T10:58:29.9103612Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_514", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.008352000266313553, "best_triton_pos": 0} 2025-09-07T10:58:29.9240517Z AUTOTUNE mm(512x512, 512x640) 2025-09-07T10:58:29.9240887Z strides: [512, 1], [1, 512] 2025-09-07T10:58:29.9241240Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:58:29.9242140Z triton_mm_514 0.0084 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:58:29.9243000Z mm 0.0085 ms 97.8% 2025-09-07T10:58:29.9244014Z triton_mm_518 0.0093 ms 89.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:58:29.9245352Z triton_mm_513 0.0095 ms 88.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:58:29.9246672Z triton_mm_517 0.0100 ms 83.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:29.9248477Z triton_mm_510 0.0102 ms 81.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:58:29.9249823Z triton_mm_507 0.0103 ms 81.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, 
num_warps=4 2025-09-07T10:58:29.9251167Z triton_mm_520 0.0105 ms 79.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:58:29.9252504Z triton_mm_524 0.0105 ms 79.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:58:29.9254028Z triton_mm_516 0.0106 ms 79.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:58:29.9255185Z SingleProcess AUTOTUNE benchmarking takes 0.2477 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T10:58:30.0896198Z Autotune Choices Stats: 2025-09-07T10:58:30.0897507Z {"num_choices": 14, "num_triton_choices": 13, "best_kernel": "triton_bmm_527", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8", "best_time": 0.006335999816656113, "best_triton_pos": 0} 2025-09-07T10:58:30.1032711Z AUTOTUNE bmm(32x64x16, 32x16x64) 2025-09-07T10:58:30.1032993Z strides: [1024, 1, 64], [1024, 64, 1] 2025-09-07T10:58:30.1033283Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:58:30.1034369Z triton_bmm_527 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:58:30.1035403Z triton_bmm_531 0.0064 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:58:30.1036620Z triton_bmm_536 0.0064 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 
2025-09-07T10:58:30.1037879Z triton_bmm_528 0.0065 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:58:30.1038849Z triton_bmm_532 0.0065 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:58:30.1039804Z triton_bmm_535 0.0065 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:58:30.1040771Z triton_bmm_525 0.0066 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T10:58:30.1041740Z triton_bmm_533 0.0066 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:30.1042723Z triton_bmm_534 0.0066 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:58:30.1043961Z triton_bmm_537 0.0067 ms 95.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:58:30.1044759Z SingleProcess AUTOTUNE benchmarking takes 0.1788 seconds and 0.0002 seconds precompiling for 14 choices 2025-09-07T10:58:30.3128452Z Autotune Choices Stats: 2025-09-07T10:58:30.3129882Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_bmm_568", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007071999832987785, "best_triton_pos": 0} 
2025-09-07T10:58:30.3267358Z AUTOTUNE bmm(32x64x64, 32x64x128) 2025-09-07T10:58:30.3267707Z strides: [4096, 64, 1], [8192, 1, 64] 2025-09-07T10:58:30.3268001Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:58:30.3268663Z triton_bmm_568 0.0071 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:58:30.3269680Z triton_bmm_561 0.0071 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:58:30.3270665Z triton_bmm_563 0.0071 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:58:30.3271631Z triton_bmm_574 0.0072 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:58:30.3272792Z triton_bmm_567 0.0072 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:58:30.3274254Z triton_bmm_573 0.0072 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:30.3275104Z triton_bmm_569 0.0073 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:30.3275942Z triton_bmm_572 0.0073 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:58:30.3277011Z triton_bmm_562 0.0074 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, 
BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:58:30.3277948Z triton_bmm_564 0.0074 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:58:30.3278680Z SingleProcess AUTOTUNE benchmarking takes 0.2229 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T10:58:30.5821987Z Autotune Choices Stats: 2025-09-07T10:58:30.5823064Z {"num_choices": 19, "num_triton_choices": 17, "best_kernel": "triton_mm_600", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2", "best_time": 0.011264000087976456, "best_triton_pos": 0} 2025-09-07T10:58:30.5958246Z AUTOTUNE addmm(8x1000, 8x2048, 2048x1000) 2025-09-07T10:58:30.5958545Z strides: [0, 1], [2048, 1], [1, 2048] 2025-09-07T10:58:30.5958879Z dtypes: torch.bfloat16, torch.bfloat16, torch.bfloat16 2025-09-07T10:58:30.5959595Z triton_mm_600 0.0113 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T10:58:30.5960236Z bias_addmm 0.0114 ms 99.2% 2025-09-07T10:58:30.5960844Z triton_mm_604 0.0119 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:58:30.5961807Z triton_mm_608 0.0141 ms 79.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:58:30.5962428Z addmm 0.0150 ms 75.2% 2025-09-07T10:58:30.5963455Z triton_mm_612 0.0156 ms 72.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 
2025-09-07T10:58:30.5964670Z triton_mm_599 0.0173 ms 65.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T10:58:30.5965643Z triton_mm_598 0.0185 ms 60.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:58:30.5966593Z triton_mm_603 0.0187 ms 60.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:58:30.5967543Z triton_mm_597 0.0190 ms 59.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T10:58:30.5968396Z SingleProcess AUTOTUNE benchmarking takes 0.2645 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T10:58:35.8872958Z pass 2025-09-07T10:58:40.5976013Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T10:58:40.5977729Z import pynvml # type: ignore[import] 2025-09-07T10:58:43.6114044Z 2025-09-07T10:58:45.0876764Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:58:45.0877235Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:58:45.0923553Z cuda eval eca_halonext26ts 2025-09-07T10:59:09.9434315Z Autotune Choices Stats: 2025-09-07T10:59:09.9435443Z {"num_choices": 15, "num_triton_choices": 14, "best_kernel": "triton_mm_319", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.007199999876320362, "best_triton_pos": 0} 2025-09-07T10:59:09.9572282Z AUTOTUNE mm(16384x16, 16x23) 2025-09-07T10:59:09.9572533Z strides: [16, 1], [1, 16] 2025-09-07T10:59:09.9572764Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:59:09.9573588Z triton_mm_319 0.0072 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:59:09.9575358Z triton_mm_314 0.0072 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:59:09.9576296Z triton_mm_316 0.0073 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:59:09.9577197Z triton_mm_320 0.0073 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:59:09.9578101Z triton_mm_313 0.0074 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T10:59:09.9578975Z triton_mm_315 0.0074 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, 
BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:59:09.9579794Z triton_mm_321 0.0074 ms 97.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:59:09.9580623Z triton_mm_322 0.0075 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:59:09.9581596Z triton_mm_317 0.0075 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:59:09.9582445Z triton_mm_325 0.0076 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T10:59:09.9583197Z SingleProcess AUTOTUNE benchmarking takes 0.1999 seconds and 0.0003 seconds precompiling for 15 choices 2025-09-07T10:59:10.8784723Z Autotune Choices Stats: 2025-09-07T10:59:10.8785807Z {"num_choices": 15, "num_triton_choices": 14, "best_kernel": "triton_mm_433", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.006399999838322401, "best_triton_pos": 0} 2025-09-07T10:59:10.8921237Z AUTOTUNE mm(4096x16, 16x23) 2025-09-07T10:59:10.8921503Z strides: [16, 1], [1, 16] 2025-09-07T10:59:10.8921806Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:59:10.8922501Z triton_mm_433 0.0064 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:59:10.8924145Z triton_mm_431 0.0064 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T10:59:10.8925157Z triton_mm_432 0.0065 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:59:10.8926110Z triton_mm_436 0.0065 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:59:10.8927051Z triton_mm_439 0.0065 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:59:10.8928320Z triton_mm_437 0.0066 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:59:10.8929371Z triton_mm_435 0.0066 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:59:10.8930319Z triton_mm_434 0.0067 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:59:10.8931261Z triton_mm_440 0.0067 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:59:10.8932211Z triton_mm_442 0.0069 ms 92.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:59:10.8933051Z SingleProcess AUTOTUNE benchmarking takes 0.1985 seconds and 0.0002 seconds precompiling for 15 choices 2025-09-07T10:59:12.8603012Z Autotune Choices Stats: 2025-09-07T10:59:12.8605297Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": 
"triton_mm_287", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007552000228315592, "best_triton_pos": 0} 2025-09-07T10:59:12.8738495Z AUTOTUNE mm(2048x256, 256x128) 2025-09-07T10:59:12.8739015Z strides: [256, 1], [1, 256] 2025-09-07T10:59:12.8739495Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:59:12.8741250Z triton_mm_287 0.0076 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:59:12.8742875Z triton_mm_286 0.0079 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:59:12.8744767Z triton_mm_283 0.0079 ms 95.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:59:12.8746349Z triton_mm_282 0.0080 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:59:12.8747336Z mm 0.0080 ms 94.0% 2025-09-07T10:59:12.8748245Z triton_mm_281 0.0081 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:59:12.8749556Z triton_mm_290 0.0081 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:59:12.8750403Z triton_mm_291 0.0082 ms 92.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:59:12.8751388Z triton_mm_280 0.0083 ms 
91.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:59:12.8752247Z triton_mm_293 0.0085 ms 88.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:59:12.8752986Z SingleProcess AUTOTUNE benchmarking takes 0.2484 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T10:59:13.0606334Z Autotune Choices Stats: 2025-09-07T10:59:13.0607949Z {"num_choices": 16, "num_triton_choices": 15, "best_kernel": "triton_bmm_305", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007935999892652035, "best_triton_pos": 0} 2025-09-07T10:59:13.0735796Z AUTOTUNE bmm(256x64x16, 256x16x144) 2025-09-07T10:59:13.0736189Z strides: [1024, 16, 1], [2304, 144, 1] 2025-09-07T10:59:13.0736470Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:59:13.0737063Z triton_bmm_305 0.0079 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:59:13.0737970Z triton_bmm_303 0.0080 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:59:13.0739007Z triton_bmm_309 0.0081 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:59:13.0740084Z triton_bmm_311 0.0083 ms 95.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T10:59:13.0741059Z triton_bmm_308 0.0084 ms 95.0% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:59:13.0742035Z triton_bmm_298 0.0084 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T10:59:13.0742992Z triton_bmm_302 0.0084 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:59:13.0744524Z triton_bmm_312 0.0084 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:59:13.0745536Z triton_bmm_306 0.0085 ms 93.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:59:13.0746514Z triton_bmm_307 0.0085 ms 92.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:59:13.0747363Z SingleProcess AUTOTUNE benchmarking takes 0.1992 seconds and 0.0002 seconds precompiling for 16 choices 2025-09-07T10:59:13.2526520Z Autotune Choices Stats: 2025-09-07T10:59:13.2528126Z {"num_choices": 15, "num_triton_choices": 14, "best_kernel": "triton_bmm_350", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8", "best_time": 0.010015999898314476, "best_triton_pos": 0} 2025-09-07T10:59:13.2655602Z AUTOTUNE bmm(256x64x144, 256x144x32) 2025-09-07T10:59:13.2655895Z strides: [9216, 144, 1], [4608, 32, 1] 2025-09-07T10:59:13.2656150Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:59:13.2656894Z triton_bmm_350 0.0100 ms 100.0% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:59:13.2657836Z triton_bmm_351 0.0100 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:59:13.2658988Z triton_bmm_343 0.0100 ms 99.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:59:13.2660687Z triton_bmm_354 0.0101 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:59:13.2662394Z triton_bmm_344 0.0102 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:59:13.2664454Z triton_bmm_348 0.0103 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:59:13.2665622Z bmm 0.0103 ms 96.9% 2025-09-07T10:59:13.2666564Z triton_bmm_352 0.0103 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:59:13.2668101Z triton_bmm_349 0.0104 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:59:13.2669543Z triton_bmm_342 0.0109 ms 91.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:59:13.2670280Z SingleProcess AUTOTUNE benchmarking takes 0.1873 seconds and 0.0002 seconds precompiling 
for 15 choices 2025-09-07T10:59:13.3701331Z Autotune Choices Stats: 2025-09-07T10:59:13.3703531Z {"num_choices": 8, "num_triton_choices": 7, "best_kernel": "convolution", "best_time": 0.009664000011980534, "best_triton_pos": 1, "best_triton_time": 0.016063999384641647, "best_triton_kernel": "triton_convolution2d_416", "best_triton_kernel_desc": "ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4"} 2025-09-07T10:59:13.3829973Z AUTOTUNE convolution(8x512x16x16, 128x512x1x1) 2025-09-07T10:59:13.3830525Z strides: [131072, 1, 8192, 512], [512, 1, 1, 1] 2025-09-07T10:59:13.3831010Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:59:13.3831698Z convolution 0.0097 ms 100.0% 2025-09-07T10:59:13.3832920Z triton_convolution2d_416 0.0161 ms 60.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:59:13.3835169Z triton_convolution2d_417 0.0169 ms 57.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:59:13.3837242Z triton_convolution2d_415 0.0192 ms 50.3% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:59:13.3839240Z triton_convolution2d_418 0.0200 ms 48.2% ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=8 2025-09-07T10:59:13.3840299Z triton_convolution2d_412 0.0215 ms 44.9% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, 
STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:59:13.3841388Z triton_convolution2d_413 0.0260 ms 37.2% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=256, BLOCK_N=64, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=2, num_warps=4 2025-09-07T10:59:13.3842391Z triton_convolution2d_414 0.0415 ms 23.3% ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=1024, BLOCK_N=16, GROUPS=1, KERNEL_H=1, KERNEL_W=1, PADDING_H=0, PADDING_W=0, STRIDE_H=2, STRIDE_W=2, UNROLL=True, num_stages=1, num_warps=8 2025-09-07T10:59:13.3843185Z SingleProcess AUTOTUNE benchmarking takes 0.1125 seconds and 0.0002 seconds precompiling for 8 choices 2025-09-07T10:59:13.5368014Z Autotune Choices Stats: 2025-09-07T10:59:13.5369615Z {"num_choices": 13, "num_triton_choices": 12, "best_kernel": "triton_bmm_421", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.006912000011652708, "best_triton_pos": 0} 2025-09-07T10:59:13.5495664Z AUTOTUNE bmm(256x16x16, 256x16x144) 2025-09-07T10:59:13.5496035Z strides: [256, 16, 1], [2304, 144, 1] 2025-09-07T10:59:13.5496313Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:59:13.5496906Z triton_bmm_421 0.0069 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:59:13.5497804Z triton_bmm_424 0.0070 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:59:13.5498773Z triton_bmm_428 0.0070 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:59:13.5499888Z triton_bmm_423 0.0071 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, 
BLOCK_K=16, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:59:13.5500874Z triton_bmm_427 0.0071 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:59:13.5501843Z triton_bmm_430 0.0071 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:59:13.5502817Z triton_bmm_419 0.0071 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T10:59:13.5504236Z triton_bmm_420 0.0071 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=2 2025-09-07T10:59:13.5505269Z triton_bmm_425 0.0071 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:59:13.5506279Z triton_bmm_429 0.0071 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T10:59:13.5507134Z SingleProcess AUTOTUNE benchmarking takes 0.1658 seconds and 0.0002 seconds precompiling for 13 choices 2025-09-07T10:59:13.7180621Z Autotune Choices Stats: 2025-09-07T10:59:13.7182228Z {"num_choices": 14, "num_triton_choices": 13, "best_kernel": "triton_bmm_469", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4", "best_time": 0.009184000082314014, "best_triton_pos": 0} 2025-09-07T10:59:13.7308904Z AUTOTUNE bmm(256x16x144, 256x144x64) 2025-09-07T10:59:13.7309425Z strides: 
[2304, 144, 1], [9216, 64, 1] 2025-09-07T10:59:13.7309915Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:59:13.7311217Z triton_bmm_469 0.0092 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:59:13.7312861Z triton_bmm_471 0.0092 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:59:13.7314697Z triton_bmm_461 0.0092 ms 99.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:59:13.7316257Z triton_bmm_462 0.0093 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T10:59:13.7318149Z triton_bmm_466 0.0093 ms 98.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:59:13.7319637Z triton_bmm_468 0.0096 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:59:13.7320228Z bmm 0.0097 ms 94.7% 2025-09-07T10:59:13.7320706Z triton_bmm_470 0.0097 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:59:13.7321508Z triton_bmm_467 0.0097 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:59:13.7322335Z triton_bmm_463 0.0099 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=16, BLOCK_N=32, EVEN_K=False, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=5, num_warps=2 2025-09-07T10:59:13.7323045Z SingleProcess AUTOTUNE benchmarking takes 0.1767 seconds and 0.0002 seconds precompiling for 14 choices 2025-09-07T10:59:13.9718790Z Autotune Choices Stats: 2025-09-07T10:59:13.9720405Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_540", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007552000228315592, "best_triton_pos": 0} 2025-09-07T10:59:13.9854381Z AUTOTUNE mm(512x512, 512x128) 2025-09-07T10:59:13.9854637Z strides: [512, 1], [1, 512] 2025-09-07T10:59:13.9854868Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:59:13.9855615Z triton_mm_540 0.0076 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:59:13.9856539Z triton_mm_544 0.0079 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:59:13.9857115Z mm 0.0079 ms 95.5% 2025-09-07T10:59:13.9857642Z triton_mm_548 0.0087 ms 86.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:59:13.9858531Z triton_mm_539 0.0089 ms 85.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:59:13.9859414Z triton_mm_543 0.0091 ms 83.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:59:13.9860279Z triton_mm_547 0.0092 ms 82.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:59:13.9861105Z triton_mm_538 0.0092 ms 81.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:59:13.9861998Z triton_mm_537 0.0094 ms 80.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:59:13.9862841Z triton_mm_554 0.0099 ms 76.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:59:13.9863604Z SingleProcess AUTOTUNE benchmarking takes 0.2456 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T10:59:14.1378241Z Autotune Choices Stats: 2025-09-07T10:59:14.1379483Z {"num_choices": 16, "num_triton_choices": 15, "best_kernel": "triton_bmm_559", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.006496000103652477, "best_triton_pos": 0} 2025-09-07T10:59:14.1514799Z AUTOTUNE bmm(64x64x16, 64x16x144) 2025-09-07T10:59:14.1515220Z strides: [16, 1024, 1], [16, 1, 1024] 2025-09-07T10:59:14.1515509Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:59:14.1516170Z triton_bmm_559 0.0065 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:59:14.1517249Z triton_bmm_560 0.0065 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:59:14.1518225Z triton_bmm_556 0.0065 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T10:59:14.1519253Z triton_bmm_566 0.0066 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:59:14.1520116Z triton_bmm_555 0.0066 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=1, num_warps=2 2025-09-07T10:59:14.1520949Z triton_bmm_557 0.0066 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:59:14.1521781Z triton_bmm_558 0.0067 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:59:14.1522701Z triton_bmm_562 0.0067 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:59:14.1523549Z triton_bmm_565 0.0067 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:59:14.1524752Z triton_bmm_567 0.0067 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:59:14.1525490Z SingleProcess AUTOTUNE benchmarking takes 0.1654 seconds and 0.0002 seconds precompiling for 16 choices 2025-09-07T10:59:14.3375131Z Autotune Choices Stats: 2025-09-07T10:59:14.3376031Z {"num_choices": 16, "num_triton_choices": 15, "best_kernel": "triton_bmm_600", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8", "best_time": 
0.008063999935984612, "best_triton_pos": 0} 2025-09-07T10:59:14.3514419Z AUTOTUNE bmm(64x64x144, 64x144x64) 2025-09-07T10:59:14.3514701Z strides: [9216, 144, 1], [64, 4096, 1] 2025-09-07T10:59:14.3515001Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T10:59:14.3515813Z triton_bmm_600 0.0081 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:59:14.3516905Z triton_bmm_602 0.0081 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:59:14.3517903Z triton_bmm_601 0.0082 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:59:14.3518894Z triton_bmm_608 0.0082 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T10:59:14.3519915Z triton_bmm_610 0.0082 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T10:59:14.3520750Z triton_bmm_609 0.0082 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T10:59:14.3521668Z triton_bmm_606 0.0083 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T10:59:14.3522495Z triton_bmm_612 0.0083 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T10:59:14.3523323Z triton_bmm_605 0.0083 ms 96.9% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T10:59:14.3524211Z bmm 0.0085 ms 95.1% 2025-09-07T10:59:14.3524603Z SingleProcess AUTOTUNE benchmarking takes 0.1995 seconds and 0.0002 seconds precompiling for 16 choices 2025-09-07T10:59:20.6930081Z pass 2025-09-07T10:59:23.7245858Z accuracy pass_rate=100.00% 2025-09-07T10:59:23.7250147Z calls_captured gmean=361.25x mean=380.875x 2025-09-07T10:59:23.7254009Z unique_graphs gmean=1.00x mean=1.000x 2025-09-07T10:59:23.7257669Z graph_breaks gmean=0.00x mean=0.000x 2025-09-07T10:59:23.7261065Z unique_graph_breaks gmean=0.00x mean=0.000x 2025-09-07T10:59:23.7264601Z autograd_captures gmean=0.00x mean=0.000x 2025-09-07T10:59:23.7268098Z autograd_compiles gmean=0.00x mean=0.000x 2025-09-07T10:59:23.7271333Z cudagraph_skips gmean=0.00x mean=0.000x 2025-09-07T10:59:23.7272516Z compilation_latency mean=35.761 seconds 2025-09-07T10:59:24.8391748Z + [[ training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *aotinductor-true* ]] 2025-09-07T10:59:24.8393273Z + [[ inference == \i\n\f\e\r\e\n\c\e ]] 2025-09-07T10:59:24.8394235Z + [[ accuracy == \a\c\c\u\r\a\c\y ]] 2025-09-07T10:59:24.8396211Z + python benchmarks/dynamo/timm_models.py --accuracy --no-translation-validation --inference --bfloat16 --export --disable-cudagraphs --device cuda --total-partitions 7 --partition-id 1 --output /var/lib/jenkins/workspace/test/test-reports/inductor_export_timm_models_bfloat16_inference_cuda_h100_accuracy.csv 2025-09-07T10:59:25.8672080Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. 
If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:59:25.8673290Z import pynvml # type: ignore[import] 2025-09-07T10:59:30.2209393Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:59:30.2210809Z import pynvml # type: ignore[import] 2025-09-07T10:59:33.1969211Z 2025-09-07T10:59:34.8119442Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:59:34.8119818Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:59:34.8195487Z cuda eval crossvit_9_240 2025-09-07T10:59:40.8237889Z pass 2025-09-07T10:59:43.3033610Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T10:59:43.3037025Z import pynvml # type: ignore[import] 2025-09-07T10:59:46.2949133Z 2025-09-07T10:59:47.3776651Z loading model: 0it [00:00, ?it/s] 2025-09-07T10:59:47.3776973Z loading model: 0it [00:01, ?it/s] 2025-09-07T10:59:47.3864509Z cuda eval cspdarknet53 2025-09-07T10:59:53.3307614Z pass 2025-09-07T10:59:55.8066252Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T10:59:55.8067358Z import pynvml # type: ignore[import] 2025-09-07T10:59:58.7844282Z 2025-09-07T11:00:00.2682267Z loading model: 0it [00:00, ?it/s] 2025-09-07T11:00:00.2682629Z loading model: 0it [00:01, ?it/s] 2025-09-07T11:00:00.2730685Z cuda eval deit_base_distilled_patch16_224 2025-09-07T11:00:04.5108135Z pass 2025-09-07T11:00:06.9357690Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T11:00:06.9358998Z import pynvml # type: ignore[import] 2025-09-07T11:00:10.1334961Z 2025-09-07T11:00:11.2593696Z loading model: 0it [00:00, ?it/s] 2025-09-07T11:00:11.2594519Z loading model: 0it [00:01, ?it/s] 2025-09-07T11:00:11.2730989Z cuda eval dla102 2025-09-07T11:00:16.5434589Z pass 2025-09-07T11:00:18.9751139Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T11:00:18.9752395Z import pynvml # type: ignore[import] 2025-09-07T11:00:21.9355892Z 2025-09-07T11:00:24.2180447Z loading model: 0it [00:00, ?it/s] 2025-09-07T11:00:24.2180842Z loading model: 0it [00:02, ?it/s] 2025-09-07T11:00:24.2245435Z cuda eval dm_nfnet_f0 2025-09-07T11:00:29.8727250Z pass 2025-09-07T11:00:32.2991866Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T11:00:32.2993135Z import pynvml # type: ignore[import] 2025-09-07T11:00:35.2891333Z 2025-09-07T11:00:36.7921878Z loading model: 0it [00:00, ?it/s] 2025-09-07T11:00:36.7922369Z loading model: 0it [00:01, ?it/s] 2025-09-07T11:00:36.8058150Z cuda eval dpn107 2025-09-07T11:00:44.4030946Z pass 2025-09-07T11:00:47.0055445Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T11:00:47.0057137Z import pynvml # type: ignore[import] 2025-09-07T11:00:50.0079618Z 2025-09-07T11:00:51.2202926Z loading model: 0it [00:00, ?it/s] 2025-09-07T11:00:51.2203547Z loading model: 0it [00:01, ?it/s] 2025-09-07T11:00:51.2247097Z cuda eval eca_botnext26ts_256 2025-09-07T11:00:55.3608471Z pass 2025-09-07T11:00:57.8158616Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T11:00:57.8159868Z import pynvml # type: ignore[import] 2025-09-07T11:01:00.8092699Z 2025-09-07T11:01:01.9905547Z loading model: 0it [00:00, ?it/s] 2025-09-07T11:01:01.9906555Z loading model: 0it [00:01, ?it/s] 2025-09-07T11:01:01.9950657Z cuda eval eca_halonext26ts 2025-09-07T11:01:06.2363549Z pass 2025-09-07T11:01:07.6742820Z accuracy pass_rate=100.00% 2025-09-07T11:01:07.6746926Z calls_captured gmean=442.36x mean=457.125x 2025-09-07T11:01:07.6751234Z unique_graphs gmean=1.00x mean=1.000x 2025-09-07T11:01:07.6756566Z graph_breaks gmean=0.00x mean=0.000x 2025-09-07T11:01:07.6761025Z unique_graph_breaks gmean=0.00x mean=0.000x 2025-09-07T11:01:07.6765912Z autograd_captures gmean=0.00x mean=0.000x 2025-09-07T11:01:07.6770110Z autograd_compiles gmean=0.00x mean=0.000x 2025-09-07T11:01:07.6775336Z cudagraph_skips gmean=0.00x mean=0.000x 2025-09-07T11:01:07.6776217Z compilation_latency mean=3.694 seconds 2025-09-07T11:01:09.0486103Z + python benchmarks/dynamo/timm_models.py --accuracy --no-translation-validation --inference --bfloat16 --export-aot-inductor --disable-cudagraphs --device cuda --total-partitions 7 --partition-id 1 --output /var/lib/jenkins/workspace/test/test-reports/inductor_aot_inductor_timm_models_bfloat16_inference_cuda_h100_accuracy.csv 2025-09-07T11:01:10.0366397Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T11:01:10.0367618Z import pynvml # type: ignore[import] 2025-09-07T11:01:14.3136044Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T11:01:14.3137297Z import pynvml # type: ignore[import] 2025-09-07T11:01:17.3261543Z 2025-09-07T11:01:18.3927942Z loading model: 0it [00:00, ?it/s] 2025-09-07T11:01:18.3928884Z loading model: 0it [00:01, ?it/s] 2025-09-07T11:01:18.4009871Z cuda eval crossvit_9_240 2025-09-07T11:01:45.8270523Z pass 2025-09-07T11:01:49.7957752Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T11:01:49.7959035Z import pynvml # type: ignore[import] 2025-09-07T11:01:52.7812773Z 2025-09-07T11:01:54.0305111Z loading model: 0it [00:00, ?it/s] 2025-09-07T11:01:54.0305468Z loading model: 0it [00:01, ?it/s] 2025-09-07T11:01:54.0399905Z cuda eval cspdarknet53 2025-09-07T11:02:14.5679630Z pass 2025-09-07T11:02:18.3667118Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T11:02:18.3668399Z import pynvml # type: ignore[import] 2025-09-07T11:02:21.3456537Z 2025-09-07T11:02:22.9190852Z loading model: 0it [00:00, ?it/s] 2025-09-07T11:02:22.9191347Z loading model: 0it [00:01, ?it/s] 2025-09-07T11:02:22.9238532Z cuda eval deit_base_distilled_patch16_224 2025-09-07T11:02:39.2711393Z pass 2025-09-07T11:02:42.9475946Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T11:02:42.9477274Z import pynvml # type: ignore[import] 2025-09-07T11:02:45.9459885Z 2025-09-07T11:02:48.0521862Z loading model: 0it [00:00, ?it/s] 2025-09-07T11:02:48.0522194Z loading model: 0it [00:02, ?it/s] 2025-09-07T11:02:48.0656397Z cuda eval dla102 2025-09-07T11:03:12.8696070Z pass 2025-09-07T11:03:16.8224409Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T11:03:16.8226742Z import pynvml # type: ignore[import] 2025-09-07T11:03:19.8400202Z 2025-09-07T11:03:21.4452525Z loading model: 0it [00:00, ?it/s] 2025-09-07T11:03:21.4452848Z loading model: 0it [00:01, ?it/s] 2025-09-07T11:03:21.4521298Z cuda eval dm_nfnet_f0 2025-09-07T11:03:43.3623437Z pass 2025-09-07T11:03:47.2991347Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T11:03:47.2992617Z import pynvml # type: ignore[import] 2025-09-07T11:03:50.4681518Z 2025-09-07T11:03:52.0848832Z loading model: 0it [00:00, ?it/s] 2025-09-07T11:03:52.0849160Z loading model: 0it [00:01, ?it/s] 2025-09-07T11:03:52.0980448Z cuda eval dpn107 2025-09-07T11:04:23.9147518Z pass 2025-09-07T11:04:28.0398087Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T11:04:28.0399261Z import pynvml # type: ignore[import] 2025-09-07T11:04:31.0206837Z 2025-09-07T11:04:32.0642852Z loading model: 0it [00:00, ?it/s] 2025-09-07T11:04:32.0643212Z loading model: 0it [00:01, ?it/s] 2025-09-07T11:04:32.0691631Z cuda eval eca_botnext26ts_256 2025-09-07T11:04:51.4566665Z pass 2025-09-07T11:04:55.2085654Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T11:04:55.2086688Z import pynvml # type: ignore[import] 2025-09-07T11:04:58.3271029Z 2025-09-07T11:04:59.2568465Z loading model: 0it [00:00, ?it/s] 2025-09-07T11:04:59.2568832Z loading model: 0it [00:00, ?it/s] 2025-09-07T11:04:59.2616192Z cuda eval eca_halonext26ts 2025-09-07T11:05:23.9928404Z pass 2025-09-07T11:05:26.8479854Z accuracy pass_rate=100.00% 2025-09-07T11:05:26.8485410Z calls_captured gmean=0.00x mean=0.000x 2025-09-07T11:05:26.8488987Z unique_graphs gmean=0.00x mean=0.000x 2025-09-07T11:05:26.8492541Z graph_breaks gmean=0.00x mean=0.000x 2025-09-07T11:05:26.8496135Z unique_graph_breaks gmean=0.00x mean=0.000x 2025-09-07T11:05:26.8499390Z autograd_captures gmean=0.00x mean=0.000x 2025-09-07T11:05:26.8502662Z autograd_compiles gmean=0.00x mean=0.000x 2025-09-07T11:05:26.8506598Z cudagraph_skips gmean=0.00x mean=0.000x 2025-09-07T11:05:26.8507350Z compilation_latency mean=0.000 seconds 2025-09-07T11:05:27.8477600Z + [[ training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *maxautotune-true* ]] 2025-09-07T11:05:27.8478859Z + TORCHINDUCTOR_MAX_AUTOTUNE=1 2025-09-07T11:05:27.8480058Z + python benchmarks/dynamo/timm_models.py --accuracy --no-translation-validation 
--inference --bfloat16 --backend inductor --device cuda --total-partitions 7 --partition-id 1 --output /var/lib/jenkins/workspace/test/test-reports/inductor_max_autotune_timm_models_bfloat16_inference_cuda_h100_accuracy.csv 2025-09-07T11:05:28.7900095Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T11:05:28.7901511Z import pynvml # type: ignore[import] 2025-09-07T11:05:32.9760586Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T11:05:32.9762101Z import pynvml # type: ignore[import] 2025-09-07T11:05:36.0061401Z 2025-09-07T11:05:36.8845849Z loading model: 0it [00:00, ?it/s] 2025-09-07T11:05:36.8846183Z loading model: 0it [00:00, ?it/s] 2025-09-07T11:05:36.8922891Z cuda eval crossvit_9_240 2025-09-07T11:06:01.5936928Z pass 2025-09-07T11:06:05.7501857Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T11:06:05.7503111Z import pynvml # type: ignore[import] 2025-09-07T11:06:08.7183208Z 2025-09-07T11:06:09.8879477Z loading model: 0it [00:00, ?it/s] 2025-09-07T11:06:09.8879864Z loading model: 0it [00:01, ?it/s] 2025-09-07T11:06:09.8967507Z cuda eval cspdarknet53 2025-09-07T11:06:25.5462020Z Autotune Choices Stats: 2025-09-07T11:06:25.5462940Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_25", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4", "best_time": 0.023360000923275948, "best_triton_pos": 0} 2025-09-07T11:06:25.6074237Z AUTOTUNE mm(131072x64, 64x128) 2025-09-07T11:06:25.6074474Z strides: [64, 1], [1, 64] 2025-09-07T11:06:25.6075092Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:06:25.6075691Z triton_mm_25 0.0234 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:06:25.6076540Z triton_mm_29 0.0237 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:06:25.6077443Z triton_mm_21 0.0239 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:06:25.6078234Z triton_mm_26 0.0243 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:06:25.6078752Z mm 0.0245 ms 95.3% 2025-09-07T11:06:25.6079221Z triton_mm_24 0.0245 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:06:25.6080022Z triton_mm_27 0.0248 ms 94.1% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:06:25.6080935Z triton_mm_23 0.0251 ms 93.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:06:25.6081730Z triton_mm_31 0.0252 ms 92.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:06:25.6082517Z triton_mm_22 0.0254 ms 91.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:06:25.6083359Z SingleProcess AUTOTUNE benchmarking takes 0.3072 seconds and 0.0004 seconds precompiling for 20 choices 2025-09-07T11:06:26.1278863Z Autotune Choices Stats: 2025-09-07T11:06:26.1280231Z {"num_choices": 19, "num_triton_choices": 18, "best_kernel": "mm", "best_time": 0.017535999417304993, "best_triton_pos": 1, "best_triton_time": 0.017664000391960144, "best_triton_kernel": "triton_mm_72", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T11:06:26.1461879Z AUTOTUNE mm(131072x64, 64x64) 2025-09-07T11:06:26.1462200Z strides: [64, 1], [1, 64] 2025-09-07T11:06:26.1462481Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:06:26.1462764Z mm 0.0175 ms 100.0% 2025-09-07T11:06:26.1463373Z triton_mm_72 0.0177 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:06:26.1464611Z triton_mm_73 0.0177 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=5, num_warps=8 2025-09-07T11:06:26.1465607Z triton_mm_67 0.0188 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:06:26.1466587Z triton_mm_68 0.0189 ms 92.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:06:26.1467556Z triton_mm_71 0.0190 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T11:06:26.1468527Z triton_mm_63 0.0191 ms 91.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:06:26.1469782Z triton_mm_69 0.0191 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:06:26.1470693Z triton_mm_70 0.0191 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:06:26.1471585Z triton_mm_64 0.0192 ms 91.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:06:26.1472377Z SingleProcess AUTOTUNE benchmarking takes 0.2412 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T11:06:26.6373966Z Autotune Choices Stats: 2025-09-07T11:06:26.6375050Z {"num_choices": 19, "num_triton_choices": 18, "best_kernel": "triton_mm_85", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.024671999737620354, "best_triton_pos": 0} 
2025-09-07T11:06:26.6755833Z AUTOTUNE mm(131072x128, 128x64) 2025-09-07T11:06:26.6756129Z strides: [128, 1], [1, 128] 2025-09-07T11:06:26.6756407Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:06:26.6757501Z triton_mm_85 0.0247 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:06:26.6758566Z triton_mm_90 0.0248 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:06:26.6759550Z triton_mm_81 0.0250 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:06:26.6760218Z mm 0.0256 ms 96.5% 2025-09-07T11:06:26.6760861Z triton_mm_87 0.0276 ms 89.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:06:26.6761696Z triton_mm_88 0.0279 ms 88.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:06:26.6762639Z triton_mm_89 0.0280 ms 88.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T11:06:26.6763469Z triton_mm_80 0.0289 ms 85.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:06:26.6764637Z triton_mm_84 0.0293 ms 84.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:06:26.6765486Z triton_mm_86 0.0295 ms 83.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, 
BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:06:26.6766225Z SingleProcess AUTOTUNE benchmarking takes 0.2626 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T11:06:27.1429958Z Autotune Choices Stats: 2025-09-07T11:06:27.1431108Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_39", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.015039999969303608, "best_triton_pos": 0} 2025-09-07T11:06:27.1594069Z AUTOTUNE mm(131072x64, 64x32) 2025-09-07T11:06:27.1594382Z strides: [64, 1], [1, 64] 2025-09-07T11:06:27.1594651Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:06:27.1595841Z triton_mm_39 0.0150 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:06:27.1596875Z triton_mm_47 0.0155 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:06:27.1597627Z mm 0.0161 ms 93.6% 2025-09-07T11:06:27.1598223Z triton_mm_43 0.0161 ms 93.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:06:27.1599215Z triton_mm_45 0.0162 ms 92.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:06:27.1600228Z triton_mm_42 0.0162 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:06:27.1601108Z triton_mm_36 0.0163 ms 92.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, 
BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:06:27.1601939Z triton_mm_33 0.0163 ms 92.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:06:27.1602905Z triton_mm_48 0.0163 ms 92.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:06:27.1603951Z triton_mm_41 0.0164 ms 91.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:06:27.1604702Z SingleProcess AUTOTUNE benchmarking takes 0.2234 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T11:06:27.6738896Z Autotune Choices Stats: 2025-09-07T11:06:27.6739937Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_116", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.012191999703645706, "best_triton_pos": 0} 2025-09-07T11:06:27.7240986Z AUTOTUNE mm(32768x128, 128x128) 2025-09-07T11:06:27.7241450Z strides: [128, 1], [1, 128] 2025-09-07T11:06:27.7241734Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:06:27.7242412Z triton_mm_116 0.0122 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:06:27.7243453Z triton_mm_115 0.0124 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:06:27.7244898Z triton_mm_112 0.0127 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:06:27.7245546Z mm 0.0128 ms 95.3% 2025-09-07T11:06:27.7246138Z triton_mm_108 0.0128 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:06:27.7247127Z triton_mm_109 0.0129 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:06:27.7248137Z triton_mm_105 0.0130 ms 94.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:06:27.7249139Z triton_mm_114 0.0130 ms 93.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T11:06:27.7250329Z triton_mm_110 0.0133 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:06:27.7251310Z triton_mm_113 0.0133 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:06:27.7252137Z SingleProcess AUTOTUNE benchmarking takes 0.2953 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:06:28.4774313Z Autotune Choices Stats: 2025-09-07T11:06:28.4775310Z {"num_choices": 19, "num_triton_choices": 18, "best_kernel": "triton_mm_135", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8", "best_time": 0.009088000282645226, "best_triton_pos": 0} 2025-09-07T11:06:28.5244075Z AUTOTUNE mm(32768x64, 64x64) 2025-09-07T11:06:28.5244307Z strides: [64, 
1], [1, 64] 2025-09-07T11:06:28.5244601Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:06:28.5245227Z triton_mm_135 0.0091 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:06:28.5246488Z triton_mm_129 0.0092 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:06:28.5247410Z triton_mm_131 0.0092 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:06:28.5248303Z triton_mm_134 0.0093 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:06:28.5249197Z triton_mm_124 0.0093 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:06:28.5250221Z triton_mm_126 0.0094 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:06:28.5251222Z triton_mm_122 0.0095 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:06:28.5252118Z triton_mm_132 0.0095 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:06:28.5253007Z triton_mm_125 0.0095 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:06:28.5254052Z triton_mm_130 
0.0095 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:06:28.5254850Z SingleProcess AUTOTUNE benchmarking takes 0.2718 seconds and 0.0003 seconds precompiling for 19 choices 2025-09-07T11:06:29.3283325Z Autotune Choices Stats: 2025-09-07T11:06:29.3284723Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_223", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.01033599954098463, "best_triton_pos": 0} 2025-09-07T11:06:29.3663398Z AUTOTUNE mm(8192x256, 256x256) 2025-09-07T11:06:29.3663887Z strides: [256, 1], [1, 256] 2025-09-07T11:06:29.3664171Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:06:29.3664868Z triton_mm_223 0.0103 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:06:29.3666386Z triton_mm_222 0.0110 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:06:29.3667378Z triton_mm_219 0.0110 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:06:29.3668363Z triton_mm_226 0.0110 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:06:29.3669329Z triton_mm_230 0.0111 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:06:29.3670291Z triton_mm_221 0.0112 ms 92.0% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:06:29.3671310Z triton_mm_225 0.0113 ms 91.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:06:29.3672281Z triton_mm_229 0.0113 ms 91.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:06:29.3672818Z mm 0.0114 ms 91.0% 2025-09-07T11:06:29.3673312Z triton_mm_228 0.0117 ms 88.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:06:29.3674217Z SingleProcess AUTOTUNE benchmarking takes 0.2770 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:06:30.1315643Z Autotune Choices Stats: 2025-09-07T11:06:30.1316670Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_242", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.008927999995648861, "best_triton_pos": 0} 2025-09-07T11:06:30.1606149Z AUTOTUNE mm(8192x128, 128x128) 2025-09-07T11:06:30.1606657Z strides: [128, 1], [1, 128] 2025-09-07T11:06:30.1607236Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:06:30.1607919Z triton_mm_242 0.0089 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:06:30.1608914Z triton_mm_243 0.0089 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:06:30.1609897Z triton_mm_238 0.0090 ms 99.6% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:06:30.1610864Z triton_mm_241 0.0090 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:06:30.1611850Z triton_mm_244 0.0090 ms 98.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:06:30.1612761Z triton_mm_245 0.0091 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:06:30.1613660Z triton_mm_239 0.0091 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:06:30.1615169Z triton_mm_240 0.0092 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:06:30.1615761Z mm 0.0092 ms 96.9% 2025-09-07T11:06:30.1616301Z triton_mm_248 0.0093 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:06:30.1617102Z SingleProcess AUTOTUNE benchmarking takes 0.2661 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:06:31.1201435Z Autotune Choices Stats: 2025-09-07T11:06:31.1202839Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.0098879998549819, "best_triton_pos": 1, "best_triton_time": 0.009983999654650688, "best_triton_kernel": "triton_mm_496", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4"} 2025-09-07T11:06:31.1788879Z AUTOTUNE mm(2048x512, 512x512) 2025-09-07T11:06:31.1789172Z strides: [512, 1], [1, 512] 2025-09-07T11:06:31.1789492Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:06:31.1789767Z mm 0.0099 ms 100.0% 2025-09-07T11:06:31.1790390Z triton_mm_496 0.0100 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:06:31.1791750Z triton_mm_491 0.0109 ms 90.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:06:31.1792682Z triton_mm_495 0.0110 ms 90.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:06:31.1793541Z triton_mm_502 0.0110 ms 90.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:06:31.1794616Z triton_mm_501 0.0115 ms 86.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:06:31.1795635Z triton_mm_498 0.0116 ms 85.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:06:31.1796593Z triton_mm_494 0.0118 ms 83.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:06:31.1797541Z triton_mm_492 0.0122 ms 81.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:06:31.1798390Z 
triton_mm_493 0.0130 ms 75.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:06:31.1799137Z SingleProcess AUTOTUNE benchmarking takes 0.2998 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:06:31.9412308Z Autotune Choices Stats: 2025-09-07T11:06:31.9413494Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_511", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.007679999805986881, "best_triton_pos": 0} 2025-09-07T11:06:31.9939471Z AUTOTUNE mm(2048x256, 256x256) 2025-09-07T11:06:31.9939779Z strides: [256, 1], [1, 256] 2025-09-07T11:06:31.9940053Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:06:31.9940744Z triton_mm_511 0.0077 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:06:31.9941400Z mm 0.0080 ms 95.6% 2025-09-07T11:06:31.9942548Z triton_mm_510 0.0080 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:06:31.9943555Z triton_mm_514 0.0082 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:06:31.9944733Z triton_mm_515 0.0083 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:06:31.9945685Z triton_mm_517 0.0086 ms 88.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 
2025-09-07T11:06:31.9946677Z triton_mm_513 0.0087 ms 88.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:06:31.9947632Z triton_mm_506 0.0088 ms 87.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:06:31.9948570Z triton_mm_505 0.0089 ms 86.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:06:31.9949671Z triton_mm_504 0.0090 ms 85.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:06:31.9950523Z SingleProcess AUTOTUNE benchmarking takes 0.2872 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:06:33.0955539Z Autotune Choices Stats: 2025-09-07T11:06:33.0956857Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.009696000255644321, "best_triton_pos": 1, "best_triton_time": 0.009759999811649323, "best_triton_kernel": "triton_mm_764", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T11:06:33.1140421Z AUTOTUNE mm(512x1024, 1024x1024) 2025-09-07T11:06:33.1140758Z strides: [1024, 1], [1, 1024] 2025-09-07T11:06:33.1141032Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:06:33.1141602Z mm 0.0097 ms 100.0% 2025-09-07T11:06:33.1142207Z triton_mm_764 0.0098 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:06:33.1143198Z triton_mm_768 0.0110 ms 87.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, 
BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:06:33.1144645Z triton_mm_760 0.0126 ms 76.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:06:33.1145616Z triton_mm_763 0.0128 ms 75.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:06:33.1146578Z triton_mm_774 0.0128 ms 75.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:06:33.1147539Z triton_mm_767 0.0133 ms 72.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:06:33.1148489Z triton_mm_773 0.0144 ms 67.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:06:33.1149614Z triton_mm_759 0.0144 ms 67.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:06:33.1150577Z triton_mm_770 0.0146 ms 66.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:06:33.1151428Z SingleProcess AUTOTUNE benchmarking takes 0.2527 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:06:33.6328000Z Autotune Choices Stats: 2025-09-07T11:06:33.6328950Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_779", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=5, num_warps=4", "best_time": 0.007840000092983246, "best_triton_pos": 0} 2025-09-07T11:06:33.9216347Z AUTOTUNE mm(512x512, 512x512) 2025-09-07T11:06:33.9216589Z strides: [512, 1], [1, 512] 2025-09-07T11:06:33.9216812Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:06:33.9217397Z triton_mm_779 0.0078 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:06:33.9217949Z mm 0.0079 ms 99.2% 2025-09-07T11:06:33.9218633Z triton_mm_783 0.0081 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:06:33.9219497Z triton_mm_787 0.0088 ms 89.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:06:33.9220344Z triton_mm_778 0.0091 ms 86.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:06:33.9221179Z triton_mm_782 0.0091 ms 86.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:06:33.9222119Z triton_mm_777 0.0093 ms 83.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:06:33.9222946Z triton_mm_786 0.0095 ms 82.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:06:33.9224192Z triton_mm_776 0.0097 ms 80.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 
2025-09-07T11:06:33.9225022Z triton_mm_789 0.0101 ms 77.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:06:33.9225749Z SingleProcess AUTOTUNE benchmarking takes 0.5235 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:06:40.4447101Z pass 2025-09-07T11:06:44.5497348Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T11:06:44.5499332Z import pynvml # type: ignore[import] 2025-09-07T11:06:47.5186972Z 2025-09-07T11:06:48.9562596Z loading model: 0it [00:00, ?it/s] 2025-09-07T11:06:48.9562989Z loading model: 0it [00:01, ?it/s] 2025-09-07T11:06:48.9608334Z cuda eval deit_base_distilled_patch16_224 2025-09-07T11:07:01.4976732Z pass 2025-09-07T11:07:05.3325894Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T11:07:05.3327729Z import pynvml # type: ignore[import] 2025-09-07T11:07:08.2961790Z 2025-09-07T11:07:09.3639362Z loading model: 0it [00:00, ?it/s] 2025-09-07T11:07:09.3639919Z loading model: 0it [00:01, ?it/s] 2025-09-07T11:07:09.3766111Z cuda eval dla102 2025-09-07T11:07:30.8381351Z Autotune Choices Stats: 2025-09-07T11:07:30.8382472Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_25", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.01206399966031313, "best_triton_pos": 0} 2025-09-07T11:07:30.8533174Z AUTOTUNE mm(100352x32, 32x64) 2025-09-07T11:07:30.8533511Z strides: [32, 1], [1, 32] 2025-09-07T11:07:30.8533981Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:07:30.8534717Z triton_mm_25 0.0121 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:30.8535736Z triton_mm_24 0.0121 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:07:30.8537166Z triton_mm_27 0.0123 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:07:30.8538184Z triton_mm_22 0.0124 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:07:30.8538837Z mm 0.0125 ms 96.7% 2025-09-07T11:07:30.8539428Z triton_mm_30 0.0126 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T11:07:30.8540445Z triton_mm_31 0.0126 ms 95.9% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:07:30.8541629Z triton_mm_28 0.0126 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:30.8542780Z triton_mm_29 0.0127 ms 95.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:07:30.8543624Z triton_mm_26 0.0127 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:07:30.8544524Z SingleProcess AUTOTUNE benchmarking takes 0.2287 seconds and 0.0004 seconds precompiling for 17 choices 2025-09-07T11:07:31.4405651Z Autotune Choices Stats: 2025-09-07T11:07:31.4406654Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_66", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.008320000022649765, "best_triton_pos": 0} 2025-09-07T11:07:31.4550778Z AUTOTUNE mm(25088x32, 32x128) 2025-09-07T11:07:31.4551126Z strides: [32, 1], [1, 32] 2025-09-07T11:07:31.4551452Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:07:31.4552172Z triton_mm_66 0.0083 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:07:31.4553217Z triton_mm_64 0.0084 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:07:31.4554571Z triton_mm_68 0.0084 ms 98.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, 
BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:07:31.4555932Z triton_mm_69 0.0084 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:07:31.4556930Z triton_mm_71 0.0085 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:07:31.4558018Z triton_mm_67 0.0085 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:31.4558991Z triton_mm_70 0.0088 ms 94.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:31.4559967Z triton_mm_72 0.0089 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T11:07:31.4560955Z triton_mm_73 0.0089 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:31.4562075Z triton_mm_74 0.0089 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:07:31.4562930Z SingleProcess AUTOTUNE benchmarking takes 0.2503 seconds and 0.0002 seconds precompiling for 18 choices 2025-09-07T11:07:32.0779817Z Autotune Choices Stats: 2025-09-07T11:07:32.0780897Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_130", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.013407999649643898, "best_triton_pos": 0} 2025-09-07T11:07:32.0924223Z AUTOTUNE mm(25088x256, 256x128) 2025-09-07T11:07:32.0924870Z strides: [256, 1], [1, 256] 2025-09-07T11:07:32.0925103Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:07:32.0925660Z triton_mm_130 0.0134 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:32.0926683Z triton_mm_136 0.0141 ms 95.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:32.0927215Z mm 0.0145 ms 92.3% 2025-09-07T11:07:32.0927694Z triton_mm_135 0.0153 ms 87.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:32.0928496Z triton_mm_133 0.0155 ms 86.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:07:32.0929295Z triton_mm_129 0.0157 ms 85.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:07:32.0930104Z triton_mm_132 0.0157 ms 85.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:32.0930897Z triton_mm_128 0.0160 ms 83.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:32.0931687Z triton_mm_125 0.0163 ms 82.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=2, num_warps=4 2025-09-07T11:07:32.0932606Z triton_mm_126 0.0164 ms 82.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:07:32.0933332Z SingleProcess AUTOTUNE benchmarking takes 0.2491 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:07:33.9322820Z Autotune Choices Stats: 2025-09-07T11:07:33.9324348Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_149", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.010623999871313572, "best_triton_pos": 0} 2025-09-07T11:07:33.9471147Z AUTOTUNE mm(25088x128, 128x128) 2025-09-07T11:07:33.9471440Z strides: [128, 1], [1, 128] 2025-09-07T11:07:33.9471725Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:07:33.9472422Z triton_mm_149 0.0106 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:33.9473462Z triton_mm_147 0.0114 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:33.9474513Z triton_mm_144 0.0115 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:07:33.9475574Z triton_mm_151 0.0115 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:33.9476092Z mm 0.0115 ms 92.2% 2025-09-07T11:07:33.9476580Z triton_mm_155 0.0116 ms 92.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:33.9477519Z triton_mm_154 0.0116 ms 91.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:33.9478458Z triton_mm_148 0.0116 ms 91.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:07:33.9479277Z triton_mm_152 0.0117 ms 90.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:07:33.9480213Z triton_mm_150 0.0121 ms 88.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:07:33.9480924Z SingleProcess AUTOTUNE benchmarking takes 0.2592 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:07:34.5213669Z Autotune Choices Stats: 2025-09-07T11:07:34.5215008Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_47", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.00940799992531538, "best_triton_pos": 0} 2025-09-07T11:07:34.5357069Z AUTOTUNE mm(25088x64, 64x128) 2025-09-07T11:07:34.5357421Z strides: [64, 1], [1, 64] 2025-09-07T11:07:34.5357688Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:07:34.5358370Z triton_mm_47 0.0094 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:07:34.5359390Z triton_mm_53 0.0095 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, 
num_warps=8 2025-09-07T11:07:34.5360378Z triton_mm_48 0.0096 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:34.5361592Z triton_mm_51 0.0096 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:07:34.5362591Z triton_mm_52 0.0096 ms 97.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:34.5363679Z triton_mm_49 0.0098 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:07:34.5364815Z triton_mm_50 0.0098 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:34.5365769Z triton_mm_45 0.0099 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:07:34.5366723Z triton_mm_57 0.0100 ms 94.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:07:34.5367696Z triton_mm_55 0.0100 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:34.5368669Z SingleProcess AUTOTUNE benchmarking takes 0.2528 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:07:34.8084378Z Autotune Choices Stats: 2025-09-07T11:07:34.8085294Z {"num_choices": 19, "num_triton_choices": 18, "best_kernel": "triton_mm_86", "best_kernel_desc": 
"ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.009855999611318111, "best_triton_pos": 0} 2025-09-07T11:07:34.8540393Z AUTOTUNE mm(25088x128, 128x64) 2025-09-07T11:07:34.8540708Z strides: [128, 1], [1, 128] 2025-09-07T11:07:34.8541429Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:07:34.8542118Z triton_mm_86 0.0099 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:34.8543128Z triton_mm_82 0.0100 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:07:34.8544414Z triton_mm_89 0.0104 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:07:34.8545266Z triton_mm_85 0.0104 ms 94.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:07:34.8546113Z triton_mm_83 0.0105 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:07:34.8546943Z triton_mm_91 0.0105 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:34.8547770Z triton_mm_81 0.0106 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:07:34.8548607Z triton_mm_84 0.0106 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:34.8549425Z triton_mm_88 0.0107 ms 92.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:34.8549942Z mm 0.0107 ms 92.2% 2025-09-07T11:07:34.8550473Z SingleProcess AUTOTUNE benchmarking takes 0.2728 seconds and 0.0002 seconds precompiling for 19 choices 2025-09-07T11:07:35.4175899Z Autotune Choices Stats: 2025-09-07T11:07:35.4177016Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_194", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.008704000152647495, "best_triton_pos": 0} 2025-09-07T11:07:35.4320398Z AUTOTUNE mm(6272x128, 128x256) 2025-09-07T11:07:35.4320691Z strides: [128, 1], [1, 128] 2025-09-07T11:07:35.4320969Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:07:35.4321672Z triton_mm_194 0.0087 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:35.4322659Z triton_mm_196 0.0088 ms 98.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:35.4323286Z mm 0.0088 ms 98.6% 2025-09-07T11:07:35.4324288Z triton_mm_197 0.0088 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:07:35.4325976Z triton_mm_192 0.0089 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:35.4327115Z triton_mm_195 0.0089 ms 97.8% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:07:35.4328011Z triton_mm_193 0.0090 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:07:35.4328899Z triton_mm_199 0.0090 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:35.4329982Z triton_mm_200 0.0090 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:35.4331016Z triton_mm_191 0.0091 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:07:35.4331799Z SingleProcess AUTOTUNE benchmarking takes 0.2572 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:07:36.0557304Z Autotune Choices Stats: 2025-09-07T11:07:36.0558658Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.011680000461637974, "best_triton_pos": 1, "best_triton_time": 0.011839999817311764, "best_triton_kernel": "triton_mm_258", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T11:07:36.0708993Z AUTOTUNE mm(6272x512, 512x256) 2025-09-07T11:07:36.0709284Z strides: [512, 1], [1, 512] 2025-09-07T11:07:36.0709553Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:07:36.0709845Z mm 0.0117 ms 100.0% 2025-09-07T11:07:36.0710450Z triton_mm_258 0.0118 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, 
num_warps=4 2025-09-07T11:07:36.0711432Z triton_mm_265 0.0119 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:07:36.0712407Z triton_mm_264 0.0122 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:36.0715571Z triton_mm_254 0.0125 ms 93.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:07:36.0716781Z triton_mm_257 0.0130 ms 89.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:07:36.0718039Z triton_mm_261 0.0133 ms 87.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:07:36.0718878Z triton_mm_259 0.0137 ms 85.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:07:36.0719716Z triton_mm_256 0.0139 ms 84.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:36.0720549Z triton_mm_260 0.0144 ms 81.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:36.0721274Z SingleProcess AUTOTUNE benchmarking takes 0.2541 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:07:37.4442074Z Autotune Choices Stats: 2025-09-07T11:07:37.4443440Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 
0.012927999719977379, "best_triton_pos": 1, "best_triton_time": 0.013344000093638897, "best_triton_kernel": "triton_mm_374", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T11:07:37.4589739Z AUTOTUNE mm(6272x768, 768x256) 2025-09-07T11:07:37.4590029Z strides: [768, 1], [1, 768] 2025-09-07T11:07:37.4590311Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:07:37.4590811Z mm 0.0129 ms 100.0% 2025-09-07T11:07:37.4591433Z triton_mm_374 0.0133 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:07:37.4592461Z triton_mm_367 0.0141 ms 91.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:37.4593604Z triton_mm_363 0.0148 ms 87.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:07:37.4594867Z triton_mm_373 0.0153 ms 84.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:37.4595726Z triton_mm_366 0.0156 ms 82.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:07:37.4596595Z triton_mm_368 0.0160 ms 81.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:07:37.4597555Z triton_mm_370 0.0160 ms 80.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, 
num_warps=8 2025-09-07T11:07:37.4598399Z triton_mm_364 0.0174 ms 74.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:07:37.4599238Z triton_mm_365 0.0177 ms 73.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:37.4600114Z SingleProcess AUTOTUNE benchmarking takes 0.2509 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:07:38.3891122Z Autotune Choices Stats: 2025-09-07T11:07:38.3892438Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.015135999768972397, "best_triton_pos": 1, "best_triton_time": 0.016704000532627106, "best_triton_kernel": "triton_mm_592", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T11:07:38.4044215Z AUTOTUNE mm(6272x1152, 1152x256) 2025-09-07T11:07:38.4044551Z strides: [1152, 1], [1, 1152] 2025-09-07T11:07:38.4044878Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:07:38.4045204Z mm 0.0151 ms 100.0% 2025-09-07T11:07:38.4045821Z triton_mm_592 0.0167 ms 90.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:07:38.4046821Z triton_mm_585 0.0180 ms 83.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:38.4047798Z triton_mm_581 0.0184 ms 82.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:07:38.4049192Z triton_mm_591 0.0193 ms 78.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, 
BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:38.4050181Z triton_mm_586 0.0196 ms 77.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:07:38.4051150Z triton_mm_584 0.0210 ms 72.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:07:38.4052252Z triton_mm_582 0.0213 ms 70.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:07:38.4053210Z triton_mm_588 0.0214 ms 70.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:07:38.4054490Z triton_mm_583 0.0245 ms 61.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:38.4055339Z SingleProcess AUTOTUNE benchmarking takes 0.2562 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:07:38.9711333Z Autotune Choices Stats: 2025-09-07T11:07:38.9712681Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.010367999784648418, "best_triton_pos": 1, "best_triton_time": 0.010400000028312206, "best_triton_kernel": "triton_mm_600", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8"} 2025-09-07T11:07:38.9858952Z AUTOTUNE mm(6272x256, 256x256) 2025-09-07T11:07:38.9859228Z strides: [256, 1], [1, 256] 2025-09-07T11:07:38.9859516Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:07:38.9859832Z mm 0.0104 ms 100.0% 
2025-09-07T11:07:38.9860435Z triton_mm_600 0.0104 ms 99.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:07:38.9861394Z triton_mm_604 0.0105 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:38.9862745Z triton_mm_611 0.0105 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:07:38.9863910Z triton_mm_607 0.0107 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:07:38.9864970Z triton_mm_603 0.0108 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:07:38.9865988Z triton_mm_606 0.0110 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:38.9866900Z triton_mm_610 0.0111 ms 93.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:38.9867797Z triton_mm_602 0.0112 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:38.9868685Z triton_mm_609 0.0112 ms 92.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:38.9869601Z SingleProcess AUTOTUNE benchmarking takes 0.2530 seconds and 0.0003 seconds 
precompiling for 20 choices 2025-09-07T11:07:39.5992238Z Autotune Choices Stats: 2025-09-07T11:07:39.5993319Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_209", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8", "best_time": 0.009472000412642956, "best_triton_pos": 0} 2025-09-07T11:07:39.6145625Z AUTOTUNE mm(6272x256, 256x128) 2025-09-07T11:07:39.6145904Z strides: [256, 1], [1, 256] 2025-09-07T11:07:39.6146193Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:07:39.6146889Z triton_mm_209 0.0095 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:07:39.6148334Z triton_mm_213 0.0096 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:39.6149121Z mm 0.0099 ms 96.1% 2025-09-07T11:07:39.6149708Z triton_mm_214 0.0100 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:07:39.6150696Z triton_mm_216 0.0101 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:07:39.6151658Z triton_mm_212 0.0103 ms 91.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:07:39.6152617Z triton_mm_203 0.0104 ms 91.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:07:39.6153588Z triton_mm_205 0.0104 ms 90.8% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:07:39.6154837Z triton_mm_211 0.0105 ms 90.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:39.6155849Z triton_mm_220 0.0106 ms 89.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:07:39.6156607Z SingleProcess AUTOTUNE benchmarking takes 0.2537 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:07:40.3280300Z Autotune Choices Stats: 2025-09-07T11:07:40.3281701Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_649", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.00825599953532219, "best_triton_pos": 0} 2025-09-07T11:07:40.3431311Z AUTOTUNE mm(1568x256, 256x512) 2025-09-07T11:07:40.3431720Z strides: [256, 1], [1, 256] 2025-09-07T11:07:40.3432066Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:07:40.3432926Z triton_mm_649 0.0083 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:40.3434426Z triton_mm_650 0.0085 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:07:40.3435774Z triton_mm_652 0.0085 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:07:40.3436591Z mm 0.0086 ms 95.9% 2025-09-07T11:07:40.3437697Z triton_mm_645 0.0086 ms 95.6% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:07:40.3439017Z triton_mm_656 0.0091 ms 90.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:07:40.3440305Z triton_mm_648 0.0092 ms 89.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:07:40.3441590Z triton_mm_647 0.0095 ms 87.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:40.3443033Z triton_mm_651 0.0095 ms 87.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:40.3444506Z triton_mm_655 0.0095 ms 86.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:40.3445768Z SingleProcess AUTOTUNE benchmarking takes 0.2531 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:07:41.0199803Z Autotune Choices Stats: 2025-09-07T11:07:41.0201594Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.011231999844312668, "best_triton_pos": 1, "best_triton_time": 0.011807999573647976, "best_triton_kernel": "triton_mm_714", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4"} 2025-09-07T11:07:41.0353285Z AUTOTUNE mm(1568x1024, 1024x512) 2025-09-07T11:07:41.0353868Z strides: [1024, 1], [1, 1024] 2025-09-07T11:07:41.0354264Z dtypes: torch.bfloat16, torch.bfloat16 
2025-09-07T11:07:41.0354639Z mm 0.0112 ms 100.0% 2025-09-07T11:07:41.0355508Z triton_mm_714 0.0118 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:07:41.0356940Z triton_mm_709 0.0136 ms 82.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:07:41.0358396Z triton_mm_710 0.0136 ms 82.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:07:41.0360248Z triton_mm_713 0.0136 ms 82.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:41.0361694Z triton_mm_712 0.0153 ms 73.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:07:41.0363076Z triton_mm_716 0.0154 ms 73.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:07:41.0364634Z triton_mm_720 0.0164 ms 68.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:07:41.0366027Z triton_mm_703 0.0171 ms 65.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:07:41.0367376Z triton_mm_706 0.0174 ms 64.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:07:41.0368539Z SingleProcess AUTOTUNE benchmarking 
takes 0.3075 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:07:42.4184028Z Autotune Choices Stats: 2025-09-07T11:07:42.4185887Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.012703999876976013, "best_triton_pos": 1, "best_triton_time": 0.013024000450968742, "best_triton_kernel": "triton_mm_823", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4"} 2025-09-07T11:07:42.4330339Z AUTOTUNE mm(1568x1536, 1536x512) 2025-09-07T11:07:42.4330591Z strides: [1536, 1], [1, 1536] 2025-09-07T11:07:42.4330852Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:07:42.4331137Z mm 0.0127 ms 100.0% 2025-09-07T11:07:42.4331737Z triton_mm_823 0.0130 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:07:42.4332892Z triton_mm_829 0.0153 ms 83.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:07:42.4334131Z triton_mm_819 0.0159 ms 79.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:07:42.4335102Z triton_mm_818 0.0165 ms 76.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:07:42.4336072Z triton_mm_822 0.0174 ms 73.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:42.4337045Z triton_mm_828 0.0188 ms 67.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:42.4337923Z triton_mm_821 0.0191 ms 66.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:07:42.4338769Z triton_mm_825 0.0193 ms 65.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:07:42.4339614Z triton_mm_815 0.0214 ms 59.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:07:42.4340349Z SingleProcess AUTOTUNE benchmarking takes 0.2536 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:07:43.7723277Z Autotune Choices Stats: 2025-09-07T11:07:43.7725185Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.0163199994713068, "best_triton_pos": 1, "best_triton_time": 0.01744000054895878, "best_triton_kernel": "triton_mm_1477", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4"} 2025-09-07T11:07:43.7878431Z AUTOTUNE mm(1568x2816, 2816x512) 2025-09-07T11:07:43.7878722Z strides: [2816, 1], [1, 2816] 2025-09-07T11:07:43.7878993Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:07:43.7879282Z mm 0.0163 ms 100.0% 2025-09-07T11:07:43.7879898Z triton_mm_1477 0.0174 ms 93.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:07:43.7880894Z triton_mm_1483 0.0220 ms 74.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:07:43.7881875Z triton_mm_1473 0.0223 ms 73.3% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:07:43.7883082Z triton_mm_1476 0.0267 ms 61.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:43.7884269Z triton_mm_1472 0.0269 ms 60.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:07:43.7885238Z triton_mm_1482 0.0285 ms 57.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:43.7886212Z triton_mm_1479 0.0315 ms 51.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:07:43.7887476Z triton_mm_1475 0.0316 ms 51.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:07:43.7888449Z triton_mm_1469 0.0321 ms 50.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:07:43.7889397Z SingleProcess AUTOTUNE benchmarking takes 0.2776 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:07:44.3935602Z Autotune Choices Stats: 2025-09-07T11:07:44.3936640Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_1496", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4", "best_time": 0.009247999638319016, "best_triton_pos": 0} 2025-09-07T11:07:44.4084978Z AUTOTUNE mm(1568x512, 512x512) 
2025-09-07T11:07:44.4085268Z strides: [512, 1], [1, 512] 2025-09-07T11:07:44.4085537Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:07:44.4086246Z triton_mm_1496 0.0092 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:07:44.4087036Z mm 0.0097 ms 95.4% 2025-09-07T11:07:44.4087629Z triton_mm_1491 0.0103 ms 89.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:07:44.4088608Z triton_mm_1495 0.0105 ms 87.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:44.4089906Z triton_mm_1502 0.0107 ms 86.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:07:44.4090908Z triton_mm_1501 0.0110 ms 84.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:44.4091888Z triton_mm_1498 0.0111 ms 83.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:07:44.4092855Z triton_mm_1494 0.0111 ms 83.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:07:44.4094221Z triton_mm_1492 0.0116 ms 79.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:07:44.4095192Z triton_mm_1497 0.0125 ms 73.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:44.4096038Z SingleProcess AUTOTUNE benchmarking takes 0.2943 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:07:45.0084057Z Autotune Choices Stats: 2025-09-07T11:07:45.0085543Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_665", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.008608000352978706, "best_triton_pos": 0} 2025-09-07T11:07:45.0238992Z AUTOTUNE mm(1568x512, 512x256) 2025-09-07T11:07:45.0239267Z strides: [512, 1], [1, 512] 2025-09-07T11:07:45.0239530Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:07:45.0240205Z triton_mm_665 0.0086 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:07:45.0241062Z mm 0.0088 ms 97.8% 2025-09-07T11:07:45.0241651Z triton_mm_669 0.0091 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:07:45.0242648Z triton_mm_664 0.0096 ms 90.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:07:45.0244658Z triton_mm_668 0.0098 ms 87.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:45.0245682Z triton_mm_661 0.0101 ms 85.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:07:45.0246684Z triton_mm_675 0.0103 ms 83.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, 
BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:07:45.0247806Z triton_mm_660 0.0105 ms 82.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:07:45.0248759Z triton_mm_658 0.0105 ms 81.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:07:45.0249717Z triton_mm_671 0.0105 ms 81.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:07:45.0250558Z SingleProcess AUTOTUNE benchmarking takes 0.2507 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:07:45.9324993Z Autotune Choices Stats: 2025-09-07T11:07:45.9326471Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_1537", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.008031999692320824, "best_triton_pos": 0} 2025-09-07T11:07:45.9477927Z AUTOTUNE mm(392x512, 512x1024) 2025-09-07T11:07:45.9478211Z strides: [512, 1], [1, 512] 2025-09-07T11:07:45.9478501Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:07:45.9479174Z triton_mm_1537 0.0080 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:07:45.9479830Z mm 0.0088 ms 91.3% 2025-09-07T11:07:45.9480416Z triton_mm_1541 0.0092 ms 87.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:07:45.9481404Z triton_mm_1536 0.0092 ms 87.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, 
BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:07:45.9482391Z triton_mm_1540 0.0097 ms 83.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:45.9483559Z triton_mm_1533 0.0102 ms 78.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:07:45.9484967Z triton_mm_1532 0.0103 ms 78.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:07:45.9485937Z triton_mm_1530 0.0104 ms 77.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:07:45.9486918Z triton_mm_1547 0.0105 ms 76.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:07:45.9488203Z triton_mm_1539 0.0107 ms 75.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:07:45.9489148Z SingleProcess AUTOTUNE benchmarking takes 0.2476 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:07:47.4422115Z Autotune Choices Stats: 2025-09-07T11:07:47.4423211Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_1552", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.009119999594986439, "best_triton_pos": 0} 2025-09-07T11:07:47.4570330Z AUTOTUNE mm(392x1024, 1024x512) 2025-09-07T11:07:47.4570587Z strides: [1024, 
1], [1, 1024] 2025-09-07T11:07:47.4570855Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:07:47.4571496Z triton_mm_1552 0.0091 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:07:47.4572432Z triton_mm_1556 0.0096 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:07:47.4573010Z mm 0.0098 ms 93.4% 2025-09-07T11:07:47.4573555Z triton_mm_1560 0.0106 ms 86.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:07:47.4574618Z triton_mm_1551 0.0120 ms 75.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:07:47.4575724Z triton_mm_1555 0.0122 ms 74.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:07:47.4576675Z triton_mm_1550 0.0124 ms 73.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:07:47.4577595Z triton_mm_1566 0.0126 ms 72.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:07:47.4578510Z triton_mm_1559 0.0127 ms 71.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:47.4579386Z triton_mm_1549 0.0133 ms 68.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=2, num_warps=4 2025-09-07T11:07:47.4580135Z SingleProcess AUTOTUNE benchmarking takes 0.2521 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:07:49.1169675Z Autotune Choices Stats: 2025-09-07T11:07:49.1171256Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.014336000196635723, "best_triton_pos": 1, "best_triton_time": 0.014336000196635723, "best_triton_kernel": "triton_mm_1041", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4"} 2025-09-07T11:07:49.1329192Z AUTOTUNE mm(1568x2048, 2048x512) 2025-09-07T11:07:49.1329640Z strides: [2048, 1], [1, 2048] 2025-09-07T11:07:49.1329886Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:07:49.1330131Z mm 0.0143 ms 100.0% 2025-09-07T11:07:49.1330694Z triton_mm_1041 0.0143 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:07:49.1331633Z triton_mm_1047 0.0179 ms 80.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:07:49.1332747Z triton_mm_1037 0.0191 ms 75.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:07:49.1333942Z triton_mm_1040 0.0204 ms 70.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:49.1334846Z triton_mm_1036 0.0205 ms 69.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:07:49.1335750Z triton_mm_1046 0.0223 ms 64.3% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:49.1336664Z triton_mm_1043 0.0238 ms 60.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:07:49.1337568Z triton_mm_1039 0.0238 ms 60.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:07:49.1338472Z triton_mm_1033 0.0267 ms 53.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:07:49.1339271Z SingleProcess AUTOTUNE benchmarking takes 0.2627 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:07:49.4732968Z Autotune Choices Stats: 2025-09-07T11:07:49.4734409Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.012736000120639801, "best_triton_pos": 1, "best_triton_time": 0.013151999562978745, "best_triton_kernel": "triton_mm_1601", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T11:07:49.4892303Z AUTOTUNE mm(392x2560, 2560x1024) 2025-09-07T11:07:49.4892555Z strides: [2560, 1], [1, 2560] 2025-09-07T11:07:49.4892814Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:07:49.4893058Z mm 0.0127 ms 100.0% 2025-09-07T11:07:49.4893609Z triton_mm_1601 0.0132 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:07:49.4894680Z triton_mm_1605 0.0151 ms 84.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:07:49.4895590Z triton_mm_1597 0.0181 ms 70.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:07:49.4896496Z triton_mm_1611 0.0202 ms 63.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:07:49.4897517Z triton_mm_1600 0.0230 ms 55.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:07:49.4898425Z triton_mm_1604 0.0234 ms 54.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:49.4899315Z triton_mm_1594 0.0244 ms 52.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:07:49.4900180Z triton_mm_1596 0.0251 ms 50.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:07:49.4901107Z triton_mm_1610 0.0254 ms 50.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:07:49.4901920Z SingleProcess AUTOTUNE benchmarking takes 0.2695 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:07:57.0746680Z pass 2025-09-07T11:08:01.7088393Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. 
If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T11:08:01.7089866Z import pynvml # type: ignore[import] 2025-09-07T11:08:04.7997593Z 2025-09-07T11:08:06.6203440Z loading model: 0it [00:00, ?it/s] 2025-09-07T11:08:06.6204370Z loading model: 0it [00:01, ?it/s] 2025-09-07T11:08:06.6267523Z cuda eval dm_nfnet_f0 2025-09-07T11:08:28.2446309Z pass 2025-09-07T11:08:32.1161084Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T11:08:32.1163048Z import pynvml # type: ignore[import] 2025-09-07T11:08:35.0690708Z 2025-09-07T11:08:36.4425366Z loading model: 0it [00:00, ?it/s] 2025-09-07T11:08:36.4425924Z loading model: 0it [00:01, ?it/s] 2025-09-07T11:08:36.4564847Z cuda eval dpn107 2025-09-07T11:09:06.2730429Z Autotune Choices Stats: 2025-09-07T11:09:06.2732128Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.026176000013947487, "best_triton_pos": 1, "best_triton_time": 0.04249599948525429, "best_triton_kernel": "triton_mm_202", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T11:09:06.2892081Z AUTOTUNE mm(25088x376, 376x400) 2025-09-07T11:09:06.2892314Z strides: [376, 1], [1, 376] 2025-09-07T11:09:06.2892551Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:09:06.2892832Z mm 0.0262 ms 100.0% 2025-09-07T11:09:06.2893424Z triton_mm_202 0.0425 ms 61.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:06.2894535Z triton_mm_203 0.0465 ms 56.3% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:06.2895455Z triton_mm_200 0.0517 ms 50.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T11:09:06.2896391Z triton_mm_201 0.0517 ms 50.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:06.2897403Z triton_mm_196 0.0546 ms 48.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:06.2898252Z triton_mm_192 0.0600 ms 43.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:06.2899098Z triton_mm_199 0.0615 ms 42.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:06.2899938Z triton_mm_198 0.0616 ms 42.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:06.2900893Z triton_mm_197 0.0663 ms 39.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:09:06.2901625Z SingleProcess AUTOTUNE benchmarking takes 0.3539 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:09:06.8553307Z Autotune Choices Stats: 2025-09-07T11:09:06.8554547Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_42", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, 
BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.01369599997997284, "best_triton_pos": 0} 2025-09-07T11:09:06.8706701Z AUTOTUNE mm(25088x128, 128x200) 2025-09-07T11:09:06.8707162Z strides: [128, 1], [1, 128] 2025-09-07T11:09:06.8707586Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:09:06.8708657Z triton_mm_42 0.0137 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:06.8709684Z mm 0.0138 ms 99.5% 2025-09-07T11:09:06.8710606Z triton_mm_37 0.0149 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:06.8712180Z triton_mm_43 0.0153 ms 89.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:06.8721606Z triton_mm_36 0.0157 ms 87.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:06.8722727Z triton_mm_40 0.0160 ms 85.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:06.8723579Z triton_mm_35 0.0163 ms 84.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:06.8724624Z triton_mm_39 0.0163 ms 84.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:06.8725503Z triton_mm_32 0.0167 ms 81.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:09:06.8726418Z triton_mm_44 0.0171 ms 80.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:06.8727204Z SingleProcess AUTOTUNE benchmarking takes 0.2463 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:09:07.4558681Z Autotune Choices Stats: 2025-09-07T11:09:07.4559806Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_80", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.02751999907195568, "best_triton_pos": 0} 2025-09-07T11:09:07.4713902Z AUTOTUNE mm(25088x316, 316x200) 2025-09-07T11:09:07.4714188Z strides: [316, 1], [1, 316] 2025-09-07T11:09:07.4714412Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:09:07.4714978Z triton_mm_80 0.0275 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:07.4715840Z triton_mm_73 0.0289 ms 95.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:07.4716821Z triton_mm_77 0.0292 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:07.4718065Z triton_mm_79 0.0306 ms 89.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T11:09:07.4719177Z triton_mm_74 0.0308 ms 89.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=4, num_warps=8 2025-09-07T11:09:07.4720150Z triton_mm_78 0.0309 ms 88.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:07.4721116Z triton_mm_81 0.0326 ms 84.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:07.4722084Z triton_mm_75 0.0332 ms 82.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:07.4723050Z triton_mm_82 0.0351 ms 78.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:07.4724188Z triton_mm_70 0.0365 ms 75.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:09:07.4725056Z SingleProcess AUTOTUNE benchmarking takes 0.2752 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:09:07.7483266Z Autotune Choices Stats: 2025-09-07T11:09:07.7485216Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.019807999953627586, "best_triton_pos": 1, "best_triton_time": 0.02316799946129322, "best_triton_kernel": "triton_mm_118", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T11:09:07.7637776Z AUTOTUNE mm(25088x336, 336x200) 2025-09-07T11:09:07.7638067Z strides: [336, 1], [1, 336] 2025-09-07T11:09:07.7638352Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:09:07.7638636Z mm 0.0198 ms 100.0% 2025-09-07T11:09:07.7639275Z triton_mm_118 0.0232 ms 85.5% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:07.7640287Z triton_mm_119 0.0242 ms 82.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:07.7641268Z triton_mm_113 0.0253 ms 78.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:07.7642251Z triton_mm_120 0.0256 ms 77.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:07.7643393Z triton_mm_116 0.0273 ms 72.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:07.7644590Z triton_mm_115 0.0288 ms 68.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:07.7645576Z triton_mm_112 0.0289 ms 68.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:07.7646664Z triton_mm_109 0.0305 ms 65.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:07.7647739Z triton_mm_111 0.0314 ms 63.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:07.7648600Z SingleProcess AUTOTUNE benchmarking takes 0.2596 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:09:08.0564883Z Autotune Choices Stats: 
2025-09-07T11:09:08.0565989Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_156", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.029952000826597214, "best_triton_pos": 0} 2025-09-07T11:09:08.0718691Z AUTOTUNE mm(25088x356, 356x200) 2025-09-07T11:09:08.0719012Z strides: [356, 1], [1, 356] 2025-09-07T11:09:08.0719284Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:09:08.0720002Z triton_mm_156 0.0300 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:08.0721006Z triton_mm_149 0.0308 ms 97.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:08.0721981Z triton_mm_153 0.0318 ms 94.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:08.0722942Z triton_mm_150 0.0340 ms 88.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:08.0724387Z triton_mm_154 0.0343 ms 87.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:08.0725391Z triton_mm_155 0.0352 ms 85.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T11:09:08.0726542Z triton_mm_157 0.0368 ms 81.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 
2025-09-07T11:09:08.0727528Z triton_mm_151 0.0370 ms 81.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:08.0728502Z triton_mm_158 0.0386 ms 77.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:08.0729470Z triton_mm_146 0.0394 ms 76.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:09:08.0730313Z SingleProcess AUTOTUNE benchmarking takes 0.2757 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:09:08.4047922Z Autotune Choices Stats: 2025-09-07T11:09:08.4049505Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.02396799996495247, "best_triton_pos": 1, "best_triton_time": 0.031936001032590866, "best_triton_kernel": "triton_mm_512", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T11:09:08.4207040Z AUTOTUNE mm(6272x1152, 1152x800) 2025-09-07T11:09:08.4207333Z strides: [1152, 1], [1, 1152] 2025-09-07T11:09:08.4207606Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:09:08.4207875Z mm 0.0240 ms 100.0% 2025-09-07T11:09:08.4208467Z triton_mm_512 0.0319 ms 75.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:08.4209651Z triton_mm_513 0.0324 ms 74.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:08.4210745Z triton_mm_514 0.0324 ms 74.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, 
BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:08.4211726Z triton_mm_507 0.0331 ms 72.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:08.4212696Z triton_mm_508 0.0410 ms 58.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:09:08.4213670Z triton_mm_509 0.0422 ms 56.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:08.4215013Z triton_mm_510 0.0431 ms 55.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:08.4215974Z triton_mm_506 0.0437 ms 54.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:08.4216941Z triton_mm_503 0.0437 ms 54.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:08.4217743Z SingleProcess AUTOTUNE benchmarking takes 0.3162 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:09:08.9881113Z Autotune Choices Stats: 2025-09-07T11:09:08.9883008Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.014047999866306782, "best_triton_pos": 1, "best_triton_time": 0.015456000342965126, "best_triton_kernel": "triton_mm_234", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T11:09:09.0030852Z AUTOTUNE 
mm(6272x704, 704x400) 2025-09-07T11:09:09.0031117Z strides: [704, 1], [1, 704] 2025-09-07T11:09:09.0031353Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:09:09.0031608Z mm 0.0140 ms 100.0% 2025-09-07T11:09:09.0032150Z triton_mm_234 0.0155 ms 90.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:09.0033013Z triton_mm_240 0.0161 ms 87.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:09.0034063Z triton_mm_236 0.0183 ms 76.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:09.0034921Z triton_mm_239 0.0187 ms 75.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:09.0036007Z triton_mm_241 0.0187 ms 75.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:09.0036880Z triton_mm_237 0.0196 ms 71.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:09.0037814Z triton_mm_233 0.0197 ms 71.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:09.0038802Z triton_mm_232 0.0199 ms 70.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:09.0039652Z triton_mm_230 0.0210 ms 67.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, 
BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:09.0040490Z SingleProcess AUTOTUNE benchmarking takes 0.2555 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:09:09.5707437Z Autotune Choices Stats: 2025-09-07T11:09:09.5708744Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.015168000012636185, "best_triton_pos": 1, "best_triton_time": 0.016095999628305435, "best_triton_kernel": "triton_mm_272", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T11:09:09.5861683Z AUTOTUNE mm(6272x768, 768x400) 2025-09-07T11:09:09.5862000Z strides: [768, 1], [1, 768] 2025-09-07T11:09:09.5862284Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:09:09.5862559Z mm 0.0152 ms 100.0% 2025-09-07T11:09:09.5863175Z triton_mm_272 0.0161 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:09.5864543Z triton_mm_278 0.0164 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:09.5865522Z triton_mm_279 0.0196 ms 77.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:09.5866907Z triton_mm_277 0.0196 ms 77.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:09.5867824Z triton_mm_274 0.0197 ms 77.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:09.5868657Z 
triton_mm_275 0.0210 ms 72.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:09.5869492Z triton_mm_271 0.0211 ms 71.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:09.5870318Z triton_mm_270 0.0212 ms 71.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:09.5871153Z triton_mm_273 0.0214 ms 70.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:09:09.5871895Z SingleProcess AUTOTUNE benchmarking takes 0.2579 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:09:09.8598867Z Autotune Choices Stats: 2025-09-07T11:09:09.8600372Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.015104000456631184, "best_triton_pos": 1, "best_triton_time": 0.01692800037562847, "best_triton_kernel": "triton_mm_310", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T11:09:09.8758894Z AUTOTUNE mm(6272x832, 832x400) 2025-09-07T11:09:09.8759164Z strides: [832, 1], [1, 832] 2025-09-07T11:09:09.8759463Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:09:09.8759739Z mm 0.0151 ms 100.0% 2025-09-07T11:09:09.8760339Z triton_mm_310 0.0169 ms 89.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:09.8761620Z triton_mm_316 0.0173 ms 87.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:09.8762732Z triton_mm_317 0.0199 ms 76.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:09.8764104Z triton_mm_315 0.0207 ms 72.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:09.8765093Z triton_mm_312 0.0210 ms 72.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:09.8766069Z triton_mm_313 0.0219 ms 69.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:09.8767215Z triton_mm_311 0.0229 ms 66.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:09:09.8768191Z triton_mm_306 0.0232 ms 65.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:09.8769142Z triton_mm_309 0.0233 ms 64.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:09.8769983Z SingleProcess AUTOTUNE benchmarking takes 0.2576 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:09:10.1619467Z Autotune Choices Stats: 2025-09-07T11:09:10.1621155Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.016095999628305435, "best_triton_pos": 1, "best_triton_time": 0.017472000792622566, "best_triton_kernel": "triton_mm_348", "best_triton_kernel_desc": 
"ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T11:09:10.1779219Z AUTOTUNE mm(6272x896, 896x400) 2025-09-07T11:09:10.1779502Z strides: [896, 1], [1, 896] 2025-09-07T11:09:10.1779774Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:09:10.1780058Z mm 0.0161 ms 100.0% 2025-09-07T11:09:10.1780665Z triton_mm_348 0.0175 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:10.1781651Z triton_mm_354 0.0184 ms 87.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:10.1782639Z triton_mm_355 0.0208 ms 77.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:10.1783980Z triton_mm_350 0.0219 ms 73.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:10.1785046Z triton_mm_353 0.0219 ms 73.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:10.1786048Z triton_mm_349 0.0227 ms 70.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:09:10.1787039Z triton_mm_351 0.0231 ms 69.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:10.1788006Z triton_mm_346 0.0236 ms 68.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:10.1788837Z triton_mm_347 0.0237 ms 67.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:10.1789685Z SingleProcess AUTOTUNE benchmarking takes 0.2692 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:09:10.4594978Z Autotune Choices Stats: 2025-09-07T11:09:10.4596085Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.01648000068962574, "best_triton_pos": 1, "best_triton_time": 0.018079999834299088, "best_triton_kernel": "triton_mm_386", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T11:09:10.4755844Z AUTOTUNE mm(6272x960, 960x400) 2025-09-07T11:09:10.4756107Z strides: [960, 1], [1, 960] 2025-09-07T11:09:10.4756407Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:09:10.4756696Z mm 0.0165 ms 100.0% 2025-09-07T11:09:10.4757408Z triton_mm_386 0.0181 ms 91.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:10.4758404Z triton_mm_392 0.0190 ms 86.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:10.4759378Z triton_mm_393 0.0220 ms 74.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:10.4760684Z triton_mm_391 0.0233 ms 70.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:10.4761681Z triton_mm_388 
0.0241 ms 68.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:10.4762672Z triton_mm_387 0.0244 ms 67.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:09:10.4763656Z triton_mm_389 0.0249 ms 66.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:10.4765000Z triton_mm_382 0.0251 ms 65.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:10.4765976Z triton_mm_385 0.0261 ms 63.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:10.4766876Z SingleProcess AUTOTUNE benchmarking takes 0.2627 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:09:10.7560819Z Autotune Choices Stats: 2025-09-07T11:09:10.7562532Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.017184000462293625, "best_triton_pos": 1, "best_triton_time": 0.018624000251293182, "best_triton_kernel": "triton_mm_424", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T11:09:10.7725560Z AUTOTUNE mm(6272x1024, 1024x400) 2025-09-07T11:09:10.7725832Z strides: [1024, 1], [1, 1024] 2025-09-07T11:09:10.7726137Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:09:10.7726431Z mm 0.0172 ms 100.0% 2025-09-07T11:09:10.7727161Z triton_mm_424 0.0186 ms 92.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:10.7728326Z triton_mm_430 0.0194 ms 88.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:10.7729416Z triton_mm_431 0.0222 ms 77.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:10.7730375Z triton_mm_426 0.0240 ms 71.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:10.7731344Z triton_mm_425 0.0243 ms 70.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:09:10.7732331Z triton_mm_429 0.0247 ms 69.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:10.7733301Z triton_mm_420 0.0251 ms 68.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:10.7734466Z triton_mm_427 0.0259 ms 66.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:10.7735438Z triton_mm_422 0.0271 ms 63.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:10.7736287Z SingleProcess AUTOTUNE benchmarking takes 0.2642 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:09:11.0556818Z Autotune Choices Stats: 2025-09-07T11:09:11.0558544Z {"num_choices": 20, "num_triton_choices": 19, 
"best_kernel": "mm", "best_time": 0.017376000061631203, "best_triton_pos": 1, "best_triton_time": 0.019551999866962433, "best_triton_kernel": "triton_mm_462", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T11:09:11.0721299Z AUTOTUNE mm(6272x1088, 1088x400) 2025-09-07T11:09:11.0721568Z strides: [1088, 1], [1, 1088] 2025-09-07T11:09:11.0721797Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:09:11.0722020Z mm 0.0174 ms 100.0% 2025-09-07T11:09:11.0722528Z triton_mm_462 0.0196 ms 88.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:11.0723342Z triton_mm_468 0.0203 ms 85.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:11.0724379Z triton_mm_469 0.0232 ms 75.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:11.0725449Z triton_mm_467 0.0259 ms 67.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:11.0726255Z triton_mm_464 0.0261 ms 66.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:11.0727048Z triton_mm_463 0.0265 ms 65.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:09:11.0727870Z triton_mm_458 0.0271 ms 64.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:11.0728798Z triton_mm_465 0.0276 ms 62.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:11.0729593Z triton_mm_461 0.0295 ms 59.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:11.0730386Z SingleProcess AUTOTUNE benchmarking takes 0.2668 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:09:11.4081680Z Autotune Choices Stats: 2025-09-07T11:09:11.4083038Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.023744000121951103, "best_triton_pos": 1, "best_triton_time": 0.029184000566601753, "best_triton_kernel": "triton_mm_1274", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T11:09:11.4243955Z AUTOTUNE mm(1568x2432, 2432x1600) 2025-09-07T11:09:11.4244251Z strides: [2432, 1], [1, 2432] 2025-09-07T11:09:11.4244530Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:09:11.4244811Z mm 0.0237 ms 100.0% 2025-09-07T11:09:11.4245442Z triton_mm_1274 0.0292 ms 81.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:11.4246470Z triton_mm_1280 0.0314 ms 75.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:11.4247593Z triton_mm_1275 0.0362 ms 65.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:09:11.4249033Z triton_mm_1281 0.0362 ms 65.7% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:11.4250047Z triton_mm_1272 0.0413 ms 57.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:11.4251061Z triton_mm_1276 0.0416 ms 57.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:11.4252048Z triton_mm_1279 0.0429 ms 55.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:11.4253040Z triton_mm_1270 0.0445 ms 53.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:11.4254177Z triton_mm_1271 0.0454 ms 52.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:11.4255046Z SingleProcess AUTOTUNE benchmarking takes 0.3197 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:09:11.9934162Z Autotune Choices Stats: 2025-09-07T11:09:11.9935857Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.012703999876976013, "best_triton_pos": 1, "best_triton_time": 0.014112000353634357, "best_triton_kernel": "triton_mm_552", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T11:09:12.0091780Z AUTOTUNE mm(1568x1216, 1216x800) 2025-09-07T11:09:12.0092056Z strides: [1216, 1], [1, 1216] 2025-09-07T11:09:12.0092307Z dtypes: torch.bfloat16, torch.bfloat16 
2025-09-07T11:09:12.0092576Z mm 0.0127 ms 100.0% 2025-09-07T11:09:12.0093378Z triton_mm_552 0.0141 ms 90.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:12.0098411Z triton_mm_545 0.0158 ms 80.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:12.0099469Z triton_mm_541 0.0159 ms 79.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:12.0100368Z triton_mm_551 0.0167 ms 76.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:12.0101262Z triton_mm_548 0.0179 ms 71.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:12.0102160Z triton_mm_544 0.0179 ms 70.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:12.0103057Z triton_mm_546 0.0182 ms 69.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:09:12.0104116Z triton_mm_542 0.0199 ms 63.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:12.0105023Z triton_mm_543 0.0216 ms 58.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:12.0105801Z SingleProcess AUTOTUNE benchmarking 
takes 0.2562 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:09:12.5642008Z Autotune Choices Stats: 2025-09-07T11:09:12.5644398Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.012927999719977379, "best_triton_pos": 1, "best_triton_time": 0.014879999682307243, "best_triton_kernel": "triton_mm_590", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T11:09:12.5805823Z AUTOTUNE mm(1568x1280, 1280x800) 2025-09-07T11:09:12.5806096Z strides: [1280, 1], [1, 1280] 2025-09-07T11:09:12.5806373Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:09:12.5806660Z mm 0.0129 ms 100.0% 2025-09-07T11:09:12.5807286Z triton_mm_590 0.0149 ms 86.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:12.5808441Z triton_mm_579 0.0162 ms 79.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:12.5809403Z triton_mm_583 0.0163 ms 79.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:12.5810623Z triton_mm_584 0.0178 ms 72.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:09:12.5811620Z triton_mm_589 0.0178 ms 72.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:12.5812589Z triton_mm_582 0.0183 ms 70.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:12.5813552Z triton_mm_586 0.0186 ms 69.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:12.5814810Z triton_mm_580 0.0197 ms 65.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:12.5815902Z triton_mm_581 0.0226 ms 57.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:12.5816741Z SingleProcess AUTOTUNE benchmarking takes 0.2508 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:09:12.8486212Z Autotune Choices Stats: 2025-09-07T11:09:12.8487572Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.013952000066637993, "best_triton_pos": 1, "best_triton_time": 0.014879999682307243, "best_triton_kernel": "triton_mm_628", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T11:09:12.8644130Z AUTOTUNE mm(1568x1344, 1344x800) 2025-09-07T11:09:12.8644454Z strides: [1344, 1], [1, 1344] 2025-09-07T11:09:12.8644739Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:09:12.8645014Z mm 0.0140 ms 100.0% 2025-09-07T11:09:12.8645630Z triton_mm_628 0.0149 ms 93.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:12.8646623Z triton_mm_621 0.0166 ms 84.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:12.8647643Z triton_mm_617 0.0167 ms 83.4% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:12.8649001Z triton_mm_627 0.0176 ms 79.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:12.8649995Z triton_mm_620 0.0187 ms 74.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:12.8650955Z triton_mm_624 0.0188 ms 74.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:12.8651935Z triton_mm_622 0.0188 ms 74.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:09:12.8652916Z triton_mm_618 0.0209 ms 66.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:12.8654101Z triton_mm_619 0.0231 ms 60.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:12.8654957Z SingleProcess AUTOTUNE benchmarking takes 0.2513 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:09:13.1335790Z Autotune Choices Stats: 2025-09-07T11:09:13.1337392Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.014303999952971935, "best_triton_pos": 1, "best_triton_time": 0.015104000456631184, "best_triton_kernel": "triton_mm_666", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, 
num_warps=8"} 2025-09-07T11:09:13.1498005Z AUTOTUNE mm(1568x1408, 1408x800) 2025-09-07T11:09:13.1498341Z strides: [1408, 1], [1, 1408] 2025-09-07T11:09:13.1498654Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:09:13.1499221Z mm 0.0143 ms 100.0% 2025-09-07T11:09:13.1499833Z triton_mm_666 0.0151 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:13.1500842Z triton_mm_659 0.0171 ms 83.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:13.1501978Z triton_mm_655 0.0172 ms 83.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:13.1502943Z triton_mm_665 0.0185 ms 77.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:13.1504324Z triton_mm_660 0.0188 ms 75.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:09:13.1505311Z triton_mm_658 0.0194 ms 73.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:13.1506271Z triton_mm_662 0.0195 ms 73.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:13.1507226Z triton_mm_656 0.0206 ms 69.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:13.1508251Z triton_mm_657 0.0236 ms 60.5% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:13.1509010Z SingleProcess AUTOTUNE benchmarking takes 0.2536 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:09:13.4197067Z Autotune Choices Stats: 2025-09-07T11:09:13.4198285Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_704", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8", "best_time": 0.01539199985563755, "best_triton_pos": 0} 2025-09-07T11:09:13.4362172Z AUTOTUNE mm(1568x1472, 1472x800) 2025-09-07T11:09:13.4362440Z strides: [1472, 1], [1, 1472] 2025-09-07T11:09:13.4362679Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:09:13.4363278Z triton_mm_704 0.0154 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:13.4364166Z mm 0.0158 ms 97.6% 2025-09-07T11:09:13.4364668Z triton_mm_697 0.0175 ms 88.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:13.4365504Z triton_mm_693 0.0176 ms 87.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:13.4366564Z triton_mm_703 0.0187 ms 82.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:13.4367431Z triton_mm_698 0.0196 ms 78.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:09:13.4368401Z 
triton_mm_696 0.0197 ms 78.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:13.4369366Z triton_mm_700 0.0199 ms 77.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:13.4370459Z triton_mm_694 0.0218 ms 70.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:13.4371422Z triton_mm_695 0.0248 ms 62.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:13.4372397Z SingleProcess AUTOTUNE benchmarking takes 0.2540 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:09:13.7128018Z Autotune Choices Stats: 2025-09-07T11:09:13.7129364Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.014112000353634357, "best_triton_pos": 1, "best_triton_time": 0.01600000075995922, "best_triton_kernel": "triton_mm_742", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T11:09:13.7298751Z AUTOTUNE mm(1568x1536, 1536x800) 2025-09-07T11:09:13.7299079Z strides: [1536, 1], [1, 1536] 2025-09-07T11:09:13.7299384Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:09:13.7299663Z mm 0.0141 ms 100.0% 2025-09-07T11:09:13.7300295Z triton_mm_742 0.0160 ms 88.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:13.7301288Z triton_mm_731 0.0177 ms 79.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:13.7302256Z triton_mm_735 0.0184 ms 76.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:13.7303524Z triton_mm_736 0.0195 ms 72.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:09:13.7304966Z triton_mm_741 0.0195 ms 72.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:13.7305967Z triton_mm_738 0.0206 ms 68.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:13.7306936Z triton_mm_734 0.0207 ms 68.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:13.7307924Z triton_mm_732 0.0214 ms 65.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:13.7308934Z triton_mm_733 0.0260 ms 54.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:13.7309722Z SingleProcess AUTOTUNE benchmarking takes 0.2609 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:09:14.0070847Z Autotune Choices Stats: 2025-09-07T11:09:14.0072436Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.014655999839305878, "best_triton_pos": 1, "best_triton_time": 0.016224000602960587, "best_triton_kernel": "triton_mm_780", "best_triton_kernel_desc": 
"ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T11:09:14.0245555Z AUTOTUNE mm(1568x1600, 1600x800) 2025-09-07T11:09:14.0245787Z strides: [1600, 1], [1, 1600] 2025-09-07T11:09:14.0246022Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:09:14.0246257Z mm 0.0147 ms 100.0% 2025-09-07T11:09:14.0246991Z triton_mm_780 0.0162 ms 90.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:14.0247895Z triton_mm_769 0.0182 ms 80.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:14.0248984Z triton_mm_773 0.0190 ms 77.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:14.0249952Z triton_mm_779 0.0200 ms 73.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:14.0250946Z triton_mm_774 0.0204 ms 71.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:09:14.0251929Z triton_mm_772 0.0212 ms 69.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:14.0252894Z triton_mm_776 0.0214 ms 68.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:14.0254050Z triton_mm_770 0.0227 ms 64.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, 
EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:14.0255019Z triton_mm_771 0.0268 ms 54.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:14.0255857Z SingleProcess AUTOTUNE benchmarking takes 0.2618 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:09:14.3036572Z Autotune Choices Stats: 2025-09-07T11:09:14.3037983Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.015359999611973763, "best_triton_pos": 1, "best_triton_time": 0.016672000288963318, "best_triton_kernel": "triton_mm_818", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T11:09:14.3209731Z AUTOTUNE mm(1568x1664, 1664x800) 2025-09-07T11:09:14.3210000Z strides: [1664, 1], [1, 1664] 2025-09-07T11:09:14.3210236Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:09:14.3210476Z mm 0.0154 ms 100.0% 2025-09-07T11:09:14.3210984Z triton_mm_818 0.0167 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:14.3211783Z triton_mm_807 0.0190 ms 80.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:14.3212593Z triton_mm_811 0.0192 ms 80.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:14.3213551Z triton_mm_812 0.0201 ms 76.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:09:14.3214697Z 
triton_mm_817 0.0202 ms 76.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:14.3215472Z triton_mm_810 0.0218 ms 70.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:14.3216252Z triton_mm_814 0.0218 ms 70.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:14.3217161Z triton_mm_808 0.0223 ms 68.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:14.3218030Z triton_mm_809 0.0274 ms 56.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:14.3218714Z SingleProcess AUTOTUNE benchmarking takes 0.2634 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:09:14.5993364Z Autotune Choices Stats: 2025-09-07T11:09:14.5994998Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.015104000456631184, "best_triton_pos": 1, "best_triton_time": 0.01692800037562847, "best_triton_kernel": "triton_mm_856", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T11:09:14.6164609Z AUTOTUNE mm(1568x1728, 1728x800) 2025-09-07T11:09:14.6164934Z strides: [1728, 1], [1, 1728] 2025-09-07T11:09:14.6165241Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:09:14.6165543Z mm 0.0151 ms 100.0% 2025-09-07T11:09:14.6166199Z triton_mm_856 0.0169 ms 89.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:14.6167206Z triton_mm_845 0.0192 ms 78.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:14.6168254Z triton_mm_849 0.0196 ms 77.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:14.6169664Z triton_mm_855 0.0206 ms 73.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:14.6170677Z triton_mm_850 0.0214 ms 70.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:09:14.6171661Z triton_mm_848 0.0229 ms 65.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:14.6172624Z triton_mm_852 0.0231 ms 65.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:14.6173586Z triton_mm_846 0.0236 ms 64.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:14.6174789Z triton_mm_847 0.0279 ms 54.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:14.6175642Z SingleProcess AUTOTUNE benchmarking takes 0.2627 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:09:14.8934397Z Autotune Choices Stats: 2025-09-07T11:09:14.8936225Z {"num_choices": 20, 
"num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.01711999997496605, "best_triton_pos": 1, "best_triton_time": 0.017216000705957413, "best_triton_kernel": "triton_mm_894", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T11:09:14.9115976Z AUTOTUNE mm(1568x1792, 1792x800) 2025-09-07T11:09:14.9116259Z strides: [1792, 1], [1, 1792] 2025-09-07T11:09:14.9116546Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:09:14.9117099Z mm 0.0171 ms 100.0% 2025-09-07T11:09:14.9117810Z triton_mm_894 0.0172 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:14.9118807Z triton_mm_883 0.0198 ms 86.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:14.9119934Z triton_mm_887 0.0201 ms 85.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:14.9120900Z triton_mm_888 0.0209 ms 81.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:09:14.9121876Z triton_mm_893 0.0212 ms 80.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:14.9122871Z triton_mm_890 0.0233 ms 73.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:14.9124045Z triton_mm_886 0.0235 ms 73.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:14.9125008Z triton_mm_884 0.0237 ms 72.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:14.9125963Z triton_mm_889 0.0289 ms 59.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:14.9126810Z SingleProcess AUTOTUNE benchmarking takes 0.2625 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:09:15.1950333Z Autotune Choices Stats: 2025-09-07T11:09:15.1951658Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.016063999384641647, "best_triton_pos": 1, "best_triton_time": 0.017920000478625298, "best_triton_kernel": "triton_mm_932", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T11:09:15.2124985Z AUTOTUNE mm(1568x1856, 1856x800) 2025-09-07T11:09:15.2125286Z strides: [1856, 1], [1, 1856] 2025-09-07T11:09:15.2125564Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:09:15.2125845Z mm 0.0161 ms 100.0% 2025-09-07T11:09:15.2126461Z triton_mm_932 0.0179 ms 89.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:15.2127464Z triton_mm_921 0.0201 ms 80.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:15.2128506Z triton_mm_925 0.0205 ms 78.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:15.2129679Z triton_mm_931 
0.0220 ms 73.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:15.2130658Z triton_mm_926 0.0223 ms 72.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:09:15.2131625Z triton_mm_924 0.0243 ms 66.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:15.2132576Z triton_mm_928 0.0243 ms 66.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:15.2133637Z triton_mm_922 0.0245 ms 65.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:15.2134882Z triton_mm_923 0.0296 ms 54.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:15.2135711Z SingleProcess AUTOTUNE benchmarking takes 0.2675 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:09:15.4952291Z Autotune Choices Stats: 2025-09-07T11:09:15.4953572Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.016704000532627106, "best_triton_pos": 1, "best_triton_time": 0.017983999103307724, "best_triton_kernel": "triton_mm_970", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T11:09:15.5122189Z AUTOTUNE mm(1568x1920, 1920x800) 2025-09-07T11:09:15.5122523Z strides: [1920, 1], [1, 1920] 2025-09-07T11:09:15.5122797Z dtypes: torch.bfloat16, 
torch.bfloat16 2025-09-07T11:09:15.5123088Z mm 0.0167 ms 100.0% 2025-09-07T11:09:15.5123887Z triton_mm_970 0.0180 ms 92.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:15.5124907Z triton_mm_963 0.0209 ms 80.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:15.5125880Z triton_mm_959 0.0215 ms 77.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:15.5127337Z triton_mm_964 0.0217 ms 77.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:09:15.5128390Z triton_mm_969 0.0221 ms 75.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:15.5129511Z triton_mm_960 0.0242 ms 69.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:15.5130474Z triton_mm_962 0.0243 ms 68.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:15.5131446Z triton_mm_966 0.0245 ms 68.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:15.5132418Z triton_mm_961 0.0306 ms 54.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:15.5133274Z SingleProcess AUTOTUNE 
benchmarking takes 0.2665 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:09:15.7943438Z Autotune Choices Stats: 2025-09-07T11:09:15.7945204Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.016896000131964684, "best_triton_pos": 1, "best_triton_time": 0.018303999677300453, "best_triton_kernel": "triton_mm_1008", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T11:09:15.8114392Z AUTOTUNE mm(1568x1984, 1984x800) 2025-09-07T11:09:15.8114680Z strides: [1984, 1], [1, 1984] 2025-09-07T11:09:15.8114957Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:09:15.8115473Z mm 0.0169 ms 100.0% 2025-09-07T11:09:15.8116104Z triton_mm_1008 0.0183 ms 92.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:15.8117209Z triton_mm_1001 0.0214 ms 79.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:15.8118349Z triton_mm_997 0.0222 ms 76.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:15.8119367Z triton_mm_1007 0.0228 ms 73.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:15.8120230Z triton_mm_1002 0.0231 ms 73.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:09:15.8121086Z triton_mm_1004 0.0253 ms 66.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:15.8121932Z triton_mm_1000 0.0254 ms 66.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:15.8122783Z triton_mm_998 0.0257 ms 65.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:15.8123624Z triton_mm_999 0.0313 ms 54.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:15.8124682Z SingleProcess AUTOTUNE benchmarking takes 0.2675 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:09:16.0986938Z Autotune Choices Stats: 2025-09-07T11:09:16.0988267Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.017920000478625298, "best_triton_pos": 1, "best_triton_time": 0.01852799952030182, "best_triton_kernel": "triton_mm_1046", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T11:09:16.1165466Z AUTOTUNE mm(1568x2048, 2048x800) 2025-09-07T11:09:16.1165797Z strides: [2048, 1], [1, 2048] 2025-09-07T11:09:16.1166071Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:09:16.1166365Z mm 0.0179 ms 100.0% 2025-09-07T11:09:16.1166984Z triton_mm_1046 0.0185 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:16.1167987Z triton_mm_1039 0.0220 ms 81.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:16.1169098Z triton_mm_1040 0.0225 
ms 79.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:09:16.1170359Z triton_mm_1035 0.0226 ms 79.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:16.1171347Z triton_mm_1045 0.0231 ms 77.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:16.1172319Z triton_mm_1038 0.0256 ms 69.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:16.1173412Z triton_mm_1042 0.0257 ms 69.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:16.1174700Z triton_mm_1036 0.0258 ms 69.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:16.1175794Z triton_mm_1037 0.0318 ms 56.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:16.1176630Z SingleProcess AUTOTUNE benchmarking takes 0.2727 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:09:16.4041669Z Autotune Choices Stats: 2025-09-07T11:09:16.4042801Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_1084", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8", "best_time": 0.01894400082528591, "best_triton_pos": 0} 2025-09-07T11:09:16.4213054Z AUTOTUNE mm(1568x2112, 
2112x800) 2025-09-07T11:09:16.4213407Z strides: [2112, 1], [1, 2112] 2025-09-07T11:09:16.4213675Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:09:16.4214595Z triton_mm_1084 0.0189 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:16.4215262Z mm 0.0191 ms 99.0% 2025-09-07T11:09:16.4215851Z triton_mm_1077 0.0226 ms 84.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:16.4216834Z triton_mm_1073 0.0234 ms 81.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:16.4218307Z triton_mm_1083 0.0238 ms 79.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:16.4219328Z triton_mm_1078 0.0241 ms 78.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:09:16.4220331Z triton_mm_1074 0.0267 ms 70.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:16.4221245Z triton_mm_1080 0.0268 ms 70.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:16.4222155Z triton_mm_1076 0.0268 ms 70.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:16.4223065Z triton_mm_1075 0.0329 ms 57.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, 
BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:16.4224036Z SingleProcess AUTOTUNE benchmarking takes 0.2717 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:09:16.7086113Z Autotune Choices Stats: 2025-09-07T11:09:16.7087436Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.017696000635623932, "best_triton_pos": 1, "best_triton_time": 0.019200000911951065, "best_triton_kernel": "triton_mm_1122", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T11:09:16.7259570Z AUTOTUNE mm(1568x2176, 2176x800) 2025-09-07T11:09:16.7259882Z strides: [2176, 1], [1, 2176] 2025-09-07T11:09:16.7260222Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:09:16.7260693Z mm 0.0177 ms 100.0% 2025-09-07T11:09:16.7261309Z triton_mm_1122 0.0192 ms 92.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:16.7262305Z triton_mm_1115 0.0228 ms 77.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:16.7263419Z triton_mm_1116 0.0236 ms 75.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:09:16.7264778Z triton_mm_1111 0.0237 ms 74.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:16.7265758Z triton_mm_1121 0.0240 ms 73.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 
2025-09-07T11:09:16.7266741Z triton_mm_1112 0.0267 ms 66.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:16.7267732Z triton_mm_1114 0.0271 ms 65.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:16.7268697Z triton_mm_1118 0.0272 ms 65.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:16.7269694Z triton_mm_1113 0.0340 ms 52.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:16.7270545Z SingleProcess AUTOTUNE benchmarking takes 0.2731 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:09:17.0134561Z Autotune Choices Stats: 2025-09-07T11:09:17.0138257Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.018464000895619392, "best_triton_pos": 1, "best_triton_time": 0.01942400075495243, "best_triton_kernel": "triton_mm_1160", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T11:09:17.0311562Z AUTOTUNE mm(1568x2240, 2240x800) 2025-09-07T11:09:17.0311854Z strides: [2240, 1], [1, 2240] 2025-09-07T11:09:17.0312159Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:09:17.0312452Z mm 0.0185 ms 100.0% 2025-09-07T11:09:17.0313078Z triton_mm_1160 0.0194 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:17.0314298Z triton_mm_1153 0.0232 ms 79.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, 
BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:17.0315290Z triton_mm_1149 0.0238 ms 77.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:17.0316886Z triton_mm_1154 0.0244 ms 75.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:09:17.0318009Z triton_mm_1159 0.0245 ms 75.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:17.0319014Z triton_mm_1156 0.0270 ms 68.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:17.0320112Z triton_mm_1150 0.0272 ms 67.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:17.0320966Z triton_mm_1152 0.0273 ms 67.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:17.0321911Z triton_mm_1151 0.0344 ms 53.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:17.0322659Z SingleProcess AUTOTUNE benchmarking takes 0.2730 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:09:17.3213485Z Autotune Choices Stats: 2025-09-07T11:09:17.3214951Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.01820800080895424, "best_triton_pos": 1, "best_triton_time": 0.019872000440955162, "best_triton_kernel": 
"triton_mm_1198", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T11:09:17.3383986Z AUTOTUNE mm(1568x2304, 2304x800) 2025-09-07T11:09:17.3384287Z strides: [2304, 1], [1, 2304] 2025-09-07T11:09:17.3384576Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:09:17.3384869Z mm 0.0182 ms 100.0% 2025-09-07T11:09:17.3385506Z triton_mm_1198 0.0199 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:17.3386554Z triton_mm_1191 0.0237 ms 76.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:17.3388106Z triton_mm_1192 0.0242 ms 75.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:09:17.3389139Z triton_mm_1187 0.0246 ms 74.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:17.3390149Z triton_mm_1197 0.0252 ms 72.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:17.3391006Z triton_mm_1188 0.0275 ms 66.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:17.3391851Z triton_mm_1190 0.0282 ms 64.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:17.3392697Z triton_mm_1194 0.0282 ms 64.5% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:17.3393539Z triton_mm_1189 0.0353 ms 51.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:17.3394569Z SingleProcess AUTOTUNE benchmarking takes 0.2747 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:09:17.6302363Z Autotune Choices Stats: 2025-09-07T11:09:17.6304112Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.018592000007629395, "best_triton_pos": 1, "best_triton_time": 0.019999999552965164, "best_triton_kernel": "triton_mm_1236", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2025-09-07T11:09:17.6478151Z AUTOTUNE mm(1568x2368, 2368x800) 2025-09-07T11:09:17.6478456Z strides: [2368, 1], [1, 2368] 2025-09-07T11:09:17.6479181Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:09:17.6479464Z mm 0.0186 ms 100.0% 2025-09-07T11:09:17.6480111Z triton_mm_1236 0.0200 ms 93.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:17.6481284Z triton_mm_1229 0.0242 ms 76.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:17.6482272Z triton_mm_1225 0.0250 ms 74.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:17.6483304Z triton_mm_1230 0.0256 ms 72.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=4, num_warps=4 2025-09-07T11:09:17.6484564Z triton_mm_1235 0.0256 ms 72.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:17.6485578Z triton_mm_1226 0.0285 ms 65.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=False, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:17.6486579Z triton_mm_1228 0.0289 ms 64.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:17.6487571Z triton_mm_1232 0.0292 ms 63.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:17.6488550Z triton_mm_1227 0.0360 ms 51.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:17.6489582Z SingleProcess AUTOTUNE benchmarking takes 0.2760 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:09:18.4611587Z Autotune Choices Stats: 2025-09-07T11:09:18.4612716Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_1313", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4", "best_time": 0.015615999698638916, "best_triton_pos": 0} 2025-09-07T11:09:18.4788889Z AUTOTUNE mm(392x2432, 2432x1600) 2025-09-07T11:09:18.4789351Z strides: [2432, 1], [1, 2432] 2025-09-07T11:09:18.4789804Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:09:18.4790888Z triton_mm_1313 0.0156 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, 
num_warps=4 2025-09-07T11:09:18.4791952Z mm 0.0156 ms 99.8% 2025-09-07T11:09:18.4792927Z triton_mm_1319 0.0200 ms 78.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:18.4794832Z triton_mm_1309 0.0204 ms 76.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:18.4796837Z triton_mm_1308 0.0238 ms 65.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:18.4798581Z triton_mm_1312 0.0238 ms 65.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:18.4800129Z triton_mm_1305 0.0242 ms 64.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:18.4801303Z triton_mm_1318 0.0256 ms 60.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:18.4802104Z triton_mm_1302 0.0271 ms 57.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:09:18.4803000Z triton_mm_1315 0.0278 ms 56.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:18.4803863Z SingleProcess AUTOTUNE benchmarking takes 0.7989 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:09:19.0732513Z Autotune Choices Stats: 2025-09-07T11:09:19.0734110Z {"num_choices": 20, "num_triton_choices": 
19, "best_kernel": "mm", "best_time": 0.01548799965530634, "best_triton_pos": 1, "best_triton_time": 0.015968000516295433, "best_triton_kernel": "triton_mm_1351", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4"} 2025-09-07T11:09:19.0899326Z AUTOTUNE mm(392x2560, 2560x1600) 2025-09-07T11:09:19.0899574Z strides: [2560, 1], [1, 2560] 2025-09-07T11:09:19.0899835Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:09:19.0900166Z mm 0.0155 ms 100.0% 2025-09-07T11:09:19.0900848Z triton_mm_1351 0.0160 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:09:19.0901940Z triton_mm_1357 0.0208 ms 74.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:19.0903307Z triton_mm_1347 0.0208 ms 74.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:19.0904531Z triton_mm_1350 0.0245 ms 63.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:19.0905502Z triton_mm_1346 0.0246 ms 62.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:19.0906490Z triton_mm_1343 0.0252 ms 61.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:19.0907454Z triton_mm_1356 0.0266 ms 58.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:19.0908426Z triton_mm_1340 0.0287 ms 53.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:09:19.0909389Z triton_mm_1353 0.0289 ms 53.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:19.0910443Z SingleProcess AUTOTUNE benchmarking takes 0.2747 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:09:33.6045903Z pass 2025-09-07T11:09:38.2628282Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T11:09:38.2630248Z import pynvml # type: ignore[import] 2025-09-07T11:09:41.2597465Z 2025-09-07T11:09:42.1426267Z loading model: 0it [00:00, ?it/s] 2025-09-07T11:09:42.1426625Z loading model: 0it [00:00, ?it/s] 2025-09-07T11:09:42.1471802Z cuda eval eca_botnext26ts_256 2025-09-07T11:09:58.0284649Z Autotune Choices Stats: 2025-09-07T11:09:58.0285863Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.012799999676644802, "best_triton_pos": 1, "best_triton_time": 0.013952000066637993, "best_triton_kernel": "triton_mm_66", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2025-09-07T11:09:58.0458005Z AUTOTUNE mm(32768x64, 64x256) 2025-09-07T11:09:58.0458301Z strides: [64, 1], [1, 64] 2025-09-07T11:09:58.0458573Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:09:58.0458863Z mm 0.0128 ms 100.0% 2025-09-07T11:09:58.0459473Z triton_mm_66 0.0140 ms 91.7% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:58.0460523Z triton_mm_70 0.0142 ms 90.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:58.0461518Z triton_mm_71 0.0145 ms 88.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:58.0462498Z triton_mm_73 0.0145 ms 88.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:58.0463477Z triton_mm_65 0.0146 ms 87.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:09:58.0464605Z triton_mm_68 0.0146 ms 87.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:58.0465614Z triton_mm_67 0.0147 ms 87.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:58.0466408Z triton_mm_69 0.0148 ms 86.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:09:58.0467205Z triton_mm_63 0.0148 ms 86.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:09:58.0467895Z SingleProcess AUTOTUNE benchmarking takes 0.2609 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:09:59.8753478Z Autotune Choices Stats: 
2025-09-07T11:09:59.8754992Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_130", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.015904000028967857, "best_triton_pos": 0} 2025-09-07T11:09:59.8922260Z AUTOTUNE mm(32768x256, 256x128) 2025-09-07T11:09:59.8922553Z strides: [256, 1], [1, 256] 2025-09-07T11:09:59.8922825Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:09:59.8924099Z triton_mm_130 0.0159 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:59.8925184Z triton_mm_124 0.0169 ms 94.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:59.8925762Z mm 0.0170 ms 93.8% 2025-09-07T11:09:59.8926287Z triton_mm_129 0.0174 ms 91.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:59.8927205Z triton_mm_128 0.0182 ms 87.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T11:09:59.8928239Z triton_mm_120 0.0184 ms 86.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:09:59.8929240Z triton_mm_126 0.0184 ms 86.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:59.8930136Z triton_mm_131 0.0185 ms 86.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:09:59.8931027Z triton_mm_123 0.0185 ms 85.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:09:59.8931920Z triton_mm_122 0.0188 ms 84.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:09:59.8932714Z SingleProcess AUTOTUNE benchmarking takes 0.2517 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:10:00.4078226Z Autotune Choices Stats: 2025-09-07T11:10:00.4079362Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_186", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.010623999871313572, "best_triton_pos": 0} 2025-09-07T11:10:00.4241419Z AUTOTUNE mm(8192x128, 128x512) 2025-09-07T11:10:00.4241705Z strides: [128, 1], [1, 128] 2025-09-07T11:10:00.4241973Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:10:00.4243029Z triton_mm_186 0.0106 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:10:00.4244530Z triton_mm_190 0.0107 ms 99.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:10:00.4245608Z triton_mm_193 0.0108 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:10:00.4246582Z triton_mm_183 0.0111 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, 
num_warps=4 2025-09-07T11:10:00.4247543Z triton_mm_194 0.0111 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:10:00.4248160Z mm 0.0112 ms 94.9% 2025-09-07T11:10:00.4248727Z triton_mm_191 0.0114 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:10:00.4249692Z triton_mm_189 0.0115 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:10:00.4250780Z triton_mm_187 0.0115 ms 92.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:10:00.4251764Z triton_mm_192 0.0119 ms 89.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2025-09-07T11:10:00.4252602Z SingleProcess AUTOTUNE benchmarking takes 0.2510 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:10:01.6296557Z Autotune Choices Stats: 2025-09-07T11:10:01.6298290Z {"num_choices": 19, "num_triton_choices": 18, "best_kernel": "triton_mm_87", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.014816000126302242, "best_triton_pos": 0} 2025-09-07T11:10:01.6465281Z AUTOTUNE mm(32768x256, 256x64) 2025-09-07T11:10:01.6465966Z strides: [256, 1], [1, 256] 2025-09-07T11:10:01.6466265Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:10:01.6466945Z triton_mm_87 0.0148 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=3, num_warps=4 2025-09-07T11:10:01.6467921Z triton_mm_92 0.0150 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:10:01.6468883Z triton_mm_83 0.0152 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:10:01.6469849Z triton_mm_90 0.0163 ms 90.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:10:01.6470834Z triton_mm_89 0.0164 ms 90.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:10:01.6471801Z triton_mm_85 0.0168 ms 88.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:10:01.6472740Z triton_mm_86 0.0168 ms 88.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:10:01.6474225Z triton_mm_93 0.0169 ms 87.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:10:01.6475231Z triton_mm_82 0.0170 ms 87.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:10:01.6476053Z triton_mm_77 0.0174 ms 85.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:10:01.6476792Z SingleProcess AUTOTUNE benchmarking takes 0.2470 seconds and 0.0002 seconds 
precompiling for 19 choices 2025-09-07T11:10:01.9182608Z Autotune Choices Stats: 2025-09-07T11:10:01.9183662Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_214", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8", "best_time": 0.012319999746978283, "best_triton_pos": 0} 2025-09-07T11:10:01.9349767Z AUTOTUNE mm(8192x512, 512x256) 2025-09-07T11:10:01.9350061Z strides: [512, 1], [1, 512] 2025-09-07T11:10:01.9350324Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:10:01.9351442Z triton_mm_214 0.0123 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:10:01.9352105Z mm 0.0126 ms 97.5% 2025-09-07T11:10:01.9352682Z triton_mm_207 0.0128 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:10:01.9353651Z triton_mm_213 0.0132 ms 93.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:10:01.9354848Z triton_mm_203 0.0139 ms 88.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:10:01.9355913Z triton_mm_206 0.0141 ms 87.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:10:01.9356768Z triton_mm_210 0.0144 ms 85.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:10:01.9357779Z triton_mm_205 0.0149 ms 82.8% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:10:01.9358608Z triton_mm_208 0.0150 ms 82.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:10:01.9359441Z triton_mm_212 0.0157 ms 78.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:10:01.9360176Z SingleProcess AUTOTUNE benchmarking takes 0.2558 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:10:02.4418275Z Autotune Choices Stats: 2025-09-07T11:10:02.4419414Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_352", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4", "best_time": 0.00940799992531538, "best_triton_pos": 0} 2025-09-07T11:10:02.4586016Z AUTOTUNE mm(2048x256, 256x1024) 2025-09-07T11:10:02.4586391Z strides: [256, 1], [1, 256] 2025-09-07T11:10:02.4586649Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:10:02.4587264Z triton_mm_352 0.0094 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:10:02.4588227Z mm 0.0097 ms 97.0% 2025-09-07T11:10:02.4588794Z triton_mm_359 0.0098 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:10:02.4589713Z triton_mm_355 0.0099 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:10:02.4590619Z triton_mm_358 0.0099 ms 95.5% 
ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:10:02.4591509Z triton_mm_351 0.0100 ms 93.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:10:02.4592405Z triton_mm_348 0.0102 ms 91.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:10:02.4593293Z triton_mm_350 0.0105 ms 89.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:10:02.4594655Z triton_mm_354 0.0105 ms 89.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:10:02.4595572Z triton_mm_357 0.0107 ms 87.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:10:02.4596372Z SingleProcess AUTOTUNE benchmarking takes 0.2543 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:10:03.3922509Z Autotune Choices Stats: 2025-09-07T11:10:03.3923666Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_170", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4", "best_time": 0.010975999757647514, "best_triton_pos": 0} 2025-09-07T11:10:03.4089133Z AUTOTUNE mm(8192x512, 512x128) 2025-09-07T11:10:03.4096389Z strides: [512, 1], [1, 512] 2025-09-07T11:10:03.4096898Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:10:03.4097499Z triton_mm_170 0.0110 ms 100.0% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:10:03.4098065Z mm 0.0114 ms 96.1% 2025-09-07T11:10:03.4098564Z triton_mm_169 0.0118 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:10:03.4099445Z triton_mm_165 0.0119 ms 92.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:10:03.4100364Z triton_mm_176 0.0119 ms 92.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:10:03.4101220Z triton_mm_172 0.0125 ms 87.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:10:03.4102062Z triton_mm_175 0.0125 ms 87.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:10:03.4102900Z triton_mm_168 0.0127 ms 86.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:10:03.4104046Z triton_mm_166 0.0130 ms 84.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:10:03.4104913Z triton_mm_171 0.0143 ms 76.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:10:03.4105658Z SingleProcess AUTOTUNE benchmarking takes 0.2542 seconds and 0.0002 seconds precompiling for 20 choices 
2025-09-07T11:10:03.9195421Z Autotune Choices Stats: 2025-09-07T11:10:03.9196770Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.011264000087976456, "best_triton_pos": 1, "best_triton_time": 0.011776000261306763, "best_triton_kernel": "triton_mm_372", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4"} 2025-09-07T11:10:03.9365430Z AUTOTUNE mm(2048x1024, 1024x512) 2025-09-07T11:10:03.9365822Z strides: [1024, 1], [1, 1024] 2025-09-07T11:10:03.9366121Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:10:03.9366401Z mm 0.0113 ms 100.0% 2025-09-07T11:10:03.9367005Z triton_mm_372 0.0118 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:10:03.9368344Z triton_mm_378 0.0133 ms 84.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:10:03.9369341Z triton_mm_367 0.0138 ms 81.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:10:03.9370296Z triton_mm_371 0.0138 ms 81.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:10:03.9371395Z triton_mm_368 0.0143 ms 78.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:10:03.9372361Z triton_mm_377 0.0147 ms 76.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:10:03.9373441Z 
triton_mm_374 0.0153 ms 73.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:10:03.9374824Z triton_mm_370 0.0153 ms 73.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:10:03.9375783Z triton_mm_373 0.0187 ms 60.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:10:03.9376505Z SingleProcess AUTOTUNE benchmarking takes 0.2580 seconds and 0.0002 seconds precompiling for 20 choices 2025-09-07T11:10:04.4395894Z Autotune Choices Stats: 2025-09-07T11:10:04.4396965Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_589", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4", "best_time": 0.009312000125646591, "best_triton_pos": 0} 2025-09-07T11:10:04.4564491Z AUTOTUNE mm(512x512, 512x2048) 2025-09-07T11:10:04.4564762Z strides: [512, 1], [1, 512] 2025-09-07T11:10:04.4565033Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:10:04.4565800Z triton_mm_589 0.0093 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:10:04.4566536Z mm 0.0094 ms 98.6% 2025-09-07T11:10:04.4567381Z triton_mm_588 0.0103 ms 90.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:10:04.4568371Z triton_mm_584 0.0105 ms 88.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 
2025-09-07T11:10:04.4569341Z triton_mm_595 0.0105 ms 88.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:10:04.4570307Z triton_mm_594 0.0110 ms 84.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:10:04.4571257Z triton_mm_591 0.0111 ms 83.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:10:04.4572223Z triton_mm_585 0.0115 ms 81.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:10:04.4573285Z triton_mm_587 0.0115 ms 81.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2025-09-07T11:10:04.4574590Z triton_mm_586 0.0124 ms 74.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:10:04.4575434Z SingleProcess AUTOTUNE benchmarking takes 0.2563 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:10:05.3783176Z Autotune Choices Stats: 2025-09-07T11:10:05.3784898Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.01071999967098236, "best_triton_pos": 1, "best_triton_time": 0.010751999914646149, "best_triton_kernel": "triton_mm_249", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4"} 2025-09-07T11:10:05.3952527Z AUTOTUNE mm(2048x1024, 1024x256) 2025-09-07T11:10:05.3952939Z strides: [1024, 1], [1, 1024] 
2025-09-07T11:10:05.3953532Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:10:05.3954018Z mm 0.0107 ms 100.0% 2025-09-07T11:10:05.3954641Z triton_mm_249 0.0108 ms 99.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:10:05.3955637Z triton_mm_253 0.0116 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:10:05.3956651Z triton_mm_245 0.0131 ms 81.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:10:05.3957588Z triton_mm_259 0.0135 ms 79.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:10:05.3958418Z triton_mm_248 0.0136 ms 79.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:10:05.3959247Z triton_mm_252 0.0141 ms 75.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:10:05.3960075Z triton_mm_244 0.0144 ms 74.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:10:05.3961029Z triton_mm_242 0.0149 ms 71.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:10:05.3961868Z triton_mm_255 0.0153 ms 70.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 
2025-09-07T11:10:05.3962609Z SingleProcess AUTOTUNE benchmarking takes 0.2615 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:10:05.9237884Z Autotune Choices Stats: 2025-09-07T11:10:05.9239020Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_491", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4", "best_time": 0.011296000331640244, "best_triton_pos": 0} 2025-09-07T11:10:05.9406587Z AUTOTUNE mm(512x2048, 2048x512) 2025-09-07T11:10:05.9406893Z strides: [2048, 1], [1, 2048] 2025-09-07T11:10:05.9407162Z dtypes: torch.bfloat16, torch.bfloat16 2025-09-07T11:10:05.9407867Z triton_mm_491 0.0113 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:10:05.9408501Z mm 0.0116 ms 97.5% 2025-09-07T11:10:05.9409377Z triton_mm_495 0.0123 ms 91.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4 2025-09-07T11:10:05.9410390Z triton_mm_499 0.0136 ms 83.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2025-09-07T11:10:05.9411361Z triton_mm_505 0.0176 ms 64.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:10:05.9412322Z triton_mm_490 0.0183 ms 61.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2025-09-07T11:10:05.9413406Z triton_mm_489 0.0189 ms 59.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=5, num_warps=8 2025-09-07T11:10:05.9414816Z triton_mm_494 0.0194 ms 58.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=8 2025-09-07T11:10:05.9415917Z triton_mm_498 0.0197 ms 57.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2025-09-07T11:10:05.9416887Z triton_mm_488 0.0198 ms 57.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=4 2025-09-07T11:10:05.9417672Z SingleProcess AUTOTUNE benchmarking takes 0.2648 seconds and 0.0003 seconds precompiling for 20 choices 2025-09-07T11:10:13.8456538Z pass 2025-09-07T11:10:17.8155807Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T11:10:17.8157213Z import pynvml # type: ignore[import] 2025-09-07T11:10:20.7730665Z 2025-09-07T11:10:21.6794987Z loading model: 0it [00:00, ?it/s] 2025-09-07T11:10:21.6795344Z loading model: 0it [00:00, ?it/s] 2025-09-07T11:10:21.6842156Z cuda eval eca_halonext26ts 2025-09-07T11:10:50.9139975Z pass 2025-09-07T11:10:53.9664541Z accuracy pass_rate=100.00% 2025-09-07T11:10:53.9669685Z calls_captured gmean=361.25x mean=380.875x 2025-09-07T11:10:53.9672946Z unique_graphs gmean=1.00x mean=1.000x 2025-09-07T11:10:53.9675779Z graph_breaks gmean=0.00x mean=0.000x 2025-09-07T11:10:53.9679817Z unique_graph_breaks gmean=0.00x mean=0.000x 2025-09-07T11:10:53.9682657Z autograd_captures gmean=0.00x mean=0.000x 2025-09-07T11:10:53.9686445Z autograd_compiles gmean=0.00x mean=0.000x 2025-09-07T11:10:53.9689544Z cudagraph_skips gmean=0.00x mean=0.000x 2025-09-07T11:10:53.9690770Z compilation_latency mean=30.225 seconds 2025-09-07T11:10:54.9860875Z + [[ training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *cudagraphs_low_precision-true* ]] 2025-09-07T11:10:54.9862943Z + [[ inference == \i\n\f\e\r\e\n\c\e ]] 2025-09-07T11:10:54.9866761Z + python benchmarks/dynamo/timm_models.py --accuracy --no-translation-validation --inference --quant --backend inductor --device cuda --total-partitions 7 --partition-id 1 --output /var/lib/jenkins/workspace/test/test-reports/inductor_cudagraphs_low_precision_timm_models_quant_inference_cuda_h100_accuracy.csv 2025-09-07T11:10:55.9204134Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 
2025-09-07T11:10:55.9205237Z import pynvml # type: ignore[import] 2025-09-07T11:10:58.9016269Z usage: timm_models.py 2025-09-07T11:10:58.9016738Z [-h] 2025-09-07T11:10:58.9017090Z [--filter FILTER] 2025-09-07T11:10:58.9017484Z [--exclude EXCLUDE] 2025-09-07T11:10:58.9017902Z [--exclude-exact EXCLUDE_EXACT] 2025-09-07T11:10:58.9018443Z [--total-partitions {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}] 2025-09-07T11:10:58.9019004Z [--partition-id PARTITION_ID] 2025-09-07T11:10:58.9019450Z [--devices DEVICES] 2025-09-07T11:10:58.9019838Z [--device-index DEVICE_INDEX] 2025-09-07T11:10:58.9020241Z [--repeat REPEAT] 2025-09-07T11:10:58.9020681Z [--iterations-per-run ITERATIONS_PER_RUN] 2025-09-07T11:10:58.9021405Z [--randomize-input] 2025-09-07T11:10:58.9021804Z [--threads THREADS] 2025-09-07T11:10:58.9022154Z [--nopython] 2025-09-07T11:10:58.9022475Z [--no-skip] 2025-09-07T11:10:58.9022814Z [--prims-nvfuser] 2025-09-07T11:10:58.9023196Z [--dump-raw-metrics] 2025-09-07T11:10:58.9024366Z [--log-operator-inputs] 2025-09-07T11:10:58.9024781Z [--channels-last] 2025-09-07T11:10:58.9025165Z [--batch-size BATCH_SIZE] 2025-09-07T11:10:58.9025582Z [--iterations ITERATIONS] 2025-09-07T11:10:58.9025870Z [--batch-size-file BATCH_SIZE_FILE] 2025-09-07T11:10:58.9026089Z [--cosine] 2025-09-07T11:10:58.9026256Z [--freezing] 2025-09-07T11:10:58.9026438Z [--inductor-config INDUCTOR_CONFIG] 2025-09-07T11:10:58.9026647Z [--ci] 2025-09-07T11:10:58.9026802Z [--dashboard] 2025-09-07T11:10:58.9026977Z [--skip-fp64-check] 2025-09-07T11:10:58.9027160Z [--fast] 2025-09-07T11:10:58.9027321Z [--only ONLY] 2025-09-07T11:10:58.9027490Z [--multiprocess] 2025-09-07T11:10:58.9027661Z [--ddp] 2025-09-07T11:10:58.9027808Z [--fsdp] 2025-09-07T11:10:58.9027994Z [--optimize-ddp-mode OPTIMIZE_DDP_MODE] 2025-09-07T11:10:58.9028283Z [--distributed-master-port DISTRIBUTED_MASTER_PORT] 2025-09-07T11:10:58.9028555Z [--dynamic-shapes] 2025-09-07T11:10:58.9028754Z [--propagate-real-tensors] 2025-09-07T11:10:58.9028970Z 
[--dynamic-batch-only] 2025-09-07T11:10:58.9029167Z [--specialize-int] 2025-09-07T11:10:58.9029349Z [--use-eval-mode] 2025-09-07T11:10:58.9029550Z [--skip-accuracy-check] 2025-09-07T11:10:58.9029768Z [--generate-aot-autograd-stats] 2025-09-07T11:10:58.9029995Z [--inductor-settings] 2025-09-07T11:10:58.9030193Z [--suppress-errors] 2025-09-07T11:10:58.9030372Z [--output OUTPUT] 2025-09-07T11:10:58.9030576Z [--output-directory OUTPUT_DIRECTORY] 2025-09-07T11:10:58.9030937Z [--disable-output] 2025-09-07T11:10:58.9031129Z [--baseline BASELINE] 2025-09-07T11:10:58.9031336Z [--part PART] 2025-09-07T11:10:58.9031513Z [--export-profiler-trace] 2025-09-07T11:10:58.9031752Z [--profiler-trace-name PROFILER_TRACE_NAME] 2025-09-07T11:10:58.9032000Z [--profile-details] 2025-09-07T11:10:58.9032200Z [--export-perfdoctor] 2025-09-07T11:10:58.9032396Z [--diff-branch DIFF_BRANCH] 2025-09-07T11:10:58.9032598Z [--tag TAG] 2025-09-07T11:10:58.9032758Z [--explain] 2025-09-07T11:10:58.9032916Z [--stats] 2025-09-07T11:10:58.9033086Z [--use-warm-peak-memory] 2025-09-07T11:10:58.9033292Z [--print-memory] 2025-09-07T11:10:58.9033487Z [--print-compilation-time] 2025-09-07T11:10:58.9033855Z [--print-dataframe-summary] 2025-09-07T11:10:58.9034072Z [--disable-cudagraphs] 2025-09-07T11:10:58.9034284Z [--disable-split-reductions] 2025-09-07T11:10:58.9034522Z [--disable-persistent-reductions] 2025-09-07T11:10:58.9034771Z [--disable-divisible-by-16] 2025-09-07T11:10:58.9035015Z [--inductor-compile-mode INDUCTOR_COMPILE_MODE] 2025-09-07T11:10:58.9035272Z [--print-graph-breaks] 2025-09-07T11:10:58.9035473Z [--log-graph-breaks] 2025-09-07T11:10:58.9035663Z [--trace-on-xla] 2025-09-07T11:10:58.9035929Z [--xla-tolerance XLA_TOLERANCE] 2025-09-07T11:10:58.9036147Z [--collect-outputs] 2025-09-07T11:10:58.9036353Z [--enable-activation-checkpointing] 2025-09-07T11:10:58.9036571Z [--timing] 2025-09-07T11:10:58.9036724Z [--progress] 2025-09-07T11:10:58.9036888Z [--timeout TIMEOUT] 2025-09-07T11:10:58.9037242Z 
[--per_process_memory_fraction PER_PROCESS_MEMORY_FRACTION] 2025-09-07T11:10:58.9037532Z [--no-translation-validation] 2025-09-07T11:10:58.9037730Z [--minify] 2025-09-07T11:10:58.9037897Z [--compiled-autograd] 2025-09-07T11:10:58.9038096Z [--profile_dynamo_cache_lookup] 2025-09-07T11:10:58.9038400Z [--snapshot-memory] 2025-09-07T11:10:58.9038577Z [--retain-output] 2025-09-07T11:10:58.9038763Z [--caching-precompile] 2025-09-07T11:10:58.9038999Z [--cold-start-latency | --warm-start-latency] 2025-09-07T11:10:58.9039239Z [--nnc] 2025-09-07T11:10:58.9039416Z [--float16 | --bfloat16 | --float32 | --amp] 2025-09-07T11:10:58.9039733Z [--amp-dtype {bfloat16,float16}] 2025-09-07T11:10:58.9039953Z [--verbose | --quiet] 2025-09-07T11:10:58.9042295Z [--coverage | --overhead | --speedup-dynamo-ts | --speedup-fx2trt | --speedup-fx2trt-fp16 | --print-fx | --print-aten-ops | --inductor | --quantization {int8dynamic,int8weightonly,int4weightonly,autoquant,noquant} | --export | --export-aot-inductor | --export-nativert | --torchscript-jit-trace | --xla | --backend {aot_eager,aot_eager_decomp_partition,aot_eager_decomp_partition_crossref,aot_eager_decomp_partition_with_mode,aot_eager_default_partitioner,aot_ts,cudagraphs,dynamo_accuracy_minifier_backend,dynamo_minifier_backend,eager,eager_debug,eager_noexcept,inductor,non_leaf_compile_error_TESTING_ONLY,openxla,openxla_eval,pre_dispatch_eager,relu_accuracy_error_TESTING_ONLY,relu_compile_error_TESTING_ONLY,relu_runtime_error_TESTING_ONLY,ts,tvm} | --nothing | --log-conv-args | --recompile-profiler | --find-batch-sizes] 2025-09-07T11:10:58.9044811Z (--accuracy | --performance | --tolerance) 2025-09-07T11:10:58.9045077Z (--training | --inference) 2025-09-07T11:10:58.9045371Z timm_models.py: error: argument --quantization: expected one argument 2025-09-07T11:10:59.7667415Z + true 2025-09-07T11:10:59.7668376Z + cp /var/lib/jenkins/workspace/test/test-reports/inductor_with_cudagraphs_timm_models_bfloat16_inference_cuda_h100_accuracy.csv 
/var/lib/jenkins/workspace/test/test-reports/inductor_cudagraphs_low_precision_timm_models_quant_inference_cuda_h100_accuracy.csv 2025-09-07T11:10:59.7694274Z + for target in "${targets[@]}" 2025-09-07T11:10:59.7695448Z + target_flag=('--performance') 2025-09-07T11:10:59.7696340Z + local target_flag 2025-09-07T11:10:59.7696596Z + [[ performance == \p\e\r\f\o\r\m\a\n\c\e ]] 2025-09-07T11:10:59.7696873Z + target_flag+=(--cold-start-latency) 2025-09-07T11:10:59.7697900Z + [[ training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *freezing-true* ]] 2025-09-07T11:10:59.7699684Z + [[ training-true-inference-true-default-true-dynamic-true-cudagraphs-true-cppwrapper-true-aotinductor-true-freezing_cudagraphs-true-maxautotune-true-freeze_autotune_cudagraphs-true-cudagraphs_low_precision-true == *default-true* ]] 2025-09-07T11:10:59.7701602Z + python benchmarks/dynamo/timm_models.py --performance --cold-start-latency --inference --bfloat16 --backend inductor --disable-cudagraphs --device cuda --total-partitions 7 --partition-id 1 --output /var/lib/jenkins/workspace/test/test-reports/inductor_no_cudagraphs_timm_models_bfloat16_inference_cuda_h100_performance.csv 2025-09-07T11:11:00.7278713Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T11:11:00.7280234Z import pynvml # type: ignore[import] 2025-09-07T11:11:04.8862048Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. 
If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you. 2025-09-07T11:11:04.8863303Z import pynvml # type: ignore[import] 2025-09-07T11:11:07.8630070Z 2025-09-07T11:11:08.6489019Z loading model: 0it [00:00, ?it/s] 2025-09-07T11:11:08.6489366Z loading model: 0it [00:00, ?it/s] 2025-09-07T11:11:08.6557842Z cuda eval crossvit_9_240 2025-09-07T11:11:28.0862130Z 2025-09-07T11:11:28.1902379Z running benchmark: 0% 0/30 [00:00> $GITHUB_ENV 2025-09-07T11:50:03.4853955Z echo "DEVICE_TYPE=$DEVICE_TYPE" >> $GITHUB_ENV 2025-09-07T11:50:03.4869778Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T11:50:03.4870071Z env: 2025-09-07T11:50:03.4870236Z GIT_DEFAULT_BRANCH: main 2025-09-07T11:50:03.4870500Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T11:50:03.4870843Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5229 2025-09-07T11:50:03.4871258Z DOCKER_CONTAINER_ID: 041e022c010b277563b6b009de7f317cbe81470c090a84737c08d082827a2881 2025-09-07T11:50:03.4871608Z ##[endgroup] 2025-09-07T11:50:03.4908634Z + [[ -n '' ]] 2025-09-07T11:50:03.4908911Z + python3 -mpip install boto3==1.35.33 psutil==7.0.0 pynvml==12.0.0 2025-09-07T11:50:03.7687135Z Defaulting to user installation because normal site-packages is not writeable 2025-09-07T11:50:04.4585256Z Collecting boto3==1.35.33 2025-09-07T11:50:04.5197492Z Downloading boto3-1.35.33-py3-none-any.whl (139 kB) 2025-09-07T11:50:04.5511742Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 139.1/139.1 KB 4.4 MB/s eta 0:00:00 2025-09-07T11:50:04.6943355Z Collecting psutil==7.0.0 2025-09-07T11:50:04.7057465Z Downloading psutil-7.0.0-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (277 kB) 2025-09-07T11:50:04.7423363Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 278.0/278.0 KB 7.7 MB/s eta 0:00:00 2025-09-07T11:50:04.7841292Z Collecting pynvml==12.0.0 2025-09-07T11:50:04.7953682Z Downloading 
pynvml-12.0.0-py3-none-any.whl (26 kB) 2025-09-07T11:50:04.8237914Z Collecting jmespath<2.0.0,>=0.7.1 2025-09-07T11:50:04.8352416Z Downloading jmespath-1.0.1-py3-none-any.whl (20 kB) 2025-09-07T11:50:04.8688402Z Collecting s3transfer<0.11.0,>=0.10.0 2025-09-07T11:50:04.8803420Z Downloading s3transfer-0.10.4-py3-none-any.whl (83 kB) 2025-09-07T11:50:04.8921598Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 83.2/83.2 KB 7.3 MB/s eta 0:00:00 2025-09-07T11:50:05.5896881Z Collecting botocore<1.36.0,>=1.35.33 2025-09-07T11:50:05.6012343Z Downloading botocore-1.35.99-py3-none-any.whl (13.3 MB) 2025-09-07T11:50:06.0528342Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 13.3/13.3 MB 38.8 MB/s eta 0:00:00 2025-09-07T11:50:06.1175870Z Collecting nvidia-ml-py<13.0.0a0,>=12.0.0 2025-09-07T11:50:06.1289209Z Downloading nvidia_ml_py-12.575.51-py3-none-any.whl (47 kB) 2025-09-07T11:50:06.1389000Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 47.5/47.5 KB 4.6 MB/s eta 0:00:00 2025-09-07T11:50:06.1455071Z Requirement already satisfied: urllib3!=2.2.0,<3,>=1.25.4 in /usr/lib/python3/dist-packages (from botocore<1.36.0,>=1.35.33->boto3==1.35.33) (1.26.5) 2025-09-07T11:50:06.1461529Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /usr/lib/python3/dist-packages (from botocore<1.36.0,>=1.35.33->boto3==1.35.33) (2.8.1) 2025-09-07T11:50:06.4405666Z Installing collected packages: nvidia-ml-py, pynvml, psutil, jmespath, botocore, s3transfer, boto3 2025-09-07T11:50:06.4406398Z Attempting uninstall: nvidia-ml-py 2025-09-07T11:50:06.4411255Z Found existing installation: nvidia-ml-py 11.525.84 2025-09-07T11:50:06.4448189Z Uninstalling nvidia-ml-py-11.525.84: 2025-09-07T11:50:06.4475194Z Successfully uninstalled nvidia-ml-py-11.525.84 2025-09-07T11:50:06.5203443Z Attempting uninstall: psutil 2025-09-07T11:50:06.5209652Z Found existing installation: psutil 5.9.8 2025-09-07T11:50:06.5371896Z Uninstalling psutil-5.9.8: 2025-09-07T11:50:06.5381149Z Successfully uninstalled psutil-5.9.8 
2025-09-07T11:50:07.2733126Z Successfully installed boto3-1.35.33 botocore-1.35.99 jmespath-1.0.1 nvidia-ml-py-12.575.51 psutil-7.0.0 pynvml-12.0.0 s3transfer-0.10.4 2025-09-07T11:50:07.3726781Z + DEVICE_NAME= 2025-09-07T11:50:07.3727047Z + DEVICE_TYPE= 2025-09-07T11:50:07.3727255Z + command -v nvidia-smi 2025-09-07T11:50:07.3727893Z /usr/bin/nvidia-smi 2025-09-07T11:50:07.3728113Z + python3 -mpip install torch==2.7.1 2025-09-07T11:50:07.6466732Z Defaulting to user installation because normal site-packages is not writeable 2025-09-07T11:50:07.8625818Z Collecting torch==2.7.1 2025-09-07T11:50:07.9294993Z Downloading torch-2.7.1-cp310-cp310-manylinux_2_28_x86_64.whl (821.2 MB) 2025-09-07T11:50:27.9186502Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 821.2/821.2 MB 1.2 MB/s eta 0:00:00 2025-09-07T11:50:28.8104328Z Collecting triton==3.3.1 2025-09-07T11:50:28.8240677Z Downloading triton-3.3.1-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (155.6 MB) 2025-09-07T11:50:33.0315324Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 155.6/155.6 MB 10.3 MB/s eta 0:00:00 2025-09-07T11:50:33.2063108Z Collecting nvidia-nvtx-cu12==12.6.77 2025-09-07T11:50:33.2185999Z Downloading nvidia_nvtx_cu12-12.6.77-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (89 kB) 2025-09-07T11:50:33.2271400Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 89.3/89.3 KB 11.8 MB/s eta 0:00:00 2025-09-07T11:50:33.2287127Z Requirement already satisfied: typing-extensions>=4.10.0 in /home/charlie/.local/lib/python3.10/site-packages (from torch==2.7.1) (4.15.0) 2025-09-07T11:50:33.2525990Z Collecting nvidia-nvjitlink-cu12==12.6.85 2025-09-07T11:50:33.2650097Z Downloading nvidia_nvjitlink_cu12-12.6.85-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl (19.7 MB) 2025-09-07T11:50:33.8504299Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 19.7/19.7 MB 31.8 MB/s eta 0:00:00 2025-09-07T11:50:33.8901751Z Collecting nvidia-cusparselt-cu12==0.6.3 2025-09-07T11:50:33.9022435Z Downloading 
nvidia_cusparselt_cu12-0.6.3-py3-none-manylinux2014_x86_64.whl (156.8 MB) 2025-09-07T11:50:45.5595742Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 156.8/156.8 MB 2.8 MB/s eta 0:00:00 2025-09-07T11:50:45.7379294Z Collecting nvidia-cublas-cu12==12.6.4.1 2025-09-07T11:50:45.7500315Z Downloading nvidia_cublas_cu12-12.6.4.1-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (393.1 MB) 2025-09-07T11:51:04.2278073Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 393.1/393.1 MB 2.0 MB/s eta 0:00:00 2025-09-07T11:51:04.6273389Z Collecting nvidia-cudnn-cu12==9.5.1.17 2025-09-07T11:51:04.6391359Z Downloading nvidia_cudnn_cu12-9.5.1.17-py3-none-manylinux_2_28_x86_64.whl (571.0 MB) 2025-09-07T11:51:24.0507711Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 571.0/571.0 MB 1.8 MB/s eta 0:00:00 2025-09-07T11:51:24.6667900Z Collecting nvidia-cuda-nvrtc-cu12==12.6.77 2025-09-07T11:51:24.6793043Z Downloading nvidia_cuda_nvrtc_cu12-12.6.77-py3-none-manylinux2014_x86_64.whl (23.7 MB) 2025-09-07T11:51:25.3786495Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 23.7/23.7 MB 31.4 MB/s eta 0:00:00 2025-09-07T11:51:25.4274293Z Collecting nvidia-cufft-cu12==11.3.0.4 2025-09-07T11:51:25.4396062Z Downloading nvidia_cufft_cu12-11.3.0.4-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (200.2 MB) 2025-09-07T11:51:32.9021921Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 200.2/200.2 MB 6.7 MB/s eta 0:00:00 2025-09-07T11:51:33.1332181Z Collecting fsspec 2025-09-07T11:51:33.1452129Z Downloading fsspec-2025.9.0-py3-none-any.whl (199 kB) 2025-09-07T11:51:33.1573028Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 199.3/199.3 KB 18.9 MB/s eta 0:00:00 2025-09-07T11:51:33.1824600Z Collecting nvidia-cusolver-cu12==11.7.1.2 2025-09-07T11:51:33.1945844Z Downloading nvidia_cusolver_cu12-11.7.1.2-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (158.2 MB) 2025-09-07T11:51:41.0430802Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 158.2/158.2 MB 9.0 MB/s eta 0:00:00 2025-09-07T11:51:41.2173620Z Collecting nvidia-curand-cu12==10.3.7.77 
2025-09-07T11:51:41.2296021Z Downloading nvidia_curand_cu12-10.3.7.77-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (56.3 MB) 2025-09-07T11:51:43.2743059Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.3/56.3 MB 17.8 MB/s eta 0:00:00 2025-09-07T11:51:43.3537800Z Collecting nvidia-cuda-cupti-cu12==12.6.80 2025-09-07T11:51:43.3656427Z Downloading nvidia_cuda_cupti_cu12-12.6.80-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (8.9 MB) 2025-09-07T11:51:43.6835784Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.9/8.9 MB 28.2 MB/s eta 0:00:00 2025-09-07T11:51:43.7328307Z Collecting filelock 2025-09-07T11:51:43.7449212Z Downloading filelock-3.19.1-py3-none-any.whl (15 kB) 2025-09-07T11:51:43.7912168Z Collecting nvidia-cufile-cu12==1.11.1.6 2025-09-07T11:51:43.8032890Z Downloading nvidia_cufile_cu12-1.11.1.6-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (1.1 MB) 2025-09-07T11:51:43.8480490Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 26.5 MB/s eta 0:00:00 2025-09-07T11:51:43.8928774Z Collecting networkx 2025-09-07T11:51:43.9049018Z Downloading networkx-3.4.2-py3-none-any.whl (1.7 MB) 2025-09-07T11:51:43.9683299Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.7/1.7 MB 27.8 MB/s eta 0:00:00 2025-09-07T11:51:44.0024908Z Collecting jinja2 2025-09-07T11:51:44.0141885Z Downloading jinja2-3.1.6-py3-none-any.whl (134 kB) 2025-09-07T11:51:44.0252475Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 134.9/134.9 KB 13.6 MB/s eta 0:00:00 2025-09-07T11:51:44.0503546Z Collecting nvidia-cuda-runtime-cu12==12.6.77 2025-09-07T11:51:44.0621402Z Downloading nvidia_cuda_runtime_cu12-12.6.77-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (897 kB) 2025-09-07T11:51:44.1009160Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 897.7/897.7 KB 24.2 MB/s eta 0:00:00 2025-09-07T11:51:44.1242827Z Collecting nvidia-nccl-cu12==2.26.2 2025-09-07T11:51:44.1361330Z Downloading nvidia_nccl_cu12-2.26.2-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (201.3 MB) 2025-09-07T11:51:51.0034126Z 
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 201.3/201.3 MB 6.9 MB/s eta 0:00:00 2025-09-07T11:51:51.2198544Z Collecting nvidia-cusparse-cu12==12.5.4.2 2025-09-07T11:51:51.2317685Z Downloading nvidia_cusparse_cu12-12.5.4.2-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (216.6 MB) 2025-09-07T11:51:58.1808774Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 216.6/216.6 MB 5.9 MB/s eta 0:00:00 2025-09-07T11:51:58.4177228Z Collecting sympy>=1.13.3 2025-09-07T11:51:58.4294398Z Downloading sympy-1.14.0-py3-none-any.whl (6.3 MB) 2025-09-07T11:51:58.6676298Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.3/6.3 MB 26.7 MB/s eta 0:00:00 2025-09-07T11:51:58.7071307Z Requirement already satisfied: setuptools>=40.8.0 in /usr/lib/python3/dist-packages (from triton==3.3.1->torch==2.7.1) (59.6.0) 2025-09-07T11:51:58.7341202Z Collecting mpmath<1.4,>=1.1.0 2025-09-07T11:51:58.7461214Z Downloading mpmath-1.3.0-py3-none-any.whl (536 kB) 2025-09-07T11:51:59.1686655Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 536.2/536.2 KB 1.3 MB/s eta 0:00:00 2025-09-07T11:51:59.7146526Z Collecting MarkupSafe>=2.0 2025-09-07T11:51:59.7269978Z Downloading MarkupSafe-3.0.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (20 kB) 2025-09-07T11:52:00.1851239Z Installing collected packages: nvidia-cusparselt-cu12, mpmath, triton, sympy, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufile-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, networkx, MarkupSafe, fsspec, filelock, nvidia-cusparse-cu12, nvidia-cufft-cu12, nvidia-cudnn-cu12, jinja2, nvidia-cusolver-cu12, torch 2025-09-07T11:52:04.2728380Z WARNING: The scripts proton and proton-viewer are installed in '/home/charlie/.local/bin' which is not on PATH. 2025-09-07T11:52:04.2729175Z Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location. 
2025-09-07T11:52:07.8166297Z WARNING: The script isympy is installed in '/home/charlie/.local/bin' which is not on PATH. 2025-09-07T11:52:07.8167015Z Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location. 2025-09-07T11:52:41.7985565Z WARNING: The scripts torchfrtrace and torchrun are installed in '/home/charlie/.local/bin' which is not on PATH. 2025-09-07T11:52:41.7986923Z Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location. 2025-09-07T11:52:42.7345812Z Successfully installed MarkupSafe-3.0.2 filelock-3.19.1 fsspec-2025.9.0 jinja2-3.1.6 mpmath-1.3.0 networkx-3.4.2 nvidia-cublas-cu12-12.6.4.1 nvidia-cuda-cupti-cu12-12.6.80 nvidia-cuda-nvrtc-cu12-12.6.77 nvidia-cuda-runtime-cu12-12.6.77 nvidia-cudnn-cu12-9.5.1.17 nvidia-cufft-cu12-11.3.0.4 nvidia-cufile-cu12-1.11.1.6 nvidia-curand-cu12-10.3.7.77 nvidia-cusolver-cu12-11.7.1.2 nvidia-cusparse-cu12-12.5.4.2 nvidia-cusparselt-cu12-0.6.3 nvidia-nccl-cu12-2.26.2 nvidia-nvjitlink-cu12-12.6.85 nvidia-nvtx-cu12-12.6.77 sympy-1.14.0 torch-2.7.1 triton-3.3.1 2025-09-07T11:52:43.8547193Z + echo DEVICE_NAME= 2025-09-07T11:52:43.8548588Z + echo DEVICE_TYPE= 2025-09-07T11:52:43.8886013Z ##[group]Run set -eux 2025-09-07T11:52:43.8886222Z set -eux 2025-09-07T11:52:43.8886393Z  2025-09-07T11:52:43.8886584Z if [[ -z "${GITHUB_TOKEN}" ]]; then 2025-09-07T11:52:43.8886851Z  echo "Missing github-token input" 2025-09-07T11:52:43.8887078Z  exit 1 2025-09-07T11:52:43.8887260Z fi 2025-09-07T11:52:43.8902898Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T11:52:43.8903191Z env: 2025-09-07T11:52:43.8903357Z GIT_DEFAULT_BRANCH: main 2025-09-07T11:52:43.8903613Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T11:52:43.8904274Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5229 2025-09-07T11:52:43.8904696Z DOCKER_CONTAINER_ID: 
041e022c010b277563b6b009de7f317cbe81470c090a84737c08d082827a2881 2025-09-07T11:52:43.8905046Z DEVICE_NAME: 2025-09-07T11:52:43.8905216Z DEVICE_TYPE: 2025-09-07T11:52:43.8905553Z GITHUB_TOKEN: *** 2025-09-07T11:52:43.8905730Z ##[endgroup] 2025-09-07T11:52:43.9383424Z + [[ -z *** ]] 2025-09-07T11:52:44.1350122Z ##[group]Run pytorch/test-infra/.github/actions/get-workflow-job-id@main 2025-09-07T11:52:44.1350454Z with: 2025-09-07T11:52:44.1350853Z github-token: *** 2025-09-07T11:52:44.1351059Z env: 2025-09-07T11:52:44.1351218Z GIT_DEFAULT_BRANCH: main 2025-09-07T11:52:44.1351487Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T11:52:44.1351842Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5229 2025-09-07T11:52:44.1352280Z DOCKER_CONTAINER_ID: 041e022c010b277563b6b009de7f317cbe81470c090a84737c08d082827a2881 2025-09-07T11:52:44.1352660Z DEVICE_NAME: 2025-09-07T11:52:44.1352824Z DEVICE_TYPE: 2025-09-07T11:52:44.1352998Z ##[endgroup] 2025-09-07T11:52:44.1392341Z ##[group]Run set -eux 2025-09-07T11:52:44.1392557Z set -eux 2025-09-07T11:52:44.1392727Z  2025-09-07T11:52:44.1393089Z python3 "${GITHUB_ACTION_PATH}/../../scripts/get_workflow_job_id.py" "${GITHUB_RUN_ID}" "${RUNNER_NAME}" 2025-09-07T11:52:44.1409475Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T11:52:44.1409783Z env: 2025-09-07T11:52:44.1409949Z GIT_DEFAULT_BRANCH: main 2025-09-07T11:52:44.1410204Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T11:52:44.1410554Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5229 2025-09-07T11:52:44.1411114Z DOCKER_CONTAINER_ID: 041e022c010b277563b6b009de7f317cbe81470c090a84737c08d082827a2881 2025-09-07T11:52:44.1411477Z DEVICE_NAME: 2025-09-07T11:52:44.1411644Z DEVICE_TYPE: 2025-09-07T11:52:44.1411941Z GITHUB_TOKEN: *** 2025-09-07T11:52:44.1412132Z ##[endgroup] 2025-09-07T11:52:44.1909656Z + python3 
/home/charlie/_work/_actions/pytorch/test-infra/main/.github/actions/get-workflow-job-id/../../scripts/get_workflow_job_id.py 17525296438 i-05a095f6e498981b2-1003 2025-09-07T11:52:45.0734927Z setting job-id=49775781833 2025-09-07T11:52:45.0737783Z setting job-name=test-weekly / test (inductor_timm_perf_cuda_h100, 2, 7, linux.aws.h100) 2025-09-07T11:52:45.1696629Z ##[group]Run set -eux 2025-09-07T11:52:45.1696865Z set -eux 2025-09-07T11:52:45.1697101Z  2025-09-07T11:52:45.1697284Z if [[ -n "" ]]; then 2025-09-07T11:52:45.1697509Z  source "" 2025-09-07T11:52:45.1697704Z fi 2025-09-07T11:52:45.1697888Z  2025-09-07T11:52:45.1698215Z python3 "${GITHUB_ACTION_PATH}/../../scripts/benchmarks/gather_metadata.py" \ 2025-09-07T11:52:45.1698637Z  --schema-version "${SCHEMA_VERSION}" \ 2025-09-07T11:52:45.1698914Z  --repo "${REPO}" \ 2025-09-07T11:52:45.1699159Z  --head-branch "${HEAD_BRANCH}" \ 2025-09-07T11:52:45.1699418Z  --head-sha "${HEAD_SHA}" \ 2025-09-07T11:52:45.1699847Z  --workflow-id "${WORKFLOW_RUN_ID}" \ 2025-09-07T11:52:45.1700121Z  --run-attempt "${RUN_ATTEMPT}" \ 2025-09-07T11:52:45.1700351Z  --job-id "${JOB_ID}" \ 2025-09-07T11:52:45.1700576Z  --job-name "${JOB_NAME}" 2025-09-07T11:52:45.1715638Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T11:52:45.1715945Z env: 2025-09-07T11:52:45.1716111Z GIT_DEFAULT_BRANCH: main 2025-09-07T11:52:45.1716362Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T11:52:45.1716695Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5229 2025-09-07T11:52:45.1717223Z DOCKER_CONTAINER_ID: 041e022c010b277563b6b009de7f317cbe81470c090a84737c08d082827a2881 2025-09-07T11:52:45.1717580Z DEVICE_NAME: 2025-09-07T11:52:45.1717750Z DEVICE_TYPE: 2025-09-07T11:52:45.1718038Z SCHEMA_VERSION: v3 2025-09-07T11:52:45.1718225Z REPO: pytorch/pytorch 2025-09-07T11:52:45.1718431Z HEAD_BRANCH: refs/heads/main 2025-09-07T11:52:45.1718682Z HEAD_SHA: 93fb23d6fae7c4e82c4239a1033e522088742634 
2025-09-07T11:52:45.1718952Z WORKFLOW_RUN_ID: 17525296438 2025-09-07T11:52:45.1719140Z RUN_ATTEMPT: 1 2025-09-07T11:52:45.1719307Z JOB_ID: 49775781833 2025-09-07T11:52:45.1719591Z JOB_NAME: test-weekly / test (inductor_timm_perf_cuda_h100, 2, 7, linux.aws.h100) 2025-09-07T11:52:45.1719910Z ##[endgroup] 2025-09-07T11:52:45.1752872Z + [[ -n '' ]] 2025-09-07T11:52:45.1754376Z + python3 /home/charlie/_work/_actions/pytorch/test-infra/main/.github/actions/upload-benchmark-results/../../scripts/benchmarks/gather_metadata.py --schema-version v3 --repo pytorch/pytorch --head-branch refs/heads/main --head-sha 93fb23d6fae7c4e82c4239a1033e522088742634 --workflow-id 17525296438 --run-attempt 1 --job-id 49775781833 --job-name 'test-weekly / test (inductor_timm_perf_cuda_h100, 2, 7, linux.aws.h100)' 2025-09-07T11:52:45.2197809Z ##[group]Run set -eux 2025-09-07T11:52:45.2198056Z set -eux 2025-09-07T11:52:45.2198238Z  2025-09-07T11:52:45.2198405Z if [[ -n "" ]]; then 2025-09-07T11:52:45.2198602Z  source "" 2025-09-07T11:52:45.2198772Z fi 2025-09-07T11:52:45.2198915Z  2025-09-07T11:52:45.2199193Z python3 "${GITHUB_ACTION_PATH}/../../scripts/benchmarks/gather_runners_info.py" 2025-09-07T11:52:45.2213624Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T11:52:45.2214090Z env: 2025-09-07T11:52:45.2214248Z GIT_DEFAULT_BRANCH: main 2025-09-07T11:52:45.2214494Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T11:52:45.2214820Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5229 2025-09-07T11:52:45.2215377Z DOCKER_CONTAINER_ID: 041e022c010b277563b6b009de7f317cbe81470c090a84737c08d082827a2881 2025-09-07T11:52:45.2215730Z DEVICE_NAME: 2025-09-07T11:52:45.2215892Z DEVICE_TYPE: 2025-09-07T11:52:45.2216056Z ##[endgroup] 2025-09-07T11:52:45.4434369Z + [[ -n '' ]] 2025-09-07T11:52:45.4435480Z + python3 /home/charlie/_work/_actions/pytorch/test-infra/main/.github/actions/upload-benchmark-results/../../scripts/benchmarks/gather_runners_info.py 
2025-09-07T11:52:46.5523193Z /home/charlie/.local/lib/python3.10/site-packages/torch/_subclasses/functional_tensor.py:276: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:81.) 2025-09-07T11:52:46.5524785Z cpu = _conversion_method_template(device=torch.device("cpu")) 2025-09-07T11:52:47.7509906Z ##[group]Run set -eux 2025-09-07T11:52:47.7510117Z set -eux 2025-09-07T11:52:47.7510275Z  2025-09-07T11:52:47.7510484Z # TODO (huydhn): Implement this part 2025-09-07T11:52:47.7510782Z echo "dependencies={}" >> "${GITHUB_OUTPUT}" 2025-09-07T11:52:47.7526668Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T11:52:47.7526980Z env: 2025-09-07T11:52:47.7527146Z GIT_DEFAULT_BRANCH: main 2025-09-07T11:52:47.7527396Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T11:52:47.7527932Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5229 2025-09-07T11:52:47.7528380Z DOCKER_CONTAINER_ID: 041e022c010b277563b6b009de7f317cbe81470c090a84737c08d082827a2881 2025-09-07T11:52:47.7528734Z DEVICE_NAME: 2025-09-07T11:52:47.7528907Z DEVICE_TYPE: 2025-09-07T11:52:47.7529073Z ##[endgroup] 2025-09-07T11:52:47.8000393Z + echo 'dependencies={}' 2025-09-07T11:52:47.9536888Z ##[group]Run set -eux 2025-09-07T11:52:47.9537128Z set -eux 2025-09-07T11:52:47.9537306Z  2025-09-07T11:52:47.9537477Z if [[ -n "" ]]; then 2025-09-07T11:52:47.9537702Z  source "" 2025-09-07T11:52:47.9537887Z fi 2025-09-07T11:52:47.9538055Z  2025-09-07T11:52:47.9538271Z if [[ ! 
-d "${BENCHMARK_RESULTS_DIR}" ]]; then 2025-09-07T11:52:47.9538633Z  echo "${BENCHMARK_RESULTS_DIR} does not exist, skipping" 2025-09-07T11:52:47.9539186Z  # We don't want the job to fail if the directory doesn't exist 2025-09-07T11:52:47.9539523Z  exit 0 2025-09-07T11:52:47.9539707Z fi 2025-09-07T11:52:47.9539868Z  2025-09-07T11:52:47.9540056Z if [[ "${DRY_RUN}" == "true" ]]; then 2025-09-07T11:52:47.9540436Z  python3 "${GITHUB_ACTION_PATH}/../../scripts/upload_benchmark_results.py" \ 2025-09-07T11:52:47.9540860Z  --benchmark-results-dir "${BENCHMARK_RESULTS_DIR}" \ 2025-09-07T11:52:47.9541177Z  --metadata "${BENCHMARK_METADATA}" \ 2025-09-07T11:52:47.9541444Z  --runners "${RUNNER_INFO}" \ 2025-09-07T11:52:47.9541704Z  --dependencies "${DEPENDENCIES}" \ 2025-09-07T11:52:47.9541952Z  --dry-run 2025-09-07T11:52:47.9542137Z else 2025-09-07T11:52:47.9542413Z  python3 "${GITHUB_ACTION_PATH}/../../scripts/upload_benchmark_results.py" \ 2025-09-07T11:52:47.9542834Z  --benchmark-results-dir "${BENCHMARK_RESULTS_DIR}" \ 2025-09-07T11:52:47.9543148Z  --metadata "${BENCHMARK_METADATA}" \ 2025-09-07T11:52:47.9543415Z  --runners "${RUNNER_INFO}" \ 2025-09-07T11:52:47.9543663Z  --dependencies "${DEPENDENCIES}" 2025-09-07T11:52:47.9544063Z fi 2025-09-07T11:52:47.9559726Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T11:52:47.9560021Z env: 2025-09-07T11:52:47.9560188Z GIT_DEFAULT_BRANCH: main 2025-09-07T11:52:47.9560432Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T11:52:47.9560764Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5229 2025-09-07T11:52:47.9561315Z DOCKER_CONTAINER_ID: 041e022c010b277563b6b009de7f317cbe81470c090a84737c08d082827a2881 2025-09-07T11:52:47.9561672Z DEVICE_NAME: 2025-09-07T11:52:47.9561843Z DEVICE_TYPE: 2025-09-07T11:52:47.9562041Z BENCHMARK_RESULTS_DIR: test/test-reports 2025-09-07T11:52:47.9562278Z DRY_RUN: false 2025-09-07T11:52:47.9563168Z BENCHMARK_METADATA: {"timestamp": 1757245965, 
"schema_version": "v3", "name": "test-weekly / test (inductor_timm_perf_cuda_h100, 2, 7, linux.aws.h100)", "repo": "pytorch/pytorch", "head_branch": "refs/heads/main", "head_sha": "93fb23d6fae7c4e82c4239a1033e522088742634", "workflow_id": 17525296438, "run_attempt": 1, "job_id": 49775781833} 2025-09-07T11:52:47.9564605Z RUNNER_INFO: [{"cpu_info": "x86_64", "cpu_count": 192, "avail_mem_in_gb": 1999, "extra_info": {"hostname": "784802b6db88"}, "name": "cuda", "type": "NVIDIA H100 80GB HBM3", "gpu_count": 1, "avail_gpu_mem_in_gb": 79}] 2025-09-07T11:52:47.9565170Z DEPENDENCIES: {} 2025-09-07T11:52:47.9565338Z ##[endgroup] 2025-09-07T11:52:48.0054497Z + [[ -n '' ]] 2025-09-07T11:52:48.0054755Z + [[ ! -d test/test-reports ]] 2025-09-07T11:52:48.0054994Z + [[ false == \t\r\u\e ]] 2025-09-07T11:52:48.0057739Z + python3 /home/charlie/_work/_actions/pytorch/test-infra/main/.github/actions/upload-benchmark-results/../../scripts/upload_benchmark_results.py --benchmark-results-dir test/test-reports --metadata '{"timestamp": 1757245965, "schema_version": "v3", "name": "test-weekly / test (inductor_timm_perf_cuda_h100, 2, 7, linux.aws.h100)", "repo": "pytorch/pytorch", "head_branch": "refs/heads/main", "head_sha": "93fb23d6fae7c4e82c4239a1033e522088742634", "workflow_id": 17525296438, "run_attempt": 1, "job_id": 49775781833}' --runners '[{"cpu_info": "x86_64", "cpu_count": 192, "avail_mem_in_gb": 1999, "extra_info": {"hostname": "784802b6db88"}, "name": "cuda", "type": "NVIDIA H100 80GB HBM3", "gpu_count": 1, "avail_gpu_mem_in_gb": 79}]' --dependencies '{}' 2025-09-07T11:52:48.1254847Z INFO:root:Upload test/test-reports/inductor_dynamic_timm_models_amp_training_cuda_h100_accuracy.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781833/inductor_dynamic_timm_models_amp_training_cuda_h100_accuracy.json 2025-09-07T11:52:48.3554622Z INFO:botocore.credentials:Found credentials from IAM Role: gh-ci-github-action-runners-runner-role 2025-09-07T11:52:48.5670882Z 
INFO:root:Upload test/test-reports/inductor_with_cudagraphs_freezing_timm_models_bfloat16_inference_cuda_h100_accuracy.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781833/inductor_with_cudagraphs_freezing_timm_models_bfloat16_inference_cuda_h100_accuracy.json 2025-09-07T11:52:48.7053583Z INFO:root:Upload test/test-reports/inductor_no_cudagraphs_timm_models_bfloat16_inference_cuda_h100_accuracy.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781833/inductor_no_cudagraphs_timm_models_bfloat16_inference_cuda_h100_accuracy.json 2025-09-07T11:52:48.8516126Z INFO:root:Upload test/test-reports/inductor_with_cudagraphs_timm_models_amp_training_cuda_h100_performance_compilation_metrics.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781833/inductor_with_cudagraphs_timm_models_amp_training_cuda_h100_performance_compilation_metrics.json 2025-09-07T11:52:49.0342298Z INFO:root:Upload test/test-reports/inductor_with_cudagraphs_timm_models_bfloat16_inference_cuda_h100_accuracy.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781833/inductor_with_cudagraphs_timm_models_bfloat16_inference_cuda_h100_accuracy.json 2025-09-07T11:52:49.1651357Z INFO:root:Upload test/test-reports/inductor_cpp_wrapper_timm_models_bfloat16_inference_cuda_h100_performance.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781833/inductor_cpp_wrapper_timm_models_bfloat16_inference_cuda_h100_performance.json 2025-09-07T11:52:49.3378737Z INFO:root:Upload test/test-reports/inductor_max_autotune_timm_models_amp_training_cuda_h100_accuracy.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781833/inductor_max_autotune_timm_models_amp_training_cuda_h100_accuracy.json 2025-09-07T11:52:49.4727418Z INFO:root:Upload test/test-reports/inductor_max_autotune_timm_models_bfloat16_inference_cuda_h100_performance.json to 
s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781833/inductor_max_autotune_timm_models_bfloat16_inference_cuda_h100_performance.json 2025-09-07T11:52:49.6110656Z INFO:root:Upload test/test-reports/inductor_max_autotune_timm_models_bfloat16_inference_cuda_h100_accuracy.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781833/inductor_max_autotune_timm_models_bfloat16_inference_cuda_h100_accuracy.json 2025-09-07T11:52:49.7439961Z INFO:root:Upload test/test-reports/inductor_export_timm_models_bfloat16_inference_cuda_h100_accuracy.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781833/inductor_export_timm_models_bfloat16_inference_cuda_h100_accuracy.json 2025-09-07T11:52:49.8767467Z INFO:root:Upload test/test-reports/inductor_with_cudagraphs_freezing_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781833/inductor_with_cudagraphs_freezing_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json 2025-09-07T11:52:50.1191658Z INFO:root:Upload test/test-reports/inductor_dynamic_timm_models_amp_training_cuda_h100_performance.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781833/inductor_dynamic_timm_models_amp_training_cuda_h100_performance.json 2025-09-07T11:52:50.1638694Z INFO:root:Upload test/test-reports/inductor_cpp_wrapper_timm_models_amp_training_cuda_h100_accuracy.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781833/inductor_cpp_wrapper_timm_models_amp_training_cuda_h100_accuracy.json 2025-09-07T11:52:50.2941752Z INFO:root:Upload test/test-reports/inductor_cpp_wrapper_timm_models_amp_training_cuda_h100_performance_compilation_metrics.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781833/inductor_cpp_wrapper_timm_models_amp_training_cuda_h100_performance_compilation_metrics.json 2025-09-07T11:52:50.4776308Z INFO:root:Upload 
test/test-reports/inductor_no_cudagraphs_timm_models_amp_training_cuda_h100_performance_compilation_metrics.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781833/inductor_no_cudagraphs_timm_models_amp_training_cuda_h100_performance_compilation_metrics.json 2025-09-07T11:52:50.6335193Z INFO:root:Upload test/test-reports/inductor_no_cudagraphs_timm_models_amp_training_cuda_h100_performance.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781833/inductor_no_cudagraphs_timm_models_amp_training_cuda_h100_performance.json 2025-09-07T11:52:50.7835826Z INFO:root:Upload test/test-reports/inductor_with_cudagraphs_freezing_timm_models_bfloat16_inference_cuda_h100_performance.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781833/inductor_with_cudagraphs_freezing_timm_models_bfloat16_inference_cuda_h100_performance.json 2025-09-07T11:52:50.9181513Z INFO:root:Upload test/test-reports/inductor_dynamic_timm_models_bfloat16_inference_cuda_h100_accuracy.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781833/inductor_dynamic_timm_models_bfloat16_inference_cuda_h100_accuracy.json 2025-09-07T11:52:51.0553694Z INFO:root:Upload test/test-reports/inductor_with_cudagraphs_freezing_autotune_timm_models_bfloat16_inference_cuda_h100_performance.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781833/inductor_with_cudagraphs_freezing_autotune_timm_models_bfloat16_inference_cuda_h100_performance.json 2025-09-07T11:52:51.2244070Z INFO:root:Upload test/test-reports/inductor_max_autotune_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781833/inductor_max_autotune_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json 2025-09-07T11:52:51.3836608Z INFO:root:Upload test/test-reports/inductor_dynamic_timm_models_bfloat16_inference_cuda_h100_performance.json to 
s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781833/inductor_dynamic_timm_models_bfloat16_inference_cuda_h100_performance.json 2025-09-07T11:52:51.5262382Z INFO:root:Upload test/test-reports/inductor_with_cudagraphs_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781833/inductor_with_cudagraphs_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json 2025-09-07T11:52:51.6640493Z INFO:root:Upload test/test-reports/inductor_with_cudagraphs_timm_models_amp_training_cuda_h100_accuracy.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781833/inductor_with_cudagraphs_timm_models_amp_training_cuda_h100_accuracy.json 2025-09-07T11:52:51.7928800Z INFO:root:Upload test/test-reports/inductor_cpp_wrapper_timm_models_amp_training_cuda_h100_performance.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781833/inductor_cpp_wrapper_timm_models_amp_training_cuda_h100_performance.json 2025-09-07T11:52:51.9231036Z INFO:root:Upload test/test-reports/inductor_aot_inductor_timm_models_bfloat16_inference_cuda_h100_accuracy.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781833/inductor_aot_inductor_timm_models_bfloat16_inference_cuda_h100_accuracy.json 2025-09-07T11:52:52.0481133Z INFO:root:Upload test/test-reports/inductor_max_autotune_timm_models_amp_training_cuda_h100_performance.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781833/inductor_max_autotune_timm_models_amp_training_cuda_h100_performance.json 2025-09-07T11:52:52.1755317Z INFO:root:Upload test/test-reports/inductor_aot_inductor_timm_models_bfloat16_inference_cuda_h100_performance.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781833/inductor_aot_inductor_timm_models_bfloat16_inference_cuda_h100_performance.json 2025-09-07T11:52:52.2997236Z INFO:root:Upload 
test/test-reports/inductor_dynamic_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781833/inductor_dynamic_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json 2025-09-07T11:52:52.4748532Z INFO:root:Upload test/test-reports/inductor_cpp_wrapper_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781833/inductor_cpp_wrapper_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json 2025-09-07T11:52:52.6367688Z INFO:root:Upload test/test-reports/inductor_dynamic_timm_models_amp_training_cuda_h100_performance_compilation_metrics.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781833/inductor_dynamic_timm_models_amp_training_cuda_h100_performance_compilation_metrics.json 2025-09-07T11:52:52.8095717Z INFO:root:Upload test/test-reports/inductor_with_cudagraphs_timm_models_bfloat16_inference_cuda_h100_performance.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781833/inductor_with_cudagraphs_timm_models_bfloat16_inference_cuda_h100_performance.json 2025-09-07T11:52:52.9550848Z INFO:root:Upload test/test-reports/inductor_with_cudagraphs_freezing_autotune_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781833/inductor_with_cudagraphs_freezing_autotune_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json 2025-09-07T11:52:53.1760982Z INFO:root:Upload test/test-reports/inductor_with_cudagraphs_timm_models_amp_training_cuda_h100_performance.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781833/inductor_with_cudagraphs_timm_models_amp_training_cuda_h100_performance.json 2025-09-07T11:52:53.3005821Z INFO:root:Upload 
test/test-reports/inductor_no_cudagraphs_timm_models_amp_training_cuda_h100_accuracy.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781833/inductor_no_cudagraphs_timm_models_amp_training_cuda_h100_accuracy.json 2025-09-07T11:52:53.4275525Z INFO:root:Upload test/test-reports/inductor_no_cudagraphs_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781833/inductor_no_cudagraphs_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json 2025-09-07T11:52:53.5919311Z INFO:root:Upload test/test-reports/inductor_with_cudagraphs_freezing_autotune_timm_models_bfloat16_inference_cuda_h100_accuracy.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781833/inductor_with_cudagraphs_freezing_autotune_timm_models_bfloat16_inference_cuda_h100_accuracy.json 2025-09-07T11:52:53.7174229Z INFO:root:Upload test/test-reports/inductor_no_cudagraphs_timm_models_bfloat16_inference_cuda_h100_performance.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781833/inductor_no_cudagraphs_timm_models_bfloat16_inference_cuda_h100_performance.json 2025-09-07T11:52:53.8660667Z INFO:root:Upload test/test-reports/inductor_cpp_wrapper_timm_models_bfloat16_inference_cuda_h100_accuracy.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781833/inductor_cpp_wrapper_timm_models_bfloat16_inference_cuda_h100_accuracy.json 2025-09-07T11:52:54.0200972Z INFO:root:Upload test/test-reports/inductor_max_autotune_timm_models_amp_training_cuda_h100_performance_compilation_metrics.json to s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781833/inductor_max_autotune_timm_models_amp_training_cuda_h100_performance_compilation_metrics.json 2025-09-07T11:52:54.1871969Z INFO:root:Upload test/test-reports/inductor_aot_inductor_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json to 
s3://ossci-benchmarks/v3/pytorch/pytorch/17525296438/49775781833/inductor_aot_inductor_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json 2025-09-07T11:52:54.4409476Z ##[group]Run cat test/**/*_toprint.log || true 2025-09-07T11:52:54.4409828Z cat test/**/*_toprint.log || true 2025-09-07T11:52:54.4425572Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T11:52:54.4425860Z env: 2025-09-07T11:52:54.4426035Z GIT_DEFAULT_BRANCH: main 2025-09-07T11:52:54.4426309Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T11:52:54.4426794Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5229 2025-09-07T11:52:54.4427210Z DOCKER_CONTAINER_ID: 041e022c010b277563b6b009de7f317cbe81470c090a84737c08d082827a2881 2025-09-07T11:52:54.4427559Z DEVICE_NAME: 2025-09-07T11:52:54.4427727Z DEVICE_TYPE: 2025-09-07T11:52:54.4427886Z ##[endgroup] 2025-09-07T11:52:54.4983604Z cat: 'test/**/*_toprint.log': No such file or directory 2025-09-07T11:52:54.5812946Z ##[group]Run kill "$MONITOR_SCRIPT_PID" 2025-09-07T11:52:54.5813278Z kill "$MONITOR_SCRIPT_PID" 2025-09-07T11:52:54.5829300Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T11:52:54.5829601Z env: 2025-09-07T11:52:54.5829789Z GIT_DEFAULT_BRANCH: main 2025-09-07T11:52:54.5830064Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T11:52:54.5830402Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5229 2025-09-07T11:52:54.5830808Z DOCKER_CONTAINER_ID: 041e022c010b277563b6b009de7f317cbe81470c090a84737c08d082827a2881 2025-09-07T11:52:54.5831182Z DEVICE_NAME: 2025-09-07T11:52:54.5831360Z DEVICE_TYPE: 2025-09-07T11:52:54.5831535Z MONITOR_SCRIPT_PID: 9202 2025-09-07T11:52:54.5831729Z ##[endgroup] 2025-09-07T11:52:54.6393337Z Prepare all required actions 2025-09-07T11:52:54.6393870Z Getting action download info 2025-09-07T11:52:54.8249758Z Download action repository 'seemethere/upload-artifact-s3@v5' 
(SHA:baba72d0712b404f646cebe0730933554ebce96a) 2025-09-07T11:52:55.4506050Z Download action repository 'actions/upload-artifact@v4' (SHA:ea165f8d65b6e75b540449e92b4886f43607fa02) 2025-09-07T11:52:57.5887476Z ##[group]Run ./.github/actions/upload-test-artifacts 2025-09-07T11:52:57.5887747Z with: 2025-09-07T11:52:57.5888026Z file-suffix: test-inductor_timm_perf_cuda_h100-2-7-linux.aws.h100_49775781833 2025-09-07T11:52:57.5888497Z s3-bucket: gha-artifacts 2025-09-07T11:52:57.5888701Z env: 2025-09-07T11:52:57.5888859Z GIT_DEFAULT_BRANCH: main 2025-09-07T11:52:57.5889105Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T11:52:57.5889454Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5229 2025-09-07T11:52:57.5889895Z DOCKER_CONTAINER_ID: 041e022c010b277563b6b009de7f317cbe81470c090a84737c08d082827a2881 2025-09-07T11:52:57.5890252Z DEVICE_NAME: 2025-09-07T11:52:57.5890415Z DEVICE_TYPE: 2025-09-07T11:52:57.5890578Z ##[endgroup] 2025-09-07T11:52:57.6284939Z ##[group]Run # Remove any previous test jsons if they exist 2025-09-07T11:52:57.6285306Z # Remove any previous test jsons if they exist 2025-09-07T11:52:57.6285607Z rm -f test-jsons-*.zip 2025-09-07T11:52:57.6285952Z zip -r "test-jsons-${FILE_SUFFIX}.zip" test/test-reports -i '*.json' 2025-09-07T11:52:57.6301700Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T11:52:57.6302005Z env: 2025-09-07T11:52:57.6302174Z GIT_DEFAULT_BRANCH: main 2025-09-07T11:52:57.6302436Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T11:52:57.6302775Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5229 2025-09-07T11:52:57.6303193Z DOCKER_CONTAINER_ID: 041e022c010b277563b6b009de7f317cbe81470c090a84737c08d082827a2881 2025-09-07T11:52:57.6303538Z DEVICE_NAME: 2025-09-07T11:52:57.6303841Z DEVICE_TYPE: 2025-09-07T11:52:57.6304115Z FILE_SUFFIX: test-inductor_timm_perf_cuda_h100-2-7-linux.aws.h100_49775781833 2025-09-07T11:52:57.6304424Z ##[endgroup] 
2025-09-07T11:52:57.6787874Z adding: test/test-reports/inductor_dynamic_timm_models_amp_training_cuda_h100_accuracy.json (deflated 99%) 2025-09-07T11:52:57.6801792Z adding: test/test-reports/inductor_with_cudagraphs_freezing_timm_models_bfloat16_inference_cuda_h100_accuracy.json (deflated 99%) 2025-09-07T11:52:57.6816147Z adding: test/test-reports/inductor_no_cudagraphs_timm_models_bfloat16_inference_cuda_h100_accuracy.json (deflated 99%) 2025-09-07T11:52:57.6873364Z adding: test/test-reports/inductor_with_cudagraphs_timm_models_amp_training_cuda_h100_performance_compilation_metrics.json (deflated 99%) 2025-09-07T11:52:57.6887550Z adding: test/test-reports/inductor_with_cudagraphs_timm_models_bfloat16_inference_cuda_h100_accuracy.json (deflated 99%) 2025-09-07T11:52:57.6908610Z adding: test/test-reports/inductor_cpp_wrapper_timm_models_bfloat16_inference_cuda_h100_performance.json (deflated 98%) 2025-09-07T11:52:57.6922599Z adding: test/test-reports/inductor_max_autotune_timm_models_amp_training_cuda_h100_accuracy.json (deflated 99%) 2025-09-07T11:52:57.6943454Z adding: test/test-reports/inductor_max_autotune_timm_models_bfloat16_inference_cuda_h100_performance.json (deflated 98%) 2025-09-07T11:52:57.6957513Z adding: test/test-reports/inductor_max_autotune_timm_models_bfloat16_inference_cuda_h100_accuracy.json (deflated 99%) 2025-09-07T11:52:57.6971548Z adding: test/test-reports/inductor_export_timm_models_bfloat16_inference_cuda_h100_accuracy.json (deflated 99%) 2025-09-07T11:52:57.7027250Z adding: test/test-reports/inductor_with_cudagraphs_freezing_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json (deflated 99%) 2025-09-07T11:52:57.7045573Z adding: test/test-reports/inductor_dynamic_timm_models_amp_training_cuda_h100_performance.json (deflated 98%) 2025-09-07T11:52:57.7059779Z adding: test/test-reports/inductor_cpp_wrapper_timm_models_amp_training_cuda_h100_accuracy.json (deflated 99%) 2025-09-07T11:52:57.7126556Z adding: 
test/test-reports/inductor_cpp_wrapper_timm_models_amp_training_cuda_h100_performance_compilation_metrics.json (deflated 99%) 2025-09-07T11:52:57.7188941Z adding: test/test-reports/inductor_no_cudagraphs_timm_models_amp_training_cuda_h100_performance_compilation_metrics.json (deflated 99%) 2025-09-07T11:52:57.7209713Z adding: test/test-reports/inductor_no_cudagraphs_timm_models_amp_training_cuda_h100_performance.json (deflated 98%) 2025-09-07T11:52:57.7230337Z adding: test/test-reports/inductor_with_cudagraphs_freezing_timm_models_bfloat16_inference_cuda_h100_performance.json (deflated 98%) 2025-09-07T11:52:57.7244610Z adding: test/test-reports/inductor_dynamic_timm_models_bfloat16_inference_cuda_h100_accuracy.json (deflated 99%) 2025-09-07T11:52:57.7265442Z adding: test/test-reports/inductor_with_cudagraphs_freezing_autotune_timm_models_bfloat16_inference_cuda_h100_performance.json (deflated 98%) 2025-09-07T11:52:57.7335739Z adding: test/test-reports/inductor_max_autotune_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json (deflated 99%) 2025-09-07T11:52:57.7356517Z adding: test/test-reports/inductor_dynamic_timm_models_bfloat16_inference_cuda_h100_performance.json (deflated 98%) 2025-09-07T11:52:57.7413926Z adding: test/test-reports/inductor_with_cudagraphs_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json (deflated 99%) 2025-09-07T11:52:57.7427650Z adding: test/test-reports/inductor_with_cudagraphs_timm_models_amp_training_cuda_h100_accuracy.json (deflated 99%) 2025-09-07T11:52:57.7448382Z adding: test/test-reports/inductor_cpp_wrapper_timm_models_amp_training_cuda_h100_performance.json (deflated 98%) 2025-09-07T11:52:57.7462393Z adding: test/test-reports/inductor_aot_inductor_timm_models_bfloat16_inference_cuda_h100_accuracy.json (deflated 99%) 2025-09-07T11:52:57.7480934Z adding: test/test-reports/inductor_max_autotune_timm_models_amp_training_cuda_h100_performance.json (deflated 98%) 
2025-09-07T11:52:57.7501646Z adding: test/test-reports/inductor_aot_inductor_timm_models_bfloat16_inference_cuda_h100_performance.json (deflated 99%) 2025-09-07T11:52:57.7557926Z adding: test/test-reports/inductor_dynamic_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json (deflated 99%) 2025-09-07T11:52:57.7615208Z adding: test/test-reports/inductor_cpp_wrapper_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json (deflated 99%) 2025-09-07T11:52:57.7671614Z adding: test/test-reports/inductor_dynamic_timm_models_amp_training_cuda_h100_performance_compilation_metrics.json (deflated 99%) 2025-09-07T11:52:57.7692278Z adding: test/test-reports/inductor_with_cudagraphs_timm_models_bfloat16_inference_cuda_h100_performance.json (deflated 98%) 2025-09-07T11:52:57.7761464Z adding: test/test-reports/inductor_with_cudagraphs_freezing_autotune_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json (deflated 99%) 2025-09-07T11:52:57.7780021Z adding: test/test-reports/inductor_with_cudagraphs_timm_models_amp_training_cuda_h100_performance.json (deflated 98%) 2025-09-07T11:52:57.7794823Z adding: test/test-reports/inductor_no_cudagraphs_timm_models_amp_training_cuda_h100_accuracy.json (deflated 99%) 2025-09-07T11:52:57.7848086Z adding: test/test-reports/inductor_no_cudagraphs_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json (deflated 99%) 2025-09-07T11:52:57.7862069Z adding: test/test-reports/inductor_with_cudagraphs_freezing_autotune_timm_models_bfloat16_inference_cuda_h100_accuracy.json (deflated 99%) 2025-09-07T11:52:57.7882789Z adding: test/test-reports/inductor_no_cudagraphs_timm_models_bfloat16_inference_cuda_h100_performance.json (deflated 98%) 2025-09-07T11:52:57.7897224Z adding: test/test-reports/inductor_cpp_wrapper_timm_models_bfloat16_inference_cuda_h100_accuracy.json (deflated 99%) 2025-09-07T11:52:57.7965803Z adding: 
test/test-reports/inductor_max_autotune_timm_models_amp_training_cuda_h100_performance_compilation_metrics.json (deflated 99%) 2025-09-07T11:52:57.8009626Z adding: test/test-reports/inductor_aot_inductor_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.json (deflated 99%) 2025-09-07T11:52:57.8133006Z ##[group]Run # Remove any previous test reports if they exist 2025-09-07T11:52:57.8133406Z # Remove any previous test reports if they exist 2025-09-07T11:52:57.8133901Z rm -f test-reports-*.zip 2025-09-07T11:52:57.8134435Z zip -r "test-reports-${FILE_SUFFIX}.zip" test/test-reports -i '*.xml' -i '*.csv' 2025-09-07T11:52:57.8149548Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T11:52:57.8149863Z env: 2025-09-07T11:52:57.8150038Z GIT_DEFAULT_BRANCH: main 2025-09-07T11:52:57.8150300Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T11:52:57.8150669Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5229 2025-09-07T11:52:57.8151079Z DOCKER_CONTAINER_ID: 041e022c010b277563b6b009de7f317cbe81470c090a84737c08d082827a2881 2025-09-07T11:52:57.8151427Z DEVICE_NAME: 2025-09-07T11:52:57.8151597Z DEVICE_TYPE: 2025-09-07T11:52:57.8151876Z FILE_SUFFIX: test-inductor_timm_perf_cuda_h100-2-7-linux.aws.h100_49775781833 2025-09-07T11:52:57.8152188Z ##[endgroup] 2025-09-07T11:52:57.8655785Z adding: test/test-reports/inductor_no_cudagraphs_timm_models_bfloat16_inference_cuda_h100_accuracy.csv (deflated 52%) 2025-09-07T11:52:57.8656560Z adding: test/test-reports/inductor_cpp_wrapper_timm_models_bfloat16_inference_cuda_h100_accuracy.csv (deflated 52%) 2025-09-07T11:52:57.8657279Z adding: test/test-reports/inductor_with_cudagraphs_timm_models_bfloat16_inference_cuda_h100_accuracy.csv (deflated 52%) 2025-09-07T11:52:57.8658093Z adding: test/test-reports/inductor_max_autotune_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.csv (deflated 51%) 2025-09-07T11:52:57.8658871Z adding: 
test/test-reports/inductor_no_cudagraphs_timm_models_amp_training_cuda_h100_accuracy.csv (deflated 51%) 2025-09-07T11:52:57.8659548Z adding: test/test-reports/inductor_dynamic_timm_models_bfloat16_inference_cuda_h100_performance.csv (deflated 49%) 2025-09-07T11:52:57.8660241Z adding: test/test-reports/inductor_max_autotune_timm_models_bfloat16_inference_cuda_h100_accuracy.csv (deflated 52%) 2025-09-07T11:52:57.8660994Z adding: test/test-reports/inductor_max_autotune_timm_models_amp_training_cuda_h100_performance_compilation_metrics.csv (deflated 50%) 2025-09-07T11:52:57.8662333Z adding: test/test-reports/inductor_with_cudagraphs_freezing_autotune_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.csv (deflated 51%) 2025-09-07T11:52:57.8663184Z adding: test/test-reports/inductor_max_autotune_timm_models_amp_training_cuda_h100_performance.csv (deflated 48%) 2025-09-07T11:52:57.8664205Z adding: test/test-reports/inductor_aot_inductor_timm_models_bfloat16_inference_cuda_h100_accuracy.csv (deflated 60%) 2025-09-07T11:52:57.8664957Z adding: test/test-reports/inductor_no_cudagraphs_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.csv (deflated 49%) 2025-09-07T11:52:57.8665733Z adding: test/test-reports/inductor_dynamic_timm_models_amp_training_cuda_h100_performance_compilation_metrics.csv (deflated 48%) 2025-09-07T11:52:57.8666415Z adding: test/test-reports/inductor_dynamic_timm_models_amp_training_cuda_h100_performance.csv (deflated 48%) 2025-09-07T11:52:57.8667392Z adding: test/test-reports/inductor_with_cudagraphs_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.csv (deflated 49%) 2025-09-07T11:52:57.8668219Z adding: test/test-reports/inductor_with_cudagraphs_freezing_autotune_timm_models_bfloat16_inference_cuda_h100_performance.csv (deflated 48%) 2025-09-07T11:52:57.8669207Z adding: test/test-reports/inductor_with_cudagraphs_freezing_timm_models_bfloat16_inference_cuda_h100_performance.csv (deflated 
49%) 2025-09-07T11:52:57.8669976Z adding: test/test-reports/inductor_cpp_wrapper_timm_models_amp_training_cuda_h100_performance_compilation_metrics.csv (deflated 48%) 2025-09-07T11:52:57.8670888Z adding: test/test-reports/inductor_with_cudagraphs_freezing_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.csv (deflated 49%) 2025-09-07T11:52:57.8671673Z adding: test/test-reports/inductor_no_cudagraphs_timm_models_bfloat16_inference_cuda_h100_performance.csv (deflated 49%) 2025-09-07T11:52:57.8672536Z adding: test/test-reports/inductor_no_cudagraphs_timm_models_amp_training_cuda_h100_performance_compilation_metrics.csv (deflated 47%) 2025-09-07T11:52:57.8674018Z adding: test/test-reports/inductor_dynamic_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.csv (deflated 49%) 2025-09-07T11:52:57.8674757Z adding: test/test-reports/inductor_cudagraphs_low_precision_timm_models_quant_inference_cuda_h100_accuracy.csv (deflated 52%) 2025-09-07T11:52:57.8675437Z adding: test/test-reports/inductor_no_cudagraphs_timm_models_amp_training_cuda_h100_performance.csv (deflated 47%) 2025-09-07T11:52:57.8676148Z adding: test/test-reports/inductor_cpp_wrapper_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.csv (deflated 49%) 2025-09-07T11:52:57.8676913Z adding: test/test-reports/inductor_with_cudagraphs_timm_models_amp_training_cuda_h100_accuracy.csv (deflated 50%) 2025-09-07T11:52:57.8677542Z adding: test/test-reports/inductor_max_autotune_timm_models_amp_training_cuda_h100_accuracy.csv (deflated 50%) 2025-09-07T11:52:57.8678203Z adding: test/test-reports/inductor_with_cudagraphs_freezing_timm_models_bfloat16_inference_cuda_h100_accuracy.csv (deflated 52%) 2025-09-07T11:52:57.8678891Z adding: test/test-reports/inductor_max_autotune_timm_models_bfloat16_inference_cuda_h100_performance.csv (deflated 48%) 2025-09-07T11:52:57.8679554Z adding: 
test/test-reports/inductor_aot_inductor_timm_models_bfloat16_inference_cuda_h100_performance.csv (deflated 49%) 2025-09-07T11:52:57.8680245Z adding: test/test-reports/inductor_cudagraphs_low_precision_timm_models_quant_inference_cuda_h100_performance.csv (deflated 48%) 2025-09-07T11:52:57.8680984Z adding: test/test-reports/inductor_with_cudagraphs_timm_models_amp_training_cuda_h100_performance_compilation_metrics.csv (deflated 47%) 2025-09-07T11:52:57.8681668Z adding: test/test-reports/inductor_dynamic_timm_models_bfloat16_inference_cuda_h100_accuracy.csv (deflated 52%) 2025-09-07T11:52:57.8682287Z adding: test/test-reports/inductor_cpp_wrapper_timm_models_amp_training_cuda_h100_performance.csv (deflated 47%) 2025-09-07T11:52:57.8682903Z adding: test/test-reports/inductor_dynamic_timm_models_amp_training_cuda_h100_accuracy.csv (deflated 51%) 2025-09-07T11:52:57.8683541Z adding: test/test-reports/inductor_with_cudagraphs_timm_models_bfloat16_inference_cuda_h100_performance.csv (deflated 48%) 2025-09-07T11:52:57.8684474Z adding: test/test-reports/inductor_cpp_wrapper_timm_models_bfloat16_inference_cuda_h100_performance.csv (deflated 48%) 2025-09-07T11:52:57.8685105Z adding: test/test-reports/inductor_cpp_wrapper_timm_models_amp_training_cuda_h100_accuracy.csv (deflated 51%) 2025-09-07T11:52:57.8685728Z adding: test/test-reports/inductor_with_cudagraphs_timm_models_amp_training_cuda_h100_performance.csv (deflated 48%) 2025-09-07T11:52:57.8686425Z adding: test/test-reports/inductor_with_cudagraphs_freezing_autotune_timm_models_bfloat16_inference_cuda_h100_accuracy.csv (deflated 52%) 2025-09-07T11:52:57.8687106Z adding: test/test-reports/inductor_export_timm_models_bfloat16_inference_cuda_h100_accuracy.csv (deflated 53%) 2025-09-07T11:52:57.8687802Z adding: test/test-reports/inductor_aot_inductor_timm_models_bfloat16_inference_cuda_h100_performance_compilation_metrics.csv (deflated 48%) 2025-09-07T11:52:57.8984172Z ##[group]Run # Remove any previous usage logs if they 
exist 2025-09-07T11:52:57.8984538Z # Remove any previous usage logs if they exist 2025-09-07T11:52:57.8985021Z rm -f logs-*.zip 2025-09-07T11:52:57.8985287Z zip "logs-${FILE_SUFFIX}.zip" 'usage_log.txt' || true 2025-09-07T11:52:57.8985662Z zip -r "logs-${FILE_SUFFIX}.zip" test/test-reports -i '*.log' || true 2025-09-07T11:52:57.9001164Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T11:52:57.9001459Z env: 2025-09-07T11:52:57.9001631Z GIT_DEFAULT_BRANCH: main 2025-09-07T11:52:57.9001877Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T11:52:57.9002226Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5229 2025-09-07T11:52:57.9002785Z DOCKER_CONTAINER_ID: 041e022c010b277563b6b009de7f317cbe81470c090a84737c08d082827a2881 2025-09-07T11:52:57.9003161Z DEVICE_NAME: 2025-09-07T11:52:57.9003330Z DEVICE_TYPE: 2025-09-07T11:52:57.9003627Z FILE_SUFFIX: test-inductor_timm_perf_cuda_h100-2-7-linux.aws.h100_49775781833 2025-09-07T11:52:57.9004133Z ##[endgroup] 2025-09-07T11:52:57.9605180Z adding: usage_log.txt (deflated 91%) 2025-09-07T11:52:57.9623175Z 2025-09-07T11:52:57.9623581Z zip error: Nothing to do! 
(logs-test-inductor_timm_perf_cuda_h100-2-7-linux.aws.h100_49775781833.zip) 2025-09-07T11:52:57.9886160Z ##[group]Run # Remove any previous debugging artifacts if they exist 2025-09-07T11:52:57.9886591Z # Remove any previous debugging artifacts if they exist 2025-09-07T11:52:57.9886916Z rm -f debug-*.zip 2025-09-07T11:52:57.9887141Z if [ -d 'test/debug' ]; then 2025-09-07T11:52:57.9887431Z  zip -r "debug-${FILE_SUFFIX}.zip" test/debug 2025-09-07T11:52:57.9887724Z fi 2025-09-07T11:52:57.9901856Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T11:52:57.9902148Z env: 2025-09-07T11:52:57.9902310Z GIT_DEFAULT_BRANCH: main 2025-09-07T11:52:57.9902580Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T11:52:57.9902938Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5229 2025-09-07T11:52:57.9903365Z DOCKER_CONTAINER_ID: 041e022c010b277563b6b009de7f317cbe81470c090a84737c08d082827a2881 2025-09-07T11:52:57.9903861Z DEVICE_NAME: 2025-09-07T11:52:57.9904027Z DEVICE_TYPE: 2025-09-07T11:52:57.9904304Z FILE_SUFFIX: test-inductor_timm_perf_cuda_h100-2-7-linux.aws.h100_49775781833 2025-09-07T11:52:57.9904623Z ##[endgroup] 2025-09-07T11:52:58.1297770Z ##[group]Run seemethere/upload-artifact-s3@v5 2025-09-07T11:52:58.1298029Z with: 2025-09-07T11:52:58.1298200Z s3-bucket: gha-artifacts 2025-09-07T11:52:58.1298447Z s3-prefix: pytorch/pytorch/17525296438/1/artifact 2025-09-07T11:52:58.1298723Z retention-days: 14 2025-09-07T11:52:58.1298915Z if-no-files-found: warn 2025-09-07T11:52:58.1299124Z path: test-jsons-*.zip 2025-09-07T11:52:58.1299339Z name: artifact 2025-09-07T11:52:58.1299514Z region: us-east-1 2025-09-07T11:52:58.1299781Z env: 2025-09-07T11:52:58.1299951Z GIT_DEFAULT_BRANCH: main 2025-09-07T11:52:58.1300207Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T11:52:58.1300568Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5229 2025-09-07T11:52:58.1301001Z DOCKER_CONTAINER_ID: 
041e022c010b277563b6b009de7f317cbe81470c090a84737c08d082827a2881 2025-09-07T11:52:58.1301380Z DEVICE_NAME: 2025-09-07T11:52:58.1301552Z DEVICE_TYPE: 2025-09-07T11:52:58.1301719Z ##[endgroup] 2025-09-07T11:52:58.4336631Z NOTE: s3-prefix specified, ignoring name parameter 2025-09-07T11:52:58.4337017Z With the provided path, there will be 1 file uploaded 2025-09-07T11:52:58.4337362Z Uploading to s3 prefix: pytorch/pytorch/17525296438/1/artifact 2025-09-07T11:52:58.4345527Z Starting upload of test-jsons-test-inductor_timm_perf_cuda_h100-2-7-linux.aws.h100_49775781833.zip 2025-09-07T11:52:58.8141158Z Finished upload of test-jsons-test-inductor_timm_perf_cuda_h100-2-7-linux.aws.h100_49775781833.zip 2025-09-07T11:52:58.8464805Z ##[group]Run seemethere/upload-artifact-s3@v5 2025-09-07T11:52:58.8465052Z with: 2025-09-07T11:52:58.8465223Z s3-bucket: gha-artifacts 2025-09-07T11:52:58.8465685Z s3-prefix: pytorch/pytorch/17525296438/1/artifact 2025-09-07T11:52:58.8465959Z retention-days: 14 2025-09-07T11:52:58.8466139Z if-no-files-found: error 2025-09-07T11:52:58.8466341Z path: test-reports-*.zip 2025-09-07T11:52:58.8466529Z name: artifact 2025-09-07T11:52:58.8466697Z region: us-east-1 2025-09-07T11:52:58.8466856Z env: 2025-09-07T11:52:58.8467008Z GIT_DEFAULT_BRANCH: main 2025-09-07T11:52:58.8467252Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T11:52:58.8467581Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5229 2025-09-07T11:52:58.8468113Z DOCKER_CONTAINER_ID: 041e022c010b277563b6b009de7f317cbe81470c090a84737c08d082827a2881 2025-09-07T11:52:58.8468460Z DEVICE_NAME: 2025-09-07T11:52:58.8468621Z DEVICE_TYPE: 2025-09-07T11:52:58.8468782Z ##[endgroup] 2025-09-07T11:52:59.1464130Z NOTE: s3-prefix specified, ignoring name parameter 2025-09-07T11:52:59.1464614Z With the provided path, there will be 1 file uploaded 2025-09-07T11:52:59.1465057Z Uploading to s3 prefix: pytorch/pytorch/17525296438/1/artifact 2025-09-07T11:52:59.1472567Z Starting upload of 
test-reports-test-inductor_timm_perf_cuda_h100-2-7-linux.aws.h100_49775781833.zip 2025-09-07T11:52:59.3264424Z Finished upload of test-reports-test-inductor_timm_perf_cuda_h100-2-7-linux.aws.h100_49775781833.zip 2025-09-07T11:52:59.3729869Z ##[group]Run seemethere/upload-artifact-s3@v5 2025-09-07T11:52:59.3730145Z with: 2025-09-07T11:52:59.3730330Z s3-bucket: gha-artifacts 2025-09-07T11:52:59.3730588Z s3-prefix: pytorch/pytorch/17525296438/1/artifact 2025-09-07T11:52:59.3730877Z retention-days: 14 2025-09-07T11:52:59.3731084Z if-no-files-found: ignore 2025-09-07T11:52:59.3731311Z path: logs-*.zip 2025-09-07T11:52:59.3731499Z name: artifact 2025-09-07T11:52:59.3731675Z region: us-east-1 2025-09-07T11:52:59.3731856Z env: 2025-09-07T11:52:59.3732032Z GIT_DEFAULT_BRANCH: main 2025-09-07T11:52:59.3732301Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T11:52:59.3732669Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5229 2025-09-07T11:52:59.3733136Z DOCKER_CONTAINER_ID: 041e022c010b277563b6b009de7f317cbe81470c090a84737c08d082827a2881 2025-09-07T11:52:59.3733526Z DEVICE_NAME: 2025-09-07T11:52:59.3733872Z DEVICE_TYPE: 2025-09-07T11:52:59.3734048Z ##[endgroup] 2025-09-07T11:52:59.6764439Z NOTE: s3-prefix specified, ignoring name parameter 2025-09-07T11:52:59.6764815Z With the provided path, there will be 1 file uploaded 2025-09-07T11:52:59.6765175Z Uploading to s3 prefix: pytorch/pytorch/17525296438/1/artifact 2025-09-07T11:52:59.6773081Z Starting upload of logs-test-inductor_timm_perf_cuda_h100-2-7-linux.aws.h100_49775781833.zip 2025-09-07T11:52:59.8528530Z Finished upload of logs-test-inductor_timm_perf_cuda_h100-2-7-linux.aws.h100_49775781833.zip 2025-09-07T11:52:59.9010898Z ##[group]Run seemethere/upload-artifact-s3@v5 2025-09-07T11:52:59.9011291Z with: 2025-09-07T11:52:59.9011471Z s3-bucket: gha-artifacts 2025-09-07T11:52:59.9011757Z s3-prefix: pytorch/pytorch/17525296438/1/artifact 2025-09-07T11:52:59.9012035Z retention-days: 14 
2025-09-07T11:52:59.9012240Z if-no-files-found: ignore 2025-09-07T11:52:59.9012457Z path: debug-*.zip 2025-09-07T11:52:59.9012644Z name: artifact 2025-09-07T11:52:59.9012821Z region: us-east-1 2025-09-07T11:52:59.9013003Z env: 2025-09-07T11:52:59.9013169Z GIT_DEFAULT_BRANCH: main 2025-09-07T11:52:59.9013437Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T11:52:59.9013988Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5229 2025-09-07T11:52:59.9014434Z DOCKER_CONTAINER_ID: 041e022c010b277563b6b009de7f317cbe81470c090a84737c08d082827a2881 2025-09-07T11:52:59.9014790Z DEVICE_NAME: 2025-09-07T11:52:59.9014958Z DEVICE_TYPE: 2025-09-07T11:52:59.9015122Z ##[endgroup] 2025-09-07T11:53:00.1962948Z No files were found with the provided path: debug-*.zip. No artifacts will be uploaded. 2025-09-07T11:53:00.2192856Z ##[group]Run # shellcheck disable=SC2156 2025-09-07T11:53:00.2193167Z # shellcheck disable=SC2156 2025-09-07T11:53:00.2193611Z find . -iname "core.[1-9]*" -exec docker exec "${DOCKER_CONTAINER_ID}" sh -c "gdb python {} -ex 'bt' -ex 'q'" \; 2025-09-07T11:53:00.2210141Z shell: /usr/bin/bash -e {0} 2025-09-07T11:53:00.2210364Z env: 2025-09-07T11:53:00.2210536Z GIT_DEFAULT_BRANCH: main 2025-09-07T11:53:00.2210800Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T11:53:00.2211138Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5229 2025-09-07T11:53:00.2211557Z DOCKER_CONTAINER_ID: 041e022c010b277563b6b009de7f317cbe81470c090a84737c08d082827a2881 2025-09-07T11:53:00.2212062Z DEVICE_NAME: 2025-09-07T11:53:00.2212230Z DEVICE_TYPE: 2025-09-07T11:53:00.2212390Z ##[endgroup] 2025-09-07T11:53:00.8144806Z Prepare all required actions 2025-09-07T11:53:00.8145190Z Getting action download info 2025-09-07T11:53:00.9771167Z ##[group]Run ./.github/actions/upload-utilization-stats 2025-09-07T11:53:00.9771441Z with: 2025-09-07T11:53:00.9771605Z job_id: 49775781833 2025-09-07T11:53:00.9771894Z job_name: test-weekly / test 
(inductor_timm_perf_cuda_h100, 2, 7, linux.aws.h100) 2025-09-07T11:53:00.9772268Z workflow_name: inductor-perf-nightly-h100 2025-09-07T11:53:00.9772523Z workflow_run_id: 17525296438 2025-09-07T11:53:00.9772739Z workflow_attempt: 1 2025-09-07T11:53:00.9772914Z env: 2025-09-07T11:53:00.9773060Z GIT_DEFAULT_BRANCH: main 2025-09-07T11:53:00.9773308Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T11:53:00.9773648Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5229 2025-09-07T11:53:00.9774261Z DOCKER_CONTAINER_ID: 041e022c010b277563b6b009de7f317cbe81470c090a84737c08d082827a2881 2025-09-07T11:53:00.9774615Z DEVICE_NAME: 2025-09-07T11:53:00.9774809Z DEVICE_TYPE: 2025-09-07T11:53:00.9774970Z ##[endgroup] 2025-09-07T11:53:01.0745128Z ##[group]Run echo "workflow_id: 17525296438" 2025-09-07T11:53:01.0745399Z echo "workflow_id: 17525296438" 2025-09-07T11:53:01.0745649Z echo "workflow_attempt: 1" 2025-09-07T11:53:01.0745942Z echo "workflow_Name: inductor-perf-nightly-h100" 2025-09-07T11:53:01.0746237Z echo "job_id: 49775781833" 2025-09-07T11:53:01.0746598Z echo "job_name: test-weekly / test (inductor_timm_perf_cuda_h100, 2, 7, linux.aws.h100)" 2025-09-07T11:53:01.0746973Z echo "artifact_prefix: " 2025-09-07T11:53:01.0747199Z python3 --version 2025-09-07T11:53:01.0762812Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T11:53:01.0763106Z env: 2025-09-07T11:53:01.0763264Z GIT_DEFAULT_BRANCH: main 2025-09-07T11:53:01.0763522Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T11:53:01.0764000Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5229 2025-09-07T11:53:01.0764426Z DOCKER_CONTAINER_ID: 041e022c010b277563b6b009de7f317cbe81470c090a84737c08d082827a2881 2025-09-07T11:53:01.0764919Z DEVICE_NAME: 2025-09-07T11:53:01.0765082Z DEVICE_TYPE: 2025-09-07T11:53:01.0765250Z ##[endgroup] 2025-09-07T11:53:01.1232815Z workflow_id: 17525296438 2025-09-07T11:53:01.1233025Z workflow_attempt: 1 
2025-09-07T11:53:01.1233247Z workflow_Name: inductor-perf-nightly-h100 2025-09-07T11:53:01.1233495Z job_id: 49775781833 2025-09-07T11:53:01.1234010Z job_name: test-weekly / test (inductor_timm_perf_cuda_h100, 2, 7, linux.aws.h100) 2025-09-07T11:53:01.1234379Z artifact_prefix: 2025-09-07T11:53:01.1252075Z Python 3.10.12 2025-09-07T11:53:01.1689417Z ##[group]Run nick-fields/retry@v3.0.0 2025-09-07T11:53:01.1689741Z with: 2025-09-07T11:53:01.1689950Z shell: bash 2025-09-07T11:53:01.1690202Z timeout_minutes: 5 2025-09-07T11:53:01.1690458Z max_attempts: 5 2025-09-07T11:53:01.1690713Z retry_wait_seconds: 30 2025-09-07T11:53:01.1691211Z command: set -eu python3 -m pip install python-dateutil==2.8.2 boto3==1.35.42 pandas==2.1.3 dataclasses_json==0.6.7 2025-09-07T11:53:01.1691762Z polling_interval_seconds: 1 2025-09-07T11:53:01.1692054Z warning_on_retry: true 2025-09-07T11:53:01.1692320Z continue_on_error: false 2025-09-07T11:53:01.1692560Z env: 2025-09-07T11:53:01.1692793Z GIT_DEFAULT_BRANCH: main 2025-09-07T11:53:01.1693169Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T11:53:01.1693629Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5229 2025-09-07T11:53:01.1706403Z DOCKER_CONTAINER_ID: 041e022c010b277563b6b009de7f317cbe81470c090a84737c08d082827a2881 2025-09-07T11:53:01.1706815Z DEVICE_NAME: 2025-09-07T11:53:01.1706991Z DEVICE_TYPE: 2025-09-07T11:53:01.1707158Z ##[endgroup] 2025-09-07T11:53:01.5200087Z Defaulting to user installation because normal site-packages is not writeable 2025-09-07T11:53:02.0051297Z Collecting python-dateutil==2.8.2 2025-09-07T11:53:02.0627539Z Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB) 2025-09-07T11:53:02.4560396Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 247.7/247.7 KB 610.2 kB/s eta 0:00:00 2025-09-07T11:53:03.4525974Z Collecting boto3==1.35.42 2025-09-07T11:53:03.4649102Z Downloading boto3-1.35.42-py3-none-any.whl (139 kB) 2025-09-07T11:53:03.9518522Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 
139.2/139.2 KB 265.9 kB/s eta 0:00:00 2025-09-07T11:53:04.6846067Z Collecting pandas==2.1.3 2025-09-07T11:53:04.6962326Z Downloading pandas-2.1.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.3 MB) 2025-09-07T11:53:05.8762930Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.3/12.3 MB 9.9 MB/s eta 0:00:00 2025-09-07T11:53:05.9161393Z Requirement already satisfied: dataclasses_json==0.6.7 in /home/charlie/.local/lib/python3.10/site-packages (0.6.7) 2025-09-07T11:53:05.9177299Z Requirement already satisfied: six>=1.5 in /usr/lib/python3/dist-packages (from python-dateutil==2.8.2) (1.16.0) 2025-09-07T11:53:05.9222391Z Requirement already satisfied: botocore<1.36.0,>=1.35.42 in /home/charlie/.local/lib/python3.10/site-packages (from boto3==1.35.42) (1.35.99) 2025-09-07T11:53:05.9227477Z Requirement already satisfied: jmespath<2.0.0,>=0.7.1 in /home/charlie/.local/lib/python3.10/site-packages (from boto3==1.35.42) (1.0.1) 2025-09-07T11:53:05.9231925Z Requirement already satisfied: s3transfer<0.11.0,>=0.10.0 in /home/charlie/.local/lib/python3.10/site-packages (from boto3==1.35.42) (0.10.4) 2025-09-07T11:53:06.4685292Z Collecting pytz>=2020.1 2025-09-07T11:53:06.4804811Z Downloading pytz-2025.2-py2.py3-none-any.whl (509 kB) 2025-09-07T11:53:06.8211698Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 509.2/509.2 KB 1.5 MB/s eta 0:00:00 2025-09-07T11:53:07.3816403Z Collecting tzdata>=2022.1 2025-09-07T11:53:07.3931542Z Downloading tzdata-2025.2-py2.py3-none-any.whl (347 kB) 2025-09-07T11:53:07.8186492Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 347.8/347.8 KB 797.2 kB/s eta 0:00:00 2025-09-07T11:53:08.6231329Z Collecting numpy<2,>=1.22.4 2025-09-07T11:53:08.6351684Z Downloading numpy-1.26.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.2 MB) 2025-09-07T11:53:10.0486622Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.2/18.2 MB 8.4 MB/s eta 0:00:00 2025-09-07T11:53:10.0760162Z Requirement already satisfied: typing-inspect<1,>=0.4.0 in 
/home/charlie/.local/lib/python3.10/site-packages (from dataclasses_json==0.6.7) (0.9.0) 2025-09-07T11:53:10.0766839Z Requirement already satisfied: marshmallow<4.0.0,>=3.18.0 in /home/charlie/.local/lib/python3.10/site-packages (from dataclasses_json==0.6.7) (3.26.1) 2025-09-07T11:53:10.0820837Z Requirement already satisfied: urllib3!=2.2.0,<3,>=1.25.4 in /usr/lib/python3/dist-packages (from botocore<1.36.0,>=1.35.42->boto3==1.35.42) (1.26.5) 2025-09-07T11:53:10.0937111Z Requirement already satisfied: packaging>=17.0 in /home/charlie/.local/lib/python3.10/site-packages (from marshmallow<4.0.0,>=3.18.0->dataclasses_json==0.6.7) (25.0) 2025-09-07T11:53:10.1037514Z Requirement already satisfied: typing-extensions>=3.7.4 in /home/charlie/.local/lib/python3.10/site-packages (from typing-inspect<1,>=0.4.0->dataclasses_json==0.6.7) (4.15.0) 2025-09-07T11:53:10.1041709Z Requirement already satisfied: mypy-extensions>=0.3.0 in /home/charlie/.local/lib/python3.10/site-packages (from typing-inspect<1,>=0.4.0->dataclasses_json==0.6.7) (1.1.0) 2025-09-07T11:53:10.3954957Z Installing collected packages: pytz, tzdata, python-dateutil, numpy, pandas, boto3 2025-09-07T11:53:14.1810593Z WARNING: The script f2py is installed in '/home/charlie/.local/bin' which is not on PATH. 2025-09-07T11:53:14.1811307Z Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location. 2025-09-07T11:53:18.7945643Z Attempting uninstall: boto3 2025-09-07T11:53:18.7951227Z Found existing installation: boto3 1.35.33 2025-09-07T11:53:18.8175373Z Uninstalling boto3-1.35.33: 2025-09-07T11:53:18.8195797Z Successfully uninstalled boto3-1.35.33 2025-09-07T11:53:19.8373952Z Successfully installed boto3-1.35.42 numpy-1.26.4 pandas-2.1.3 python-dateutil-2.8.2 pytz-2025.2 tzdata-2025.2 2025-09-07T11:53:20.2564479Z Command completed after 1 attempt(s). 
2025-09-07T11:53:20.3991011Z ##[group]Run python3 -m tools.stats.upload_utilization_stats.upload_utilization_stats \ 2025-09-07T11:53:20.3991594Z python3 -m tools.stats.upload_utilization_stats.upload_utilization_stats \ 2025-09-07T11:53:20.3991959Z  --workflow-run-id "17525296438" \ 2025-09-07T11:53:20.3992255Z  --workflow-name "inductor-perf-nightly-h100" \ 2025-09-07T11:53:20.3992547Z  --workflow-run-attempt "1" \ 2025-09-07T11:53:20.3992795Z  --job-id "49775781833" \ 2025-09-07T11:53:20.3993147Z  --job-name "test-weekly / test (inductor_timm_perf_cuda_h100, 2, 7, linux.aws.h100)" \ 2025-09-07T11:53:20.3993517Z  --local-path "" \ 2025-09-07T11:53:20.3993927Z  --artifact-prefix "" 2025-09-07T11:53:20.4009864Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T11:53:20.4010163Z env: 2025-09-07T11:53:20.4010471Z GIT_DEFAULT_BRANCH: main 2025-09-07T11:53:20.4010728Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-09-07T11:53:20.4011069Z SCCACHE_SERVER_PORT_DOCKER_FLAG: -e SCCACHE_SERVER_PORT=5229 2025-09-07T11:53:20.4011503Z DOCKER_CONTAINER_ID: 041e022c010b277563b6b009de7f317cbe81470c090a84737c08d082827a2881 2025-09-07T11:53:20.4011880Z DEVICE_NAME: 2025-09-07T11:53:20.4012049Z DEVICE_TYPE: 2025-09-07T11:53:20.4012213Z ##[endgroup] 2025-09-07T11:53:22.7057031Z repo: pytorch/pytorch 2025-09-07T11:53:22.7057604Z Search for test log in s3 bucket: ossci-utilization 2025-09-07T11:53:22.7058389Z Downloading logs-test-inductor_timm_perf_cuda_h100-2-7-linux.aws.h100_49775781833.zip 2025-09-07T11:53:22.7059456Z extracting usage_log.txt from zip file logs-test-inductor_timm_perf_cuda_h100-2-7-linux.aws.h100_49775781833.zip 2025-09-07T11:53:22.7060344Z Converted Log Model: UtilizationMetadata: 2025-09-07T11:53:22.7062150Z UtilizationMetadata(level='metadata', workflow_id='17525296438', job_id='49775781833', workflow_name='inductor-perf-nightly-h100', job_name='test-weekly / test (inductor_timm_perf_cuda_h100, 2, 7, linux.aws.h100)', 
usage_collect_interval=4.0, data_model_version=1.5, start_at=1757233510, gpu_count=1, cpu_count=192, gpu_type='pynvml', error=None) 2025-09-07T11:53:22.7063261Z [Db Segments] detected pytest cmd: 4, generated segments: 4 2025-09-07T11:53:22.7063554Z [db model] Peek db timeseries 2025-09-07T11:53:22.7064088Z :{ 2025-09-07T11:53:22.7064247Z "created_at": 1757246002, 2025-09-07T11:53:22.7064460Z "type": "utilization", 2025-09-07T11:53:22.7064645Z "tags": [ 2025-09-07T11:53:22.7064794Z "record" 2025-09-07T11:53:22.7064958Z ], 2025-09-07T11:53:22.7065115Z "time_stamp": 1757233510, 2025-09-07T11:53:22.7065324Z "repo": "pytorch/pytorch", 2025-09-07T11:53:22.7065519Z "workflow_id": 17525296438, 2025-09-07T11:53:22.7065713Z "run_attempt": 1, 2025-09-07T11:53:22.7065894Z "job_id": 49775781833, 2025-09-07T11:53:22.7066122Z "workflow_name": "inductor-perf-nightly-h100", 2025-09-07T11:53:22.7066495Z "job_name": "test-weekly / test (inductor_timm_perf_cuda_h100, 2, 7, linux.aws.h100)", 2025-09-07T11:53:22.7066824Z "json_data": "{}" 2025-09-07T11:53:22.7066992Z } 2025-09-07T11:53:22.7067350Z Writing 1 documents to S3 ossci-utilization/util_metadata/v_1.5/pytorch/pytorch/17525296438/1/49775781833/metadata 2025-09-07T11:53:22.7068011Z Done! Finish writing document to S3 ossci-utilization/util_metadata/v_1.5/pytorch/pytorch/17525296438/1/49775781833/metadata 2025-09-07T11:53:22.7068691Z Writing 829 documents to S3 ossci-utilization/util_timeseries/v_1.5/pytorch/pytorch/17525296438/1/49775781833/time_series 2025-09-07T11:53:22.7069379Z Done! Finish writing document to S3 ossci-utilization/util_timeseries/v_1.5/pytorch/pytorch/17525296438/1/49775781833/time_series 2025-09-07T11:53:22.8506068Z Post job cleanup. 2025-09-07T11:53:23.1364677Z Post job cleanup. 
2025-09-07T11:53:23.2288821Z [command]/usr/bin/git version 2025-09-07T11:53:23.2330607Z git version 2.50.1 2025-09-07T11:53:23.2371141Z Temporarily overriding HOME='/home/charlie/_work/_temp/94da2629-dcec-4bfb-a543-e772fba110c8' before making global git config changes 2025-09-07T11:53:23.2371881Z Adding repository directory to the temporary git global config as a safe directory 2025-09-07T11:53:23.2376067Z [command]/usr/bin/git config --global --add safe.directory /home/charlie/_work/pytorch/pytorch 2025-09-07T11:53:23.3290837Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2025-09-07T11:53:23.3326317Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2025-09-07T11:53:23.4888022Z Entering 'android/libs/fbjni' 2025-09-07T11:53:23.4941353Z Entering 'third_party/FP16' 2025-09-07T11:53:23.4991453Z Entering 'third_party/FXdiv' 2025-09-07T11:53:23.5041380Z Entering 'third_party/NNPACK' 2025-09-07T11:53:23.5092339Z Entering 'third_party/NVTX' 2025-09-07T11:53:23.5142617Z Entering 'third_party/VulkanMemoryAllocator' 2025-09-07T11:53:23.5196770Z Entering 'third_party/XNNPACK' 2025-09-07T11:53:23.5263518Z Entering 'third_party/aiter' 2025-09-07T11:53:23.5316876Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T11:53:23.5375947Z Entering 'third_party/benchmark' 2025-09-07T11:53:23.5427975Z Entering 'third_party/composable_kernel' 2025-09-07T11:53:23.5487132Z Entering 'third_party/cpp-httplib' 2025-09-07T11:53:23.5538483Z Entering 'third_party/cpuinfo' 2025-09-07T11:53:23.5590557Z Entering 'third_party/cudnn_frontend' 2025-09-07T11:53:23.5641921Z Entering 'third_party/cutlass' 2025-09-07T11:53:23.5702122Z Entering 'third_party/fbgemm' 2025-09-07T11:53:23.5757809Z Entering 'third_party/fbgemm/external/asmjit' 2025-09-07T11:53:23.5806267Z Entering 'third_party/fbgemm/external/composable_kernel' 
2025-09-07T11:53:23.5862313Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T11:53:23.5911508Z Entering 'third_party/fbgemm/external/cutlass' 2025-09-07T11:53:23.5968126Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T11:53:23.6016466Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T11:53:23.6064023Z Entering 'third_party/fbgemm/external/json' 2025-09-07T11:53:23.6118366Z Entering 'third_party/flash-attention' 2025-09-07T11:53:23.6168976Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T11:53:23.6223953Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T11:53:23.6284460Z Entering 'third_party/flatbuffers' 2025-09-07T11:53:23.6337986Z Entering 'third_party/fmt' 2025-09-07T11:53:23.6390775Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T11:53:23.6440948Z Entering 'third_party/gloo' 2025-09-07T11:53:23.6492796Z Entering 'third_party/googletest' 2025-09-07T11:53:23.6542549Z Entering 'third_party/ideep' 2025-09-07T11:53:23.6590734Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T11:53:23.6650875Z Entering 'third_party/ittapi' 2025-09-07T11:53:23.6699903Z Entering 'third_party/kineto' 2025-09-07T11:53:23.6749497Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T11:53:23.6799168Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T11:53:23.6850969Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T11:53:23.6900401Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T11:53:23.6950103Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T11:53:23.6998159Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T11:53:23.7051741Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T11:53:23.7101418Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T11:53:23.7152197Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T11:53:23.7203084Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T11:53:23.7254925Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T11:53:23.7303926Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T11:53:23.7356725Z Entering 'third_party/kleidiai' 2025-09-07T11:53:23.7408001Z Entering 'third_party/mimalloc' 2025-09-07T11:53:23.7458931Z Entering 'third_party/nlohmann' 2025-09-07T11:53:23.7511161Z Entering 'third_party/onnx' 2025-09-07T11:53:23.7576564Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T11:53:23.7632120Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T11:53:23.7683445Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T11:53:23.7731521Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T11:53:23.7780120Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T11:53:23.7828514Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T11:53:23.7879822Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T11:53:23.7927672Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T11:53:23.7975397Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T11:53:23.8024316Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T11:53:23.8075044Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T11:53:23.8127952Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T11:53:23.8197908Z Entering 'third_party/pocketfft' 2025-09-07T11:53:23.8727403Z Entering 'third_party/protobuf' 2025-09-07T11:53:23.8782390Z Entering 
'third_party/protobuf/third_party/benchmark' 2025-09-07T11:53:23.8831381Z Entering 'third_party/protobuf/third_party/googletest' 2025-09-07T11:53:23.8883194Z Entering 'third_party/psimd' 2025-09-07T11:53:23.8932774Z Entering 'third_party/pthreadpool' 2025-09-07T11:53:23.8984190Z Entering 'third_party/pybind11' 2025-09-07T11:53:23.9035326Z Entering 'third_party/python-peachpy' 2025-09-07T11:53:23.9086952Z Entering 'third_party/sleef' 2025-09-07T11:53:23.9138361Z Entering 'third_party/tensorpipe' 2025-09-07T11:53:23.9187641Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-09-07T11:53:23.9236608Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-09-07T11:53:23.9283471Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-09-07T11:53:23.9331780Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T11:53:23.9379095Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T11:53:23.9456174Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2025-09-07T11:53:23.9481110Z http.https://github.com/.extraheader 2025-09-07T11:53:23.9492715Z [command]/usr/bin/git config --local --unset-all http.https://github.com/.extraheader 2025-09-07T11:53:23.9667641Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :" 2025-09-07T11:53:23.9943461Z Entering 'android/libs/fbjni' 2025-09-07T11:53:23.9973410Z http.https://github.com/.extraheader 2025-09-07T11:53:24.0141261Z Entering 'third_party/FP16' 2025-09-07T11:53:24.0170919Z http.https://github.com/.extraheader 2025-09-07T11:53:24.0387912Z Entering 'third_party/FXdiv' 2025-09-07T11:53:24.0416973Z http.https://github.com/.extraheader 2025-09-07T11:53:24.0794719Z Entering 'third_party/NNPACK' 2025-09-07T11:53:24.0823154Z http.https://github.com/.extraheader 
2025-09-07T11:53:24.0922454Z Entering 'third_party/NVTX' 2025-09-07T11:53:24.0951044Z http.https://github.com/.extraheader 2025-09-07T11:53:24.1051040Z Entering 'third_party/VulkanMemoryAllocator' 2025-09-07T11:53:24.1080787Z http.https://github.com/.extraheader 2025-09-07T11:53:24.1547135Z Entering 'third_party/XNNPACK' 2025-09-07T11:53:24.1576349Z http.https://github.com/.extraheader 2025-09-07T11:53:24.2044940Z Entering 'third_party/aiter' 2025-09-07T11:53:24.2076117Z http.https://github.com/.extraheader 2025-09-07T11:53:24.2451299Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T11:53:24.2479512Z http.https://github.com/.extraheader 2025-09-07T11:53:24.2938256Z Entering 'third_party/benchmark' 2025-09-07T11:53:24.2967628Z http.https://github.com/.extraheader 2025-09-07T11:53:24.3374151Z Entering 'third_party/composable_kernel' 2025-09-07T11:53:24.3406147Z http.https://github.com/.extraheader 2025-09-07T11:53:24.3858164Z Entering 'third_party/cpp-httplib' 2025-09-07T11:53:24.3886870Z http.https://github.com/.extraheader 2025-09-07T11:53:24.4332186Z Entering 'third_party/cpuinfo' 2025-09-07T11:53:24.4362164Z http.https://github.com/.extraheader 2025-09-07T11:53:24.4748843Z Entering 'third_party/cudnn_frontend' 2025-09-07T11:53:24.4778117Z http.https://github.com/.extraheader 2025-09-07T11:53:24.5225228Z Entering 'third_party/cutlass' 2025-09-07T11:53:24.5254693Z http.https://github.com/.extraheader 2025-09-07T11:53:24.5719405Z Entering 'third_party/fbgemm' 2025-09-07T11:53:24.5748067Z http.https://github.com/.extraheader 2025-09-07T11:53:24.6163265Z Entering 'third_party/fbgemm/external/asmjit' 2025-09-07T11:53:24.6191225Z http.https://github.com/.extraheader 2025-09-07T11:53:24.6641937Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-09-07T11:53:24.6669213Z http.https://github.com/.extraheader 2025-09-07T11:53:24.7068463Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T11:53:24.7095826Z 
http.https://github.com/.extraheader 2025-09-07T11:53:24.7526048Z Entering 'third_party/fbgemm/external/cutlass' 2025-09-07T11:53:24.7556253Z http.https://github.com/.extraheader 2025-09-07T11:53:24.8019189Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T11:53:24.8047014Z http.https://github.com/.extraheader 2025-09-07T11:53:24.8462054Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T11:53:24.8489572Z http.https://github.com/.extraheader 2025-09-07T11:53:24.8942990Z Entering 'third_party/fbgemm/external/json' 2025-09-07T11:53:24.8970995Z http.https://github.com/.extraheader 2025-09-07T11:53:24.9388654Z Entering 'third_party/flash-attention' 2025-09-07T11:53:24.9417310Z http.https://github.com/.extraheader 2025-09-07T11:53:24.9927996Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T11:53:24.9956188Z http.https://github.com/.extraheader 2025-09-07T11:53:25.0415037Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T11:53:25.0442735Z http.https://github.com/.extraheader 2025-09-07T11:53:25.0872780Z Entering 'third_party/flatbuffers' 2025-09-07T11:53:25.0903671Z http.https://github.com/.extraheader 2025-09-07T11:53:25.1346148Z Entering 'third_party/fmt' 2025-09-07T11:53:25.1374080Z http.https://github.com/.extraheader 2025-09-07T11:53:25.1801059Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T11:53:25.1829665Z http.https://github.com/.extraheader 2025-09-07T11:53:25.2303359Z Entering 'third_party/gloo' 2025-09-07T11:53:25.2332032Z http.https://github.com/.extraheader 2025-09-07T11:53:25.2744623Z Entering 'third_party/googletest' 2025-09-07T11:53:25.2773154Z http.https://github.com/.extraheader 2025-09-07T11:53:25.3227481Z Entering 'third_party/ideep' 2025-09-07T11:53:25.3256305Z http.https://github.com/.extraheader 2025-09-07T11:53:25.3708084Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T11:53:25.3735317Z http.https://github.com/.extraheader 2025-09-07T11:53:25.4170608Z Entering 
'third_party/ittapi' 2025-09-07T11:53:25.4200561Z http.https://github.com/.extraheader 2025-09-07T11:53:25.4630578Z Entering 'third_party/kineto' 2025-09-07T11:53:25.4659610Z http.https://github.com/.extraheader 2025-09-07T11:53:25.5110324Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T11:53:25.5138829Z http.https://github.com/.extraheader 2025-09-07T11:53:25.5562361Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T11:53:25.5589792Z http.https://github.com/.extraheader 2025-09-07T11:53:25.6043261Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T11:53:25.6071247Z http.https://github.com/.extraheader 2025-09-07T11:53:25.6497210Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T11:53:25.6525098Z http.https://github.com/.extraheader 2025-09-07T11:53:25.6972907Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T11:53:25.7001580Z http.https://github.com/.extraheader 2025-09-07T11:53:25.7421458Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T11:53:25.7449578Z http.https://github.com/.extraheader 2025-09-07T11:53:25.7906045Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T11:53:25.7934545Z http.https://github.com/.extraheader 2025-09-07T11:53:25.8346833Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T11:53:25.8374424Z http.https://github.com/.extraheader 2025-09-07T11:53:25.8832209Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T11:53:25.8860000Z http.https://github.com/.extraheader 2025-09-07T11:53:25.9308238Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T11:53:25.9336561Z http.https://github.com/.extraheader 2025-09-07T11:53:25.9737045Z Entering 
'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T11:53:25.9764364Z http.https://github.com/.extraheader 2025-09-07T11:53:26.0711609Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T11:53:26.0739678Z http.https://github.com/.extraheader 2025-09-07T11:53:26.2898069Z Entering 'third_party/kleidiai' 2025-09-07T11:53:26.2927637Z http.https://github.com/.extraheader 2025-09-07T11:53:26.3380506Z Entering 'third_party/mimalloc' 2025-09-07T11:53:26.3410336Z http.https://github.com/.extraheader 2025-09-07T11:53:26.3823450Z Entering 'third_party/nlohmann' 2025-09-07T11:53:26.3852766Z http.https://github.com/.extraheader 2025-09-07T11:53:26.4307731Z Entering 'third_party/onnx' 2025-09-07T11:53:26.4336655Z http.https://github.com/.extraheader 2025-09-07T11:53:26.5691881Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T11:53:26.5721598Z http.https://github.com/.extraheader 2025-09-07T11:53:26.6190465Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T11:53:26.6220317Z http.https://github.com/.extraheader 2025-09-07T11:53:26.6666169Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T11:53:26.6693045Z http.https://github.com/.extraheader 2025-09-07T11:53:26.7144095Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T11:53:26.7171208Z http.https://github.com/.extraheader 2025-09-07T11:53:26.7562705Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T11:53:26.7590637Z http.https://github.com/.extraheader 2025-09-07T11:53:26.8040789Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T11:53:26.8068522Z http.https://github.com/.extraheader 2025-09-07T11:53:26.8519849Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T11:53:26.8547346Z http.https://github.com/.extraheader 2025-09-07T11:53:26.8932646Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T11:53:26.8960275Z 
http.https://github.com/.extraheader 2025-09-07T11:53:26.9417587Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T11:53:26.9447031Z http.https://github.com/.extraheader 2025-09-07T11:53:26.9894095Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T11:53:26.9921518Z http.https://github.com/.extraheader 2025-09-07T11:53:27.0345685Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T11:53:27.0372651Z http.https://github.com/.extraheader 2025-09-07T11:53:27.0826000Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T11:53:27.0854370Z http.https://github.com/.extraheader 2025-09-07T11:53:27.1261603Z Entering 'third_party/pocketfft' 2025-09-07T11:53:27.1290833Z http.https://github.com/.extraheader 2025-09-07T11:53:27.1718479Z Entering 'third_party/protobuf' 2025-09-07T11:53:27.1747588Z http.https://github.com/.extraheader 2025-09-07T11:53:27.2198843Z Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T11:53:27.2226148Z http.https://github.com/.extraheader 2025-09-07T11:53:27.2630152Z Entering 'third_party/protobuf/third_party/googletest' 2025-09-07T11:53:27.2658174Z http.https://github.com/.extraheader 2025-09-07T11:53:27.3121055Z Entering 'third_party/psimd' 2025-09-07T11:53:27.3150126Z http.https://github.com/.extraheader 2025-09-07T11:53:27.3597709Z Entering 'third_party/pthreadpool' 2025-09-07T11:53:27.3628164Z http.https://github.com/.extraheader 2025-09-07T11:53:27.4049090Z Entering 'third_party/pybind11' 2025-09-07T11:53:27.4078065Z http.https://github.com/.extraheader 2025-09-07T11:53:27.4532056Z Entering 'third_party/python-peachpy' 2025-09-07T11:53:27.4561132Z http.https://github.com/.extraheader 2025-09-07T11:53:27.4946296Z Entering 'third_party/sleef' 2025-09-07T11:53:27.4975027Z http.https://github.com/.extraheader 2025-09-07T11:53:27.5427398Z Entering 'third_party/tensorpipe' 2025-09-07T11:53:27.5455855Z 
http.https://github.com/.extraheader 2025-09-07T11:53:27.5902781Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-09-07T11:53:27.5935274Z http.https://github.com/.extraheader 2025-09-07T11:53:27.6334405Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-09-07T11:53:27.6362102Z http.https://github.com/.extraheader 2025-09-07T11:53:27.6798989Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-09-07T11:53:27.6826551Z http.https://github.com/.extraheader 2025-09-07T11:53:27.7277717Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T11:53:27.7307377Z http.https://github.com/.extraheader 2025-09-07T11:53:27.7730594Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T11:53:27.7759407Z http.https://github.com/.extraheader 2025-09-07T11:53:27.8615583Z Post job cleanup. 2025-09-07T11:53:27.9518933Z [command]/usr/bin/git version 2025-09-07T11:53:27.9557436Z git version 2.50.1 2025-09-07T11:53:27.9595971Z Temporarily overriding HOME='/home/charlie/_work/_temp/1c7b41b7-04ae-4b2f-9413-982acfcd21e3' before making global git config changes 2025-09-07T11:53:27.9596629Z Adding repository directory to the temporary git global config as a safe directory 2025-09-07T11:53:27.9600628Z [command]/usr/bin/git config --global --add safe.directory /home/charlie/_work/pytorch/pytorch 2025-09-07T11:53:28.0051931Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2025-09-07T11:53:28.0086765Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2025-09-07T11:53:28.0364197Z Entering 'android/libs/fbjni' 2025-09-07T11:53:28.0415692Z Entering 'third_party/FP16' 2025-09-07T11:53:28.0465017Z Entering 'third_party/FXdiv' 2025-09-07T11:53:28.0514262Z Entering 'third_party/NNPACK' 2025-09-07T11:53:28.0564512Z Entering 'third_party/NVTX' 2025-09-07T11:53:28.0617310Z Entering 
'third_party/VulkanMemoryAllocator' 2025-09-07T11:53:28.0669435Z Entering 'third_party/XNNPACK' 2025-09-07T11:53:28.0733958Z Entering 'third_party/aiter' 2025-09-07T11:53:28.0787494Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T11:53:28.0846960Z Entering 'third_party/benchmark' 2025-09-07T11:53:28.0898171Z Entering 'third_party/composable_kernel' 2025-09-07T11:53:28.0956484Z Entering 'third_party/cpp-httplib' 2025-09-07T11:53:28.1008210Z Entering 'third_party/cpuinfo' 2025-09-07T11:53:28.1059626Z Entering 'third_party/cudnn_frontend' 2025-09-07T11:53:28.1110845Z Entering 'third_party/cutlass' 2025-09-07T11:53:28.1170030Z Entering 'third_party/fbgemm' 2025-09-07T11:53:28.1223910Z Entering 'third_party/fbgemm/external/asmjit' 2025-09-07T11:53:28.1272392Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-09-07T11:53:28.1327714Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T11:53:28.1377270Z Entering 'third_party/fbgemm/external/cutlass' 2025-09-07T11:53:28.1432004Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T11:53:28.1478931Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T11:53:28.1524615Z Entering 'third_party/fbgemm/external/json' 2025-09-07T11:53:28.1577296Z Entering 'third_party/flash-attention' 2025-09-07T11:53:28.1630033Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T11:53:28.1686336Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T11:53:28.1746018Z Entering 'third_party/flatbuffers' 2025-09-07T11:53:28.1799256Z Entering 'third_party/fmt' 2025-09-07T11:53:28.1849266Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T11:53:28.1900568Z Entering 'third_party/gloo' 2025-09-07T11:53:28.1953048Z Entering 'third_party/googletest' 2025-09-07T11:53:28.2003486Z Entering 'third_party/ideep' 2025-09-07T11:53:28.2052805Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T11:53:28.2108726Z Entering 'third_party/ittapi' 2025-09-07T11:53:28.2160913Z Entering 
'third_party/kineto' 2025-09-07T11:53:28.2210716Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T11:53:28.2260264Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T11:53:28.2308747Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T11:53:28.2355639Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T11:53:28.2404555Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T11:53:28.2450978Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T11:53:28.2503158Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T11:53:28.2549889Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T11:53:28.2597210Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T11:53:28.2644394Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T11:53:28.2695824Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T11:53:28.2743145Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T11:53:28.2794672Z Entering 'third_party/kleidiai' 2025-09-07T11:53:28.2845505Z Entering 'third_party/mimalloc' 2025-09-07T11:53:28.2897077Z Entering 'third_party/nlohmann' 2025-09-07T11:53:28.2950626Z Entering 'third_party/onnx' 2025-09-07T11:53:28.3016684Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T11:53:28.3071648Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T11:53:28.3122787Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T11:53:28.3170973Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T11:53:28.3218094Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T11:53:28.3265991Z Entering 
'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T11:53:28.3314556Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T11:53:28.3361905Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T11:53:28.3409392Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T11:53:28.3458214Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T11:53:28.3507959Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T11:53:28.3559472Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T11:53:28.3628233Z Entering 'third_party/pocketfft' 2025-09-07T11:53:28.3678493Z Entering 'third_party/protobuf' 2025-09-07T11:53:28.3730976Z Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T11:53:28.3779575Z Entering 'third_party/protobuf/third_party/googletest' 2025-09-07T11:53:28.3830820Z Entering 'third_party/psimd' 2025-09-07T11:53:28.3880255Z Entering 'third_party/pthreadpool' 2025-09-07T11:53:28.3930908Z Entering 'third_party/pybind11' 2025-09-07T11:53:28.3980844Z Entering 'third_party/python-peachpy' 2025-09-07T11:53:28.4030745Z Entering 'third_party/sleef' 2025-09-07T11:53:28.4082864Z Entering 'third_party/tensorpipe' 2025-09-07T11:53:28.4132760Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-09-07T11:53:28.4180516Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-09-07T11:53:28.4227947Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-09-07T11:53:28.4275132Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T11:53:28.4321418Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T11:53:28.4396407Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2025-09-07T11:53:28.4429816Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config 
--local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :" 2025-09-07T11:53:28.4698997Z Entering 'android/libs/fbjni' 2025-09-07T11:53:28.4749593Z Entering 'third_party/FP16' 2025-09-07T11:53:28.4799716Z Entering 'third_party/FXdiv' 2025-09-07T11:53:28.4849348Z Entering 'third_party/NNPACK' 2025-09-07T11:53:28.4898406Z Entering 'third_party/NVTX' 2025-09-07T11:53:28.4949848Z Entering 'third_party/VulkanMemoryAllocator' 2025-09-07T11:53:28.4999043Z Entering 'third_party/XNNPACK' 2025-09-07T11:53:28.5061917Z Entering 'third_party/aiter' 2025-09-07T11:53:28.5113613Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T11:53:28.5172769Z Entering 'third_party/benchmark' 2025-09-07T11:53:28.5224134Z Entering 'third_party/composable_kernel' 2025-09-07T11:53:28.5283453Z Entering 'third_party/cpp-httplib' 2025-09-07T11:53:28.5335805Z Entering 'third_party/cpuinfo' 2025-09-07T11:53:28.5387294Z Entering 'third_party/cudnn_frontend' 2025-09-07T11:53:28.5439045Z Entering 'third_party/cutlass' 2025-09-07T11:53:28.5498873Z Entering 'third_party/fbgemm' 2025-09-07T11:53:28.5551972Z Entering 'third_party/fbgemm/external/asmjit' 2025-09-07T11:53:28.5601139Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-09-07T11:53:28.5655685Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T11:53:28.5703598Z Entering 'third_party/fbgemm/external/cutlass' 2025-09-07T11:53:28.5759603Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T11:53:28.5807973Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T11:53:28.5855361Z Entering 'third_party/fbgemm/external/json' 2025-09-07T11:53:28.5907443Z Entering 'third_party/flash-attention' 2025-09-07T11:53:28.5957629Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T11:53:28.6011025Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T11:53:28.6068832Z Entering 
'third_party/flatbuffers' 2025-09-07T11:53:28.6122285Z Entering 'third_party/fmt' 2025-09-07T11:53:28.6173283Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T11:53:28.6224445Z Entering 'third_party/gloo' 2025-09-07T11:53:28.6274656Z Entering 'third_party/googletest' 2025-09-07T11:53:28.6326191Z Entering 'third_party/ideep' 2025-09-07T11:53:28.6376206Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T11:53:28.6432374Z Entering 'third_party/ittapi' 2025-09-07T11:53:28.6483357Z Entering 'third_party/kineto' 2025-09-07T11:53:28.6533951Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T11:53:28.6581706Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T11:53:28.6630904Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T11:53:28.6679417Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T11:53:28.6727234Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T11:53:28.6773627Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T11:53:28.6825919Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T11:53:28.6873527Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T11:53:28.6921751Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T11:53:28.6970979Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T11:53:28.7021847Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T11:53:28.7069784Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T11:53:28.7120221Z Entering 'third_party/kleidiai' 2025-09-07T11:53:28.7170750Z Entering 'third_party/mimalloc' 2025-09-07T11:53:28.7222810Z Entering 'third_party/nlohmann' 2025-09-07T11:53:28.7275172Z Entering 'third_party/onnx' 
2025-09-07T11:53:28.7340529Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T11:53:28.7393912Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T11:53:28.7444871Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T11:53:28.7492198Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T11:53:28.7538104Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T11:53:28.7584294Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T11:53:28.7634221Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T11:53:28.7683969Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T11:53:28.7732470Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T11:53:28.7781949Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T11:53:28.7831129Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T11:53:28.7882731Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T11:53:28.7949082Z Entering 'third_party/pocketfft' 2025-09-07T11:53:28.7999721Z Entering 'third_party/protobuf' 2025-09-07T11:53:28.8052947Z Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T11:53:28.8100648Z Entering 'third_party/protobuf/third_party/googletest' 2025-09-07T11:53:28.8150805Z Entering 'third_party/psimd' 2025-09-07T11:53:28.8202501Z Entering 'third_party/pthreadpool' 2025-09-07T11:53:28.8253259Z Entering 'third_party/pybind11' 2025-09-07T11:53:28.8303989Z Entering 'third_party/python-peachpy' 2025-09-07T11:53:28.8354575Z Entering 'third_party/sleef' 2025-09-07T11:53:28.8404999Z Entering 'third_party/tensorpipe' 2025-09-07T11:53:28.8455999Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-09-07T11:53:28.8504345Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-09-07T11:53:28.8551659Z Entering 
'third_party/tensorpipe/third_party/libuv' 2025-09-07T11:53:28.8600374Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T11:53:28.8646941Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T11:53:28.8845315Z Cleaning up orphan processes